My final plan, beyond mere cleanup and bugfixing, is to do something along the way of simplifying convergence conditions. This won’t be anything fancy (I’m thinking of peep-hole optimisation); my goal is to make mess like the following somewhat less horrific. I hope to have a final pull request ready by sunday night. Then I will finally slow down and relax; work towards merging my branches; perhaps review some other GSOC pull requests.
Quite a summer.
We have the following facts about :
And we have the following facts about :
Notice further that the Slater theorem (or a generalisation) can be used to come up with a nice (single) answer if . We thus assume now that (so there is a problematic branch point) and (so that we need to define the lift of the hypergeometric functions involved), and that a version of the slater theorem applies (else there is no work to do at all). As always, we let be the riemann surface of the logarithm, we understand to be defined thereon (although it is not continuous).
We immediately find that is branched at , thus can be expressed as a sum of hypergeometric functions of argument or . Now given a hypergeometric function (where is the unit disc, the region of convergence of the power series representation), we wish to define a lift which makes all slater-expansions valid for all polar (even though they were initially only valid for ). Indeed I claim that there exists a unique lift satisfying the following list of properties:
Such a function has, in particular, the following properties:
From this it is easy to prove that if in all slater expansions the hypergeometric functions have argument , then with the above lift the slater expansion holds for all polar .
My idea is to create new objects to represent above-defined polar lifts, a la “MySqrt(z)”. hyperexpand() will work exclusively with these, except that in the end it (optionally) calls “rewrite(‘elementary’)”. The object then attempts to determine a “nice” representation (where what that means is up to the object, of course). I now have to sit down and compute said representations, and come up with a good way to test them. Sounds like a final project.
Before I go into details, let me stake out a final list of things I hope to do before the end of summer:
Let’s now turn to what I did. First Matthew reported some bugs in my integration code, mostly related to how real variables are converted into polar ones (that is, they were not, which is wrong). I took the opportunity to test lots of integrals that seemed to be important for statistics, e.g. these, computing various moments of the normal, exponential and chi-squared distributions.
Then I started to optimise my code. First I created a script (sympy/benchmark/bench_meijerint.py) which basically times all the individual integrals from the tests. This allowed me to quickly isolate biggest performance problems. This is the output of the first run, and this is the output with my new optimisations. Let me go through the first few entries on both lists.
LT((t-apos)**bpos*exp(-cpos*(t-apos))*Heaviside(t-apos), t, s)
This was my biggest worry, for two reasons. Firstly it seems fairly common—this is just a very basic laplace transform (though with many parameters). Second it is surprisingly hard—in order to do it, the algorithm has to expand the exponential, then bracket the two exponentials as G1, and the rest as G2. However, it turns out that a few tweaks allow us to recognise fairly quickly that the integral won’t be doable without expansion (because is not in the tables), and then a few improvements to the heuristics make us try things in a reasonable order. Thus this laplace transform is comfortably down in the second list; from 4.8 to 0.5 seconds.
MT(bessely(a, sqrt(x))**2, x, s)
and similar mellin transforms involving bessel functions or logarithms are slow for three reasons: (1) Hyperexpand(), is fairly slow to realise that a certain expansion cannot be done. This means that any “mis-step” the algorithm does is quite costly. (2) there tend to be confluent g-functions involved, for which we need series expansions (for computing residues), and these are slow. (3) Combsimp used to be slow. I improved combsimp, and I rewrote the residue function to use series expansions a bit more cleverly. I also improved hyperexpand a bit; the results are fairly noticable, but not as satisfying as for the laplace transform.
E((x+y-1)**2)
This computes a certain statistical double integral in two ways, and it is not actually that slow. By my calculation the code winds up doing about 40 integrals (splitting integrals along the real line into two patrs, expanding the binomial, doing a double integral in two steps, and finally doing it all over again the other way round), putting the time for every single integral well into a sensible range.
I should mention that during this optimisations the path taken by meijerint was changed slightly, causing a test failure in test_meijerint which is actually fairly difficult to correct. So much for robust code…
Finally I implemented exponential integrals. This was quite good fun; let’s just look at a few examples. A few things can be said here. The answers are sometimes messy; often expand() already yields a lot of cancellation. The noticable exception is the integral representation of the generalised exponential integral with two parameters, here more tricks are necessary (but I don’t really think there is any way to know in advance that expressing everything in terms of expint yields such a nice answer, short from recognising the integral). At one stage I’m using a polar variable, this is to allow to expand (alternatively one could safely use force here). The laplace transforms are a nice example where answers come out much messier than they have to (the other exponential integrals are mostly as bad es Si, noticable exception being e1). Some of the definite integrals illustrate a similar problem.
Anyway. First I added lerch phi to hyperexpand. Here are a few examples. There is not a lot to be said here, but it is good to see many common sums to work now.
Next I improved hyperexpand() to handle some expansions at “special points”. This means evaluating hypergeometric (or meijer g) functions at say , even if we don’t know any closed-form expressions for general . There is a vast literature on such “hypergeometric identities”, and my code is a very humble start at best. It basically just implements Gauss’ and Kummer’s summation theorems for 2F1 and nothing else, but this is fairly effective. Then I improved one of the convergence conditions—it turns out that in addition to what is listed on the wolfram functions site, in the russian book from which they took the conditions there is a crucial extra part. After finding someone to translate it to me I could implement this; now we can do some more integrals. The upshot of this is that the mellin transform of a product of bessel functions can now be derived by the code, instead of having to put it into the table.
Let me put this into perspective. There are (at least) the functions . The general products can be exrpessed as g-functions, and similarly for . Also many special products can be expressed. I had previously put in a few of the identities of the first kind. Now almost all of them can be derived from just the entries for single functions. (There are some problems with functions of the second kind, which tend to be singular and/or rapidly growing, so that they don’t really have mellin transforms; deriving formulae for these is difficult in the current setup). On the other hand, this can be somewhat slow; for this reason I only commented out the formulae instead of removing them. Here are a few timings. These are evidently not great, I’ll have to see what can be done. My guess is that hyperexpand() is relatively slow, but I haven’t looked into this further. [Note also that running this with cache off is much slower, since the algorithm internally uses caching.]
Finally, I improved the integration heuristics so as to be able to do some more integrals [with a few additional factors the last integral is a representation of besselj]. Again I don’t know what makes this so slow.
In closing, let’s look at some more fun definite integrals (all played around with long enough until I found a variation that can be done in closed form ). Again not much to say here (except that (9) shows a bug in the hyperepand table, the minus sign must be ); the numerical computations are for comparison with wolfram alpha.
Then I started working on adding lerchphi and polylogarithms to sympy. The goal of this is to incorporate them into hyperexpand(), so that a few more interesting series can be summed. This is good fun. Here are a few exampes. As you can see, all the standard things that one expects to work do work. And expand_func() can be used to reduce lerchphi to polylogarithms. In fact it can also reduce to hurwitz zeta functions in some cases but that is a mess. However, it is correct (tested numerically) and in a specific sense even simpler. In any case it’s nice to have it, even if it is not used much.
I also started extending hyperexpand to recognise lerch phi … this is slightly non-trivial (compared to normal table extensions) because lerch phi is actually not hypergeometric unless the parameter s is an integer, and even then the number of parameters of the hypergeometric function depends on s. Thus we need a special function to recognise such hypergeometric functions and generate formulae on the fly, this is what I am working on now. It will be finished on monday (so I have something cool to show off next week again *g*).
Finally it turns out that there are some subtle bugs in random numeric testing for hyperexpand. Since the code is in master now and we are about to release this is fairly bad of course. But luckily another gsoc student is setting up jenkins and configured it in such a way to currently run the relevant tests every five minutes. That should allow me to weed out all bugs.
Thus in order to understand what is going on, we first have to understand the extension of both functions to true polar numbers , because this is what the meijer g code works with. Since inverse sine is surely nasty, let’s try to understand the square root expression. A clue comes from enabling debug output: the meijerint code decides . From the definitions, for we find , where , which we shall define only for , is , evaluated on the principal branch (indeed it is just a binomial series). Also from the definitions, we find that for , , where denotes the holomorphic function (which is, in particular, continuous, i.e. free of branch cuts). Finally we know that for , these two definitions must patch together continuously.
Note how the branching has been resolved crudely: for every polar number x (outside a set of measure zero which does not disconnect ) the integrand has acquired a definite value, continuous on circles. However, there is a circular branch cut on every other sheet.
Now let’s look back at the integrand. The meijerint code interprets as . [It’s and not since “in the wild” means in the standard branch. Of course one can specify this by hand if it is not the desired choice.] Hence for , the integrand is (Recall that in the first term the the square root means the continuous function on , and so the second term is the same as the first. In the third term the square root denotes the principal branch, which is continuous throughout as well.)
Finally, for , the derivative of is , again with the principal branch of the square root (for the same reason as before: continue analytically on circles from the real-valued function). Thus lo and behold, if we differentiate for , we do get out the right sign. As a side note, if we replace in the integrand by , we get the other sign.
So this week I finally extended the integration lookup tables to now include trigonomic, hyperbolic and exponential functions, certain algebraic functions, error functions, the logarithm, the error function, and various bessel functions. I also did some miscellaneous fixes and extensions which I might mention in passing. Furthermore I decided to create a “TODO later” list: basically a list of things that I really really do want to fix, but that will not go into my first integrals pull request. For I have been working on this for so (comparatively) long, I feel I need to stabilise the branch. Indeed the only thing left to do before I subsmit it is a decent treatment of branched functions. After that I think progress can be incremential, and I will start working on the “TODO later” stuff in a gsoc-3 branch.
This post consists of three parts: the first shows results of this weeks work. The second describes some items on the “TODO later” list (to give an idea of what will be missing in the upcoming pull request). The third branch describes my ideas for polar numbers in detail, which I will start implementing tomorrow (I did not actually do any work today, instead I want to the Calypso Water park (near Ottawa). Hurray for flexible work hours.)
Basically now the code can solve the same sort of problems as before, just on many more classes of functions. One simple example is some indefinite integration. A few comments are in order. First of all, there is no use hiding that indefinite integration is hardly the strength of the algorithm. The examples show that it can be made to work in some situations, but basically any composition of functions or similar will stop it from working. (The same is true for definite integration, of course, but there nobody expects much more *g*.) We see in equations 4, 5, 7, 9 (sorry for the non-consecutive numbering) that integration of simple combinations of trigonometric and polynomial functions can be done, and can indeed be faster than existing code. Equations 12, 14, 15 show recursive mellin transforms in action, of course this gets much slower. Equations 18, 19, 20 and 38 show integration of bessel functions. Incidentially I could not verify the last by differentiation, so I will have to look into this again—before submitting the pull request, of course. Equations 40 and 41 illustrate integration of bessel functions. Equation 42 shows a slight shortcoming: the integral can be expressed using struve functions (I think), but they are just not in the table for expanding g-functions (and neither in sympy). Equation 43 shows a more serious limitation: the integral can be done using integration by parts, but the integrand is *not* a meijer g-function (mellin transform involves digamma functions), so this is not an integral the algorithm will be able to determine.
Let’s now look at some definite integration. There is not very much to say here, except that I show these integrals because they can be done. There are many more like them that can be done, and extremely many more like them that cannot be done. Some of these might look odd (23, 36); I will explain what is going on in the “TODO later” section.
One problem is that hyperexpand() sometimes is not very clever. For example in equation 28 of the definite integration examples, we see that apparently the integral cannot be done in named special functions. hyperexpand() retains the meijer g-function because the hypergeometric functions cannot be expanded; this is usually a good idea since a g-function expansion in hypergeometric functions can be very messy. However in this case the hypergeometric functions actually cancel! hyperexpand() note this and return the simplified result. However things are not quite so easy. Sometimes there is choice involved in what hypergeometric functions to expand into, and only some choices cancel. Moreover, sometimes the hypergeometric functions do not cancel, but instead can be evaluated at special arguments. Here are a few examples. Equation 3 shows some non-obvious cancellation: the gammas have to be simplified, and then the symmetry of the arguments of the hypergeometric functions has to be used. Equation five shows an example that can be done because of the special argument, indeed 2F1 can always be summed at unity, using a theorem of Gauss’. Equation 7 shows something more complicated still, I know it can be done but I am not sure how (maybe using a so-called quadratic transformation). One side-effect of this shortcoming is that currently I have had to put formulae for e.g. into the table, even though they should be derivable (at a time cost, of course). This is a bad thing since it leads to an explosion of entries.
Another thing that should be fixed is that the convergence conditions for the integral of two g-functions are sometimes just too weak. For example none of the following can currently be done: , , , . This also stops the recursive algorithm from working.
Then the tables can and should of course be extended further. Notably missing currently are inverse trigonometric functions and powers of trigonometric/hyperbolic functions (though these should be derivable).
As another point, the code to transform an integrand and limits into a known form is not currently very clever. For example can be done by rewriting it as , but the code cannot do this, yet.
On a related note, I should probably write a function to simplify bessel functions, e.g. equation 36 of the definite integration examples is just a single bessel I function (times a step function), but this is far from obvious. Importantly this needs proper treatment of polar numbers to get the branch factors right.
Finally, my new plan to treat branch cuts. I think there is no way around implementing proper polar numbers, in some way. Let’s first talk about the mathematics, and then about a possible implementation.
Let again denote the riemann surface of the logarithm, and let denote the ordinary complex line. [We shall sometimes think of as an open subset of a topological space which contains one additional point called “origin”; notably there is no complex structure near it.] We have the holomorphic map which is indeed a biholomorphism. Its inverse is . We also have a holomorphic projection map , where denotes the ordinary exponential function from to itself. There is a section (one-sided inverse) of related to the standard branch of the argument which we shall denote , where is real and . This function is not continuous at the non-positive reals, but holomorphic outside thereof.
Multiplication in is defined in the obvious way. If is a meromorphic (say) function, then is also meromorphic. Similarly if is any meromorphic (say) function, then so is . This allows us to extend many more functions from to , and this is often done implicitely. Let’s mention two peculiarites: the sum of two elements in cannot be defined very satisfactorily (although of course their sum in is well-defined and well-behaved). Also if we can define , and if instead we can define . With these definitions all the usual algebraic relations (like ) hold for all .
Now about the implementation. First we need a function to create/represent elements of . I had thought about exp(x, polar=True), polar being a new assumption. While I still think introducing an assumption is a good thing, I’m not so sure about this notation any more. The advantage is that the object looks very much like exp, so much code (printing, simplification) knows what to do with it. The disadvantage is that the assumption is easily lost, e.g. obj.func(*obj.args) will lose it, and also code might just create new exp objects, not knowing that there can be assumptions around. So probably creating a new function exp_polar is better.
Next I want to add an assumption polar which can be given to symbols. It does not interact much with other assumptions, except that positive shall imply polar. Much simplification code (notably in expand_log, powdenest, powsimp, logcombine) can now be updated to look for the polar assumption instead of the cruder positive.
Finally, for sake of usability, I propose the following functions: polarify(expr, subs=True/False) traverses the expression tree and turns any non-positive numeric constants (e.g. 1+I) into their image under (namely sqrt(2)*exp_polar(I*pi/4)). It also converts instances of exp into exp_polar. If subs=True, also all symbols that are not polar will be substituted for polar dummies; thus polarify() behaves much like posify(). A function unpolarify(expr) that essentially computes the image of expr under , in as simple a form as possible. Thus it converts the outermost instances of exp_polar into exp, and recursively does the same thing for arguments of unbranched functions (e.g. powers with integral exponents, or things like sin(x)—here probably the function class should give a hint, similar to nargs). Then functions like expand, powdenest can accept a new argument polar=True/False which works like force, except that it calls polarify instead of posify. Finally for convenience Expr should get a method extract_branch_factor (a wrapper around extract_multiplicatively I imagine) that extracts factors of the form exp_polar(I*pi*n), n nt integer, which allows branched functions to expose their branching behaviour.
This is about it; I feel I have spent forever writing this post (it’s actually just 2.5 hours :D).
As before, the aim of this post is two-fold: both to clarify my thoughts, and to invite suggestions for solution.
It turn out my previous treatment of branched functions was a bit short-sighted (who would have guessed that, hu? :D). In order to make progress, I have for now been ignoring problems related to branches/arguments; but by now this is the last big problem before my branch can finally become “pull request ready” (more on that in friday’s post). Let me first describe the problem(s) at hand. I have picked five (six) representative examples.
I think the following conclusions can be drawn from these examples:
I’m looking for an unintrusive way to solve the above four problems.
I’m toying with the idea of adding a new assumption “polar” to sympy. I think almost no code needs to know about it, and it seems like it could solve my problems. I’m not very experienced with the sympy core, though, so I may be wrong. I imagine that this assumption can be set on various objects to make them behave like numbers on the riemann surface of the logarithm. In particular exp(z, polar=True) stops exp(2*pi*I) == 1. We could also allow this on e.g. I or -1, so that exp(pi*I, polar=True) can return -1 with the polar=True flag set. Multiplying two such factors together will propagate back to exp. Also polar=True should stop exp from being expanded trigonometrically (since adding polar numbers does not really make sense, as far as I can see—adding shouldn’t be disallowed though, since it can be useful formal notation). Matching code could propagate this flag (I’m fuzzy on the details). Similarly powdenest etc could watch out for it.
This solves all but (3), for which I could just make a new function. My existing machinery for handling branch cuts should be adapted, obviously (actually most of it could go now). argument_dummy can go of course.
It might even be advantageous to have an assumption “period” which can have a numerical value (where the old behaviour means period=None, and the above behaviour means period=oo); but I think assumptions must be flags so this cannot work. If it would, it might allow for a unified treatment of (1-4), but that’s probably not worth it.
In any case, this week I fixed the combsimp algorithm for gamma functions, I implemented most of the expansion algorithm for G functions, I added classes to represent unevaluated transforms (as suggested in a comment to my last blog post), and I just about started to to extend the lookup tables. The last part turns out to be a real stress test for my code (since obviously we want to derive the formulas wherever possible, instead of putting them all into tables).
As I said before, there is not a lot to show off. Perhaps one word of clarification: there was code to expand meijer g-functions all the time. It works by evaluating the defining integral(s) using the residue theorem, i.e. using the (generalised) slater theorem. However, in some cases this does not work, because the series are properly non-hypergeometric. This can be quite annoying. Consider, for example . This formula holds for all that are not positive integers (that is for all so that the G-function on the left hand side makes sense). But for negative integers, slater’s theorem does not apply! This is quite a shame, since obviously substituting them in the end result is no problem at all, there does not seem to be a limit involved. But this is not the case, there is some gamma function cancellation going on. In fact, whenever Slater’s theorem applies, we know that the G-function is a sum of terms of the form , where is unbranched at the origin. But has logarithmic monodromy at the origin. Thus there really is a limiting process.
Hence I spent about 1.5 days developing this new code. And then another half a day thinking about the branch cuts. Let’s see how much longer it is going to take until there is a satisfactory solution ^^.
The problem we will study is how to compute the inverse fourier transform . First of all it is easy to see that the integral converges only for real, non-zero . Also using contour integration it is easily established that , where denotes the Heaviside step function.
How does the Meijer G-function algorithm compute this integral? First of all it is split up into two parts, and . Let’s concentrate on for now. The integral exists if the exponential has a decaying component, or if it is oscillatory. That is to say it converges for . But let’s be a bit more specific: all meijer G functions are defined on , and may or may not descend to . Thus define to be our integral (where denotes the region of convergence). Of course when the integral converges (note that this equation is not vacuous: on ). We thus find:
The integral converges if , for some integer .
Having established this, let’s see what the algorithm computes. I should explain the output. Note that debugging output is enabled. The first piecewise result is what would ordinarily be presented to the user: there is a closed expression, holding under various conditions. And there is the “otherwise” part of the piecewise, which just says that if the conditions do not hold, we don’t know the answer. Consider the conditions. unbranched_argument(t) means the argument of (whereas arg(t) would mean some value between and , and that function is not continuous either). Thus the conditions are just a funny way of saying . Now consider the claimed result. It looks a bit daunting, but this is because the functions in which it is written are defined on , not . The debugging output line “Result before branch substitutions …” shows what the result would be on . I have computed this in output [8] (notice that what looks like here is a dummy, to stop evaluation of etc). Of course is an unbranched function on , so the first two factors are uninteresting. But the last factor is not: we see that the result is related to the upper incomplete gamma function. This is a branched function, i.e. it is properly defined on and it does not descend to . If is a positive real number, then the incomplete gamma function is to be evaluated at argument . This is where the extra terms (like argument_period()) come from: on argument means negative reals. But with the branch cut conventions as implemented (in mpmath, effectively, but also enforced throughout sympy), negative reals mean argument . So the answer cannot just be , because this would end up on the wrong branch. Thus we conclude (or rather the algorithm did for us):
Define by . Then for .
Notice in particular that our original integral has two natural extensions to : the naive definition , and the analytic continuation . The algorithm has to return some subset of the first definition, and not the second. In a sense this is why the convergence condition is necessary (although technically it is there because the integral representation of the G-functions used to establish this formula diverges outside the region specified by the convergence conditions).
We could now do a similar analysis for . The story is essentially the same, with the upper half plane replaced by the lower half plane () throughout. So when can we compute the combined integral? We already noted that this should be the case exactly when is real, and working in this is easy to see: the intersection of the (closed) upper and lower half planes is the real axis. But notice how this fails on The intersection of (where we can evaluate the first integral) and is , i.e. only positive .
To further amplify the confusion, computing said fourier transform with argument , where is declared positive, works. This is because before “switching” to working on , the algorithm carelessly cancels some minus signs … which is fine for entire functions, of course.
Here ends this rather lengthy explanation. It remains to figure out how we can teach the algorithm that the integrals are periodic in argument , instead of knowing them only on one half-sheet. I have some vague ideas for this, but input is appreciated. In any case, I’ll think about this for a while. For now there is enough other work to do.