Last Status Update (Probably)

In Uncategorized on August 12, 2011 by nessgrh

Time flies by … I feel like I just started hyperexpand() yesterday (not quite actually; but you get the idea *g*). I’ve been moving around this week, so I wasn’t terribly productive so far. I did, however, compute (reasonably) neat representations for all the {}_pF_{p-1} in the hyperexpand() table. These examples demonstrate the kind of neat answers this can yield (compare to some of the examples from last week) [actually the tables contain some much more complicated functions, but I was not yet able to come up with interesting integrals that yield these answers *g*].

My final plan, beyond mere cleanup and bugfixing, is to do something along the way of simplifying convergence conditions. This won’t be anything fancy (I’m thinking of peep-hole optimisation); my goal is to make mess like the following somewhat less horrific. I hope to have a final pull request ready by sunday night. Then I will finally slow down and relax; work towards merging my branches; perhaps review some other GSOC pull requests.

Quite a summer.


Branching, Once More

In Uncategorized on August 6, 2011 by nessgrh

As usual with my “out-of-line” posts, this one is again mostly to clear my own thoughts. It seeks to address the problem of how to get out “nice” answers, in the sense of e.g. finding a representation that is manifestly real (say to use inverse trig functions instead of logarithms with imaginary arguments). This turns out to be fairly delicate; let me explain why. Recall that when the code returns an expression like \sqrt{1 + e^{-i \pi} z}, we really mean a certain function on the riemann surface of the logarithm which we try to crudely express above. First of all, it is clear that if we want neat answers, we cannot just return this expression as stated. This is because the square-root function per se has lots of meanings, whereas we attach here a very specific meaning to it. There is no realiable way to recognise and transform such expressions later. So we really should return a new function object “MySqrt(z)” which has well-defined meaning for all polar z, and which can convert itself into a nice representation. The trouble is now that we have to very carefully define MySqrt(z) for all polar z, in such a way that all slater-expansions are correct. It is not (to me) immediately obvious that this is even possible (although it will turn out it is); note in particular that this function will necessarily be discontinuous. The first problem to solve then is how to deal with hypergeometric and meijer g-functions branched at other points than zero and infinity. After that we need to find a way to display things nicely.

Branching at +-1

We have the following facts about {}_pF_q(z):

  • unbranched if p \le q
  • branched at +1 if p = q + 1

And we have the following facts about G_{p,q}^{m,n}(z):

  • usually branched at 0, \infty
  • unbranched at +1 if \delta = p + q - (m + n)/2 > 0; in this case it is even analytic in a neighborhood of the positive reals
  • if p = q, branched at (-1)^{p - m - n} = (-1)^{q - m - n}
  • always has a polar lift that is continuous on “circles” (spirals) of radius not equal to one

Notice further that the Slater theorem (or a generalisation) can be used to come up with a nice (single) answer if \delta > 0. We thus assume now that p = q (so there is a problematic branch point) and \delta > 0 (so that we need to define the lift of the hypergeometric functions involved), and that a version of the slater theorem applies (else there is no work to do at all). As always, we let \mathcal{S} be the riemann surface of the logarithm, we understand G to be defined thereon (although it is not continuous).

We immediately find that G is branched at -1, thus G can be expressed as a sum of hypergeometric functions of argument -z or \frac{-1}{z}. Now given a hypergeometric function {}_pF_{p-1}: D \rightarrow \mathbb{C} (where D is the unit disc, the region of convergence of the power series representation), we wish to define a lift {}_pF_{p-1}: \mathcal{S} \rightarrow \mathbb{C} which makes all slater-expansions valid for all polar z (even though they were initially only valid for |z| < 1). Indeed I claim that there exists a unique f(z) lift satisfying the following list of properties:

  • f is defined on \mathcal{S} \setminus \{e^{i t} | t \in \mathbb{R}\} \cup \{e^{i t}, t \in (0, 2\pi)\}
  • if {}_pF_q does not have a pole at 1, then f is additionally defined on e^0, e^{2 \pi i}
  • f is locally analytic (that is, if f is defined at z_0 and U is an open neighbourhood of z_0 on which f is defined, then f is analytic on U)
  • f is a lift, i.e. if p:\mathcal{S} \rightarrow \mathbb{C} denotes the usual projection, then for all z \in p^{-1}(D) we have f(z) = {}_pF_q(p(z)).

Such a function has, in particular, the following properties:

  • f is continuous along the ray r e^{i\pi}, r > 0
  • it is continuous on “circles” (spirals)

From this it is easy to prove that if in all slater expansions the hypergeometric functions have argument e^{i \pi}z, then with the above lift the slater expansion holds for all polar z.

Finding nice representations

My idea is to create new objects to represent above-defined polar lifts, a la “MySqrt(z)”. hyperexpand() will work exclusively with these, except that in the end it (optionally) calls “rewrite(‘elementary’)”. The object then attempts to determine a “nice” representation (where what that means is up to the object, of course). I now have to sit down and compute said representations, and come up with a good way to test them. Sounds like a final project.


Status Update—Week 11

In Uncategorized on August 5, 2011 by nessgrh

The end of GSOC is rapidly approaching, so this was one further busy week. Here is what I did:

  • Fix some bugs in meijerint, uncovered by statistical integrals,
  • Optimise meijerint and hyperexpand,
  • Implement exponential integrals.

Before I go into details, let me stake out a final list of things I hope to do before the end of summer:

  1. Fix a test failure in meijerint. (explained below)
  2. Think about branching again. In particular, investigate how we can come up with “nice” answers.
  3. Investigate why hyperexpand results are sometimes ridiculously complicated.

Let’s now turn to what I did. First Matthew reported some bugs in my integration code, mostly related to how real variables are converted into polar ones (that is, they were not, which is wrong). I took the opportunity to test lots of integrals that seemed to be important for statistics, e.g. these, computing various moments of the normal, exponential and chi-squared distributions.

Then I started to optimise my code. First I created a script (sympy/benchmark/ which basically times all the individual integrals from the tests. This allowed me to quickly isolate biggest performance problems. This is the output of the first run, and this is the output with my new optimisations. Let me go through the first few entries on both lists.

  • LT((t-apos)**bpos*exp(-cpos*(t-apos))*Heaviside(t-apos), t, s)

    This was my biggest worry, for two reasons. Firstly it seems fairly common—this is just a very basic laplace transform (though with many parameters). Second it is surprisingly hard—in order to do it, the algorithm has to expand the exponential, then bracket the two exponentials as G1, and the rest as G2. However, it turns out that a few tweaks allow us to recognise fairly quickly that the integral won’t be doable without expansion (because e^{c*t + d} is not in the tables), and then a few improvements to the heuristics make us try things in a reasonable order. Thus this laplace transform is comfortably down in the second list; from 4.8 to 0.5 seconds.

  • MT(bessely(a, sqrt(x))**2, x, s)

    and similar mellin transforms involving bessel functions or logarithms are slow for three reasons: (1) Hyperexpand(), is fairly slow to realise that a certain expansion cannot be done. This means that any “mis-step” the algorithm does is quite costly. (2) there tend to be confluent g-functions involved, for which we need series expansions (for computing residues), and these are slow. (3) Combsimp used to be slow. I improved combsimp, and I rewrote the residue function to use series expansions a bit more cleverly. I also improved hyperexpand a bit; the results are fairly noticable, but not as satisfying as for the laplace transform.

  • E((x+y-1)**2)

    This computes a certain statistical double integral in two ways, and it is not actually that slow. By my calculation the code winds up doing about 40 integrals (splitting integrals along the real line into two patrs, expanding the binomial, doing a double integral in two steps, and finally doing it all over again the other way round), putting the time for every single integral well into a sensible range.

I should mention that during this optimisations the path taken by meijerint was changed slightly, causing a test failure in test_meijerint which is actually fairly difficult to correct. So much for robust code…

Finally I implemented exponential integrals. This was quite good fun; let’s just look at a few examples. A few things can be said here. The answers are sometimes messy; often expand() already yields a lot of cancellation. The noticable exception is the integral representation of the generalised exponential integral with two parameters, here more tricks are necessary (but I don’t really think there is any way to know in advance that expressing everything in terms of expint yields such a nice answer, short from recognising the integral). At one stage I’m using a polar variable, this is to allow \log(u^2) to expand (alternatively one could safely use force here). The laplace transforms are a nice example where answers come out much messier than they have to (the other exponential integrals are mostly as bad es Si, noticable exception being e1). Some of the definite integrals illustrate a similar problem.


Status Update—Week 10

In Uncategorized on July 28, 2011 by nessgrh

This is a little earlier than my usual status update, but I think now is a good time. I just commited some new code, and many new interesting things work now, hence it is a good time to show off :-). On the other hand there a quite a number of bugs in my code, so that’s probably what I will be spending next week on…

Anyway. First I added lerch phi to hyperexpand. Here are a few examples. There is not a lot to be said here, but it is good to see many common sums to work now.

Next I improved hyperexpand() to handle some expansions at “special points”. This means evaluating hypergeometric (or meijer g) functions at say z=1, even if we don’t know any closed-form expressions for general z. There is  a vast literature on such “hypergeometric identities”, and my code is a very humble start at best. It basically just implements Gauss’ and Kummer’s summation theorems for 2F1 and nothing else, but this is fairly effective. Then I improved one of the convergence conditions—it turns out that in addition to what is listed on the wolfram functions site, in the russian book from which they took the conditions there is a crucial extra part. After finding someone to translate it to me I could implement this; now we can do some more integrals. The upshot of this is that the mellin transform of a product of bessel functions can now be derived by the code, instead of having to put it into the table.

Let me put this into perspective. There are (at least) the functions J, Y, I, K. The general products J_a J_b, Y_a Y_b, J_a Y_b can be exrpessed as g-functions, and similarly for I, K. Also many special products can be expressed. I had previously put in a few of the identities of the first kind. Now almost all of them can be derived from just the entries for single functions. (There are some problems with functions of the second kind, which tend to be singular and/or rapidly growing, so that they don’t really have mellin transforms; deriving formulae for these is difficult in the current setup). On the other hand, this can be somewhat slow; for this reason I only commented out the formulae instead of removing them. Here are a few timings. These are evidently not great, I’ll have to see what can be done. My guess is that hyperexpand() is relatively slow, but I haven’t looked into this further. [Note also that running this with cache off is much slower, since the algorithm internally uses caching.]

Finally, I improved the integration heuristics so as to be able to do some more integrals [with a few additional factors the last integral is a representation of besselj]. Again I don’t know what makes this so slow.

In closing, let’s look at some more fun definite integrals (all played around with long enough until I found a variation that can be done in closed form 😉 ). Again not much to say here (except that (9) shows a bug in the hyperepand table, the minus sign must be e^{-i\pi}); the numerical computations are for comparison with wolfram alpha.


Status Update—Week 9

In Uncategorized on July 22, 2011 by nessgrh

This was one busy week again. I implemented polar numbers and changed over the integration and hyperexpand code to use them. This was more painful than I thought, but it seems to work now. Indeed the problems I mentioned before (other than matching) are gone now, as are the hacks. After this I cleaned up my branch to such an extent that I considered it ready for review. There are still some minor issues regarding numerical evaluation that I’m working to sort out, but this shouldn’t affect the review much.

Then I started working on adding lerchphi and polylogarithms to sympy. The goal of this is to incorporate them into hyperexpand(), so that a few more interesting series can be summed. This is good fun. Here are a few exampes. As you can see, all the standard things that one expects to work do work. And expand_func() can be used to reduce lerchphi to polylogarithms. In fact it can also reduce to hurwitz zeta functions in some cases but that is a mess. However, it is correct (tested numerically) and in a specific sense even simpler. In any case it’s nice to have it, even if it is not used much :-).

I also started extending hyperexpand to recognise lerch phi … this is slightly non-trivial (compared to normal table extensions) because lerch phi is actually not hypergeometric unless the parameter s is an integer, and even then the number of parameters of the hypergeometric function depends on s. Thus we need a special function to recognise such hypergeometric functions and generate formulae on the fly, this is what I am working on now. It will be finished on monday (so I have something cool to show off next week again *g*).

Finally it turns out that there are some subtle bugs in random numeric testing for hyperexpand. Since the code is in master now and we are about to release this is fairly bad of course. But luckily another gsoc student is setting up jenkins and configured it in such a way to currently run the relevant tests every five minutes. That should allow me to weed out all bugs.


Deciphering Branch Behaviour

In Uncategorized on July 22, 2011 by nessgrh

Handling branches in computer systems is indeed unbelievably subtle. Here is one slightly complicated example which comes out correctly, at least when looked at in the right way; this is encouraging. The following integral comes up in an ODE test: \int \frac{\mathrm{d}x}{x \sqrt{1 - x^2}} = i \arcsin{\frac{1}{x}}. The right hand side is as computed by the meijerint code and looks suspicious, not being invariant under complex conjugation. However, note that on both sides are branched functions: \sqrt{1 - x^2} and \arcsin{\frac{1}{x}}. The left hand side is unbranched at the origin, whereas the right hand side is unbranched at infinity. The left hand side is branched at infinity, whereas the rigth hand side is not. (There is nothing unusual about this, there is no reason to expect indefinite integrals to retain branchpoints.)

Thus in order to understand what is going on, we first have to understand the extension of both functions to true polar numbers x, because this is what the meijer g code works with. Since inverse sine is surely nasty, let’s try to understand the square root expression. A clue comes from enabling debug output: the meijerint code decides \frac{1}{\sqrt{1-x^2}} = G(x^2 e^{i \pi}) = \sqrt{\pi} G_{1, 1}^{1, 1} \left(\begin{matrix}\frac{1}{2} \\ 0 \end{matrix}\middle| x^2 e^{i\pi} \right). From the definitions, for |x| < 1 we find G(x) = F(x), where F(x), which we shall define only for |x| < 1, is (1 + x)^{-\frac{1}{2}}, evaluated on the principal branch (indeed it is just a binomial series). Also from the definitions, we find that for |x| > 1, G(x) = \frac{1}{\sqrt{x}} F\left(\frac{1}{x}\right), where \frac{1}{\sqrt{x}} denotes the holomorphic function \mathcal{S} \rightarrow \mathbb{C} (which is, in particular, continuous, i.e. free of branch cuts). Finally we know that for |Arg(x)| < \pi, these two definitions must patch together continuously.

Note how the branching has been resolved crudely: for every polar number x (outside a set of measure zero which does not disconnect \mathcal{S}) the integrand has acquired a definite value, continuous on circles. However, there is a circular branch cut on every other sheet.

Now let’s look back at the integrand. The meijerint code interprets \int \frac{1}{x \sqrt{1 - x^2}} as \frac{G(e^{i\pi} x^2)}{x}. [It’s e^{i\pi} and not e^{-i\pi} since -1 “in the wild” means e^{i\pi} in the standard branch. Of course one can specify this by hand if it is not the desired choice.] Hence for |x| > 1, the integrand is \frac{F\left(\frac{-1}{x^2} \right)}{x \sqrt{e^{i \pi}x^2} } = \frac{F\left(\frac{-1}{x^2} \right)}{x^2 e^{i \pi/2}} = \frac{-i}{x^2 \sqrt{1 - \frac{1}{x^2}}} (Recall that in the first term the the square root means the continuous function on \mathcal{S}, and so the second term is the same as the first. In the third term the square root denotes the principal branch, which is continuous throughout |x|>1 as well.)

Finally, for |x| < 1, the derivative of \arcsin{x} is \frac{1}{\sqrt{1 - x^2}}, again with the principal branch of the square root (for the same reason as before: continue analytically on circles from the real-valued function). Thus lo and behold, if we differentiate i\arcsin{\frac{1}{x}} for |x| > 1, we do get out the right sign. As a side note, if we replace -x^2 in the integrand by e^{-i\pi}x^2, we get the other sign.


Status Update—Week 8

In Uncategorized on July 16, 2011 by nessgrh


So this week I finally extended the integration lookup tables to now include trigonomic, hyperbolic and exponential functions, certain algebraic functions, error functions, the logarithm, the error function, and various bessel functions. I also did some miscellaneous fixes and extensions which I might mention in passing. Furthermore I decided to create a “TODO later” list: basically a list of things that I really really do want to fix, but that will not go into my first integrals pull request. For I have been working on this for so (comparatively) long, I feel I need to stabilise the branch. Indeed the only thing left to do before I subsmit it is a decent treatment of branched functions. After that I think progress can be incremential, and I will start working on the “TODO later” stuff in a gsoc-3 branch.

This post consists of three parts: the first shows results of this weeks work. The second describes some items on the “TODO later” list (to give an idea of what will be missing in the upcoming pull request). The third branch describes my ideas for polar numbers in detail, which I will start implementing tomorrow (I did not actually do any work today, instead I want to the Calypso Water park (near Ottawa). Hurray for flexible work hours.)

Visible Results of the Extended Tables

Basically now the code can solve the same sort of problems as before, just on many more classes of functions. One simple example is some indefinite integration. A few comments are in order. First of all, there is no use hiding that indefinite integration is hardly the strength of the algorithm. The examples show that it can be made to work in some situations, but basically any composition of functions or similar will stop it from working. (The same is true for definite integration, of course, but there nobody expects much more *g*.) We see in equations 4, 5, 7, 9 (sorry for the non-consecutive numbering) that integration of simple combinations of trigonometric and polynomial functions can be done, and can indeed be faster than existing code. Equations 12, 14, 15 show recursive mellin transforms in action, of course this gets much slower. Equations 18, 19, 20 and 38 show integration of bessel functions. Incidentially I could not verify the last by differentiation, so I will have to look into this again—before submitting the pull request, of course. Equations 40 and 41 illustrate integration of bessel functions. Equation 42 shows a slight shortcoming: the integral can be expressed using struve functions (I think), but they are just not in the table for expanding g-functions (and neither in sympy). Equation 43 shows a more serious limitation: the integral can be done using integration by parts, but the integrand is *not* a meijer g-function (mellin transform involves digamma functions), so this is not an integral the algorithm will be able to determine.

Let’s now look at some definite integration. There is not very much to say here, except that I show these integrals because they can be done. There are many more like them that can be done, and extremely many more like them that cannot be done. Some of these might look odd (23, 36); I will explain what is going on in the “TODO later” section.

The “TODO Later” list

One problem is that hyperexpand() sometimes is not very clever. For example in equation 28 of the definite integration examples, we see that apparently the integral cannot be done in named special functions. hyperexpand() retains the meijer g-function because the hypergeometric functions cannot be expanded; this is usually a good idea since a g-function expansion in hypergeometric functions can be very messy. However in this case the hypergeometric functions actually cancel! hyperexpand() note this and return the simplified result. However things are not quite so easy. Sometimes there is choice involved in what hypergeometric functions to expand into, and only some choices cancel. Moreover, sometimes the hypergeometric functions do not cancel, but instead can be evaluated at special arguments. Here are a few examples. Equation 3 shows some non-obvious cancellation: the gammas have to be simplified, and then the symmetry of the arguments of the hypergeometric functions has to be used. Equation five shows an example that can be done because of the special argument, indeed 2F1 can always be summed at unity, using a theorem of Gauss’. Equation 7 shows something more complicated still, I know it can be done but I am not sure how (maybe using a so-called quadratic transformation). One side-effect of this shortcoming is that currently I have had to put formulae for e.g. J_a(x) J_b(x) into the table, even though they should be derivable (at a time cost, of course). This is a bad thing since it leads to an explosion of entries.

Another thing that should be fixed is that the convergence conditions for the integral of two g-functions are sometimes just too weak. For example none of the following can currently be done: \int_0^\infty \sin{x} \cos{x} \frac{\mathrm{d}x}{\sqrt{x}}, \int_0^\infty e^{-x} \sinh{\frac{x}{2}} \mathrm{d}x, \int_0^\infty e^{-x} \sinh{x} \frac{\mathrm{d}x}{x\sqrt{x}}, \int_0^\infty e^x H(1-x) \mathrm{d}x. This also stops the recursive algorithm from working.

Then the tables can and should of course be extended further. Notably missing currently are inverse trigonometric functions and powers of trigonometric/hyperbolic functions (though these should be derivable).

As another point, the code to transform an integrand and limits into a known form is not currently very clever. For example \int_1^\infty \frac{\sin(zu)}{(u^2 - 1)^a} \mathrm{d}u can be done by rewriting it as \int_0^\infty H(u^2-1) \frac{\sin(zu)}{(u^2 - 1)^a} \mathrm{d}u, but the code cannot do this, yet.

On a related note, I should probably write a function to simplify bessel functions, e.g. equation 36 of the definite integration examples is just a single bessel I function (times a step function), but this is far from obvious. Importantly this needs proper treatment of polar numbers to get the branch factors right.

Introducing Polar Numbers

Finally, my new plan to treat branch cuts. I think there is no way around implementing proper polar numbers, in some way. Let’s first talk about the mathematics, and then about a possible implementation.

Let \mathcal{S} again denote the riemann surface of the logarithm, and let \mathbb{C} denote the ordinary complex line. [We shall sometimes think of \mathcal{S} as an open subset of a topological space S which contains one additional point called “origin”; notably there is no complex structure near it.] We have the holomorphic map Exp: \mathbb{C} \rightarrow \mathcal{S} which is indeed a biholomorphism. Its inverse is Log: \mathcal{S} \rightarrow \mathbb{C}. We also have a holomorphic projection map p: \mathcal{S} \rightarrow \mathbb{C}, Exp(z) \mapsto exp(z), where exp(z) denotes the ordinary exponential function from \mathbb{C} to itself. There is a section (one-sided inverse) of p related to the standard branch of the argument which we shall denote q: \mathbb{C} \rightarrow S, exp(k + i \theta) \mapsto Exp(k + i \theta), 0 \in \mathbb{C} \mapsto 0 \in S, where k is real and -\pi < \theta \le \pi. This function is not continuous at the non-positive reals, but holomorphic outside thereof.

Multiplication in \mathcal{S} is defined in the obvious way. If f: D \rightarrow \mathcal{S} is a meromorphic (say) function, then p \circ f: D \rightarrow \mathcal{C} is also meromorphic. Similarly if g: \mathbb{C} \rightarrow D is any meromorphic (say) function, then so is g \circ p: \mathcal{S} \rightarrow D. This allows us to extend many more functions from \mathbb{C} to \mathcal{S}, and this is often done implicitely. Let’s mention two peculiarites: the sum of two elements in \mathcal{S} cannot be defined very satisfactorily (although of course their sum in \mathbb{C} is well-defined and well-behaved). Also if a \in \mathcal{S}, b \in \mathbb{C} we can define a^b = Exp(b Log(a)), and if instead b \in \mathcal{S} we can define a^b = Exp(p(b) Log(a)). With these definitions all the usual algebraic relations (like (xy)^z = x^z y^z) hold for all x, y, z \in \mathcal{S}.

Now about the implementation. First we need a function to create/represent elements of \mathcal{S}. I had thought about exp(x, polar=True), polar being a new assumption. While I still think introducing an assumption is a good thing, I’m not so sure about this notation any more. The advantage is that the object looks very much like exp, so much code (printing, simplification) knows what to do with it. The disadvantage is that the assumption is easily lost, e.g. obj.func(*obj.args) will lose it, and also code might just create new exp objects, not knowing that there can be assumptions around. So probably creating a new function exp_polar is better.

Next I want to add an assumption polar which can be given to symbols. It does not interact much with other assumptions, except that positive shall imply polar. Much simplification code (notably in expand_log, powdenest, powsimp, logcombine) can now be updated to look for the polar assumption instead of the cruder positive.

Finally, for sake of usability, I propose the following functions: polarify(expr, subs=True/False) traverses the expression tree and turns any non-positive numeric constants (e.g. 1+I) into their image under q (namely sqrt(2)*exp_polar(I*pi/4)). It also converts instances of exp into exp_polar. If subs=True, also all symbols that are not polar will be substituted for polar dummies; thus polarify() behaves much like posify(). A function unpolarify(expr) that essentially computes the image of expr under p, in as simple a form as possible. Thus it converts the outermost instances of exp_polar into exp, and recursively does the same thing for arguments of unbranched functions (e.g. powers with integral exponents, or things like sin(x)—here probably the function class should give a hint, similar to nargs). Then functions like expand, powdenest can accept a new argument polar=True/False which works like force, except that it calls polarify instead of posify. Finally for convenience Expr should get a method extract_branch_factor (a wrapper around extract_multiplicatively I imagine) that extracts factors of the form exp_polar(I*pi*n), n nt integer, which allows branched functions to expose their branching behaviour.

This is about it; I feel I have spent forever writing this post (it’s actually just 2.5 hours :D).