
Last Status Update (Probably)

In Uncategorized on August 12, 2011 by nessgrh

Time flies by … I feel like I just started hyperexpand() yesterday (not quite actually; but you get the idea *g*). I’ve been moving around this week, so I haven’t been terribly productive so far. I did, however, compute (reasonably) neat representations for all the {}_pF_{p-1} in the hyperexpand() table. These examples demonstrate the kind of neat answers this can yield (compare to some of the examples from last week) [actually the tables contain some much more complicated functions, but I was not yet able to come up with interesting integrals that yield these answers *g*].

My final plan, beyond mere cleanup and bugfixing, is to do something along the lines of simplifying convergence conditions. This won’t be anything fancy (I’m thinking of peep-hole optimisation); my goal is to make messes like the following somewhat less horrific. I hope to have a final pull request ready by Sunday night. Then I will finally slow down and relax; work towards merging my branches; perhaps review some other GSoC pull requests.
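For illustration (my toy example, not actual meijerint code), sympy’s boolean simplifier already does the kind of local rewriting I have in mind:

    from sympy import And, Or, simplify_logic, symbols

    s, x = symbols('s x', real=True)

    # A redundant "convergence condition"; a peep-hole pass should
    # spot the absorption And(a, Or(a, b)) -> a.
    cond = And(s > 0, Or(s > 0, x > 1))
    print(simplify_logic(cond))  # s > 0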

Quite a summer.


Branching, Once More

In Uncategorized on August 6, 2011 by nessgrh

As usual with my “out-of-line” posts, this one is again mostly to clear my own thoughts. It seeks to address the problem of how to get out “nice” answers, in the sense of e.g. finding a representation that is manifestly real (say using inverse trig functions instead of logarithms with imaginary arguments). This turns out to be fairly delicate; let me explain why. Recall that when the code returns an expression like \sqrt{1 + e^{-i \pi} z}, we really mean a certain function on the Riemann surface of the logarithm, which we try to crudely express above. First of all, it is clear that if we want neat answers, we cannot just return this expression as stated. This is because the square-root function per se has lots of meanings, whereas we attach here a very specific meaning to it. There is no reliable way to recognise and transform such expressions later. So we really should return a new function object “MySqrt(z)” which has a well-defined meaning for all polar z, and which can convert itself into a nice representation. The trouble is now that we have to very carefully define MySqrt(z) for all polar z, in such a way that all Slater expansions are correct. It is not (to me) immediately obvious that this is even possible (although it will turn out it is); note in particular that this function will necessarily be discontinuous. The first problem to solve then is how to deal with hypergeometric and meijer g-functions branched at points other than zero and infinity. After that we need to find a way to display things nicely.

Branching at ±1

We have the following facts about {}_pF_q(z):

  • unbranched if p \le q
  • branched at +1 if p = q + 1

And we have the following facts about G_{p,q}^{m,n}(z):

  • usually branched at 0, \infty
  • unbranched at +1 if \delta = p + q - (m + n)/2 > 0; in this case it is even analytic in a neighborhood of the positive reals
  • if p = q, branched at (-1)^{p - m - n} = (-1)^{q - m - n}
  • always has a polar lift that is continuous on “circles” (spirals) of radius not equal to one

Notice further that the Slater theorem (or a generalisation) can be used to come up with a nice (single) answer if \delta > 0. We thus assume now that p = q (so there is a problematic branch point) and \delta > 0 (so that we need to define the lift of the hypergeometric functions involved), and that a version of the Slater theorem applies (else there is no work to do at all). As always, we let \mathcal{S} be the Riemann surface of the logarithm, and we understand G to be defined thereon (although it is not continuous).

We immediately find that G is branched at -1, thus G can be expressed as a sum of hypergeometric functions of argument -z or \frac{-1}{z}. Now given a hypergeometric function {}_pF_{p-1}: D \rightarrow \mathbb{C} (where D is the unit disc, the region of convergence of the power series representation), we wish to define a lift {}_pF_{p-1}: \mathcal{S} \rightarrow \mathbb{C} which makes all Slater expansions valid for all polar z (even though they were initially only valid for |z| < 1). Indeed I claim that there exists a unique lift f(z) satisfying the following list of properties:

  • f is defined on \left(\mathcal{S} \setminus \{e^{i t} \mid t \in \mathbb{R}\}\right) \cup \{e^{i t} \mid t \in (0, 2\pi)\}
  • if {}_pF_{p-1} does not have a pole at 1, then f is additionally defined on e^0, e^{2 \pi i}
  • f is locally analytic (that is, if f is defined at z_0 and U is an open neighbourhood of z_0 on which f is defined, then f is analytic on U)
  • f is a lift, i.e. if p:\mathcal{S} \rightarrow \mathbb{C} denotes the usual projection, then for all z \in p^{-1}(D) we have f(z) = {}_pF_{p-1}(p(z)).

Such a function has, in particular, the following properties:

  • f is continuous along the ray r e^{i\pi}, r > 0
  • it is continuous on “circles” (spirals)

From this it is easy to prove that if in all Slater expansions the hypergeometric functions have argument e^{i \pi}z, then with the above lift the Slater expansion holds for all polar z.

Finding nice representations

My idea is to create new objects to represent the above-defined polar lifts, à la “MySqrt(z)”. hyperexpand() will work exclusively with these, except that in the end it (optionally) calls “rewrite(‘elementary’)”. The object then attempts to determine a “nice” representation (where what that means is up to the object, of course). I now have to sit down and compute said representations, and come up with a good way to test them. Sounds like a final project.
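To make this concrete, here is a minimal sketch of what such an object could look like (my illustration, not the actual plan in code: I use sympy’s expand_func hook in place of the proposed rewrite(‘elementary’), and the extract_branch_factor helper proposed in the week 8 post below):

    from sympy import Function, expand_func, exp_polar, pi, I, sqrt
    from sympy.abc import z

    class MySqrt(Function):
        """Hypothetical polar lift of the square root: one fixed meaning
        for every polar argument, so Slater expansions stay literally true."""

        def _eval_expand_func(self, **hints):
            # The "nice representation" step: strip whole turns
            # exp_polar(2*pi*I*n); each full turn flips the sign.
            arg, n = self.args[0].extract_branch_factor()
            return (-1)**n*sqrt(arg)

    print(expand_func(MySqrt(exp_polar(2*pi*I)*z)))  # -sqrt(z)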


Status Update—Week 11

In Uncategorized on August 5, 2011 by nessgrh

The end of GSoC is rapidly approaching, so this was another busy week. Here is what I did:

  • Fix some bugs in meijerint, uncovered by statistical integrals,
  • Optimise meijerint and hyperexpand,
  • Implement exponential integrals.

Before I go into details, let me stake out a final list of things I hope to do before the end of summer:

  1. Fix a test failure in meijerint. (explained below)
  2. Think about branching again. In particular, investigate how we can come up with “nice” answers.
  3. Investigate why hyperexpand results are sometimes ridiculously complicated.

Let’s now turn to what I did. First Matthew reported some bugs in my integration code, mostly related to how real variables are converted into polar ones (that is, they were not, which is wrong). I took the opportunity to test lots of integrals that seemed to be important for statistics, e.g. these, computing various moments of the normal, exponential and chi-squared distributions.

Then I started to optimise my code. First I created a script (sympy/benchmark/bench_meijerint.py) which basically times all the individual integrals from the tests. This allowed me to quickly isolate the biggest performance problems. This is the output of the first run, and this is the output with my new optimisations. Let me go through the first few entries on both lists.
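For flavour, a minimal sketch of such a timing harness (the real script times the actual test-suite integrals):

    import time

    from sympy import besselj, exp, integrate, oo, symbols

    x = symbols('x', positive=True)

    for label, expr in [('exp(-x)', exp(-x)), ('besselj(1, x)', besselj(1, x))]:
        start = time.time()
        result = integrate(expr, (x, 0, oo), meijerg=True)
        print('%-15s %-10s %.3fs' % (label, result, time.time() - start))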

  • LT((t-apos)**bpos*exp(-cpos*(t-apos))*Heaviside(t-apos), t, s)

    This was my biggest worry, for two reasons. Firstly it seems fairly common—this is just a very basic Laplace transform (though with many parameters). Secondly it is surprisingly hard—in order to do it, the algorithm has to expand the exponential, then bracket the two exponentials as G1, and the rest as G2. However, it turns out that a few tweaks allow us to recognise fairly quickly that the integral won’t be doable without expansion (because e^{c*t + d} is not in the tables), and then a few improvements to the heuristics make us try things in a reasonable order. Thus this Laplace transform is comfortably down in the second list; from 4.8 to 0.5 seconds. (A worked version of this transform appears after this list.)

  • MT(bessely(a, sqrt(x))**2, x, s)

    and similar Mellin transforms involving Bessel functions or logarithms are slow for three reasons: (1) hyperexpand() is fairly slow to realise that a certain expansion cannot be done. This means that any “mis-step” the algorithm makes is quite costly. (2) There tend to be confluent g-functions involved, for which we need series expansions (for computing residues), and these are slow. (3) combsimp() used to be slow. I improved combsimp, and I rewrote the residue function to use series expansions a bit more cleverly. I also improved hyperexpand a bit; the results are fairly noticeable, but not as satisfying as for the Laplace transform.

  • E((x+y-1)**2)

    This computes a certain statistical double integral in two ways, and it is not actually that slow. By my calculation the code winds up doing about 40 integrals (splitting integrals along the real line into two parts, expanding the binomial, doing a double integral in two steps, and finally doing it all over again the other way round), putting the time for every single integral well into a sensible range.
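For reference, the transform from the first bullet as one would call it (my transcription, with positive symbols standing in for the pos-suffixed parameters; by the shift theorem the answer should be e^{-as}\Gamma(b+1)/(s+c)^{b+1}):

    from sympy import Heaviside, exp, laplace_transform, symbols

    t, s = symbols('t s', positive=True)
    a, b, c = symbols('a b c', positive=True)

    # Returns (transform, convergence abscissa, auxiliary conditions).
    print(laplace_transform((t - a)**b*exp(-c*(t - a))*Heaviside(t - a), t, s))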

I should mention that during these optimisations the path taken by meijerint was changed slightly, causing a test failure in test_meijerint which is actually fairly difficult to correct. So much for robust code…

Finally I implemented exponential integrals. This was quite good fun; let’s just look at a few examples. A few things can be said here. The answers are sometimes messy; often expand() already yields a lot of cancellation. The noticeable exception is the integral representation of the generalised exponential integral with two parameters; here more tricks are necessary (but I don’t really think there is any way to know in advance that expressing everything in terms of expint yields such a nice answer, short of recognising the integral). At one stage I’m using a polar variable; this is to allow \log(u^2) to expand (alternatively one could safely use force here). The Laplace transforms are a nice example where answers come out much messier than they have to (the other exponential integrals are mostly as bad as Si, a noticeable exception being E1). Some of the definite integrals illustrate a similar problem.
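Since the linked session is gone, here are two reconstructed examples of the kind of thing that now works (my transcription, not the original log; the Laplace transform of Si is the classical arctangent formula \arctan(1/s)/s):

    from sympy import Si, integrate, laplace_transform, sin, symbols

    x, t, s = symbols('x t s', positive=True)

    print(integrate(sin(t)/t, (t, 0, x)))  # Si(x)
    print(laplace_transform(Si(t), t, s))  # should give atan(1/s)/s plus conditions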


Status Update—Week 10

In Uncategorized on July 28, 2011 by nessgrh

This is a little earlier than my usual status update, but I think now is a good time. I just committed some new code, and many new interesting things work now, hence it is a good time to show off :-). On the other hand there are quite a number of bugs in my code, so that’s probably what I will be spending next week on…

Anyway. First I added Lerch phi to hyperexpand. Here are a few examples. There is not a lot to be said here, but it is good to see many common sums work now.

Next I improved hyperexpand() to handle some expansions at “special points”. This means evaluating hypergeometric (or meijer g) functions at say z=1, even if we don’t know any closed-form expressions for general z. There is a vast literature on such “hypergeometric identities”, and my code is a very humble start at best. It basically just implements Gauss’ and Kummer’s summation theorems for 2F1 and nothing else, but this is fairly effective. Then I improved one of the convergence conditions—it turns out that in addition to what is listed on the Wolfram functions site, in the Russian book from which they took the conditions there is a crucial extra part. After finding someone to translate it for me I could implement this; now we can do some more integrals. The upshot of this is that the Mellin transform of a product of Bessel functions can now be derived by the code, instead of having to put it into the table.
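For example, Gauss’ theorem {}_2F_1(a, b; c; 1) = \frac{\Gamma(c)\Gamma(c-a-b)}{\Gamma(c-a)\Gamma(c-b)} (valid for \mathrm{Re}(c-a-b) > 0) now comes straight out of hyperexpand() (a reconstruction, since the session log is gone):

    from sympy import hyper, hyperexpand, symbols

    a, b, c = symbols('a b c')

    print(hyperexpand(hyper([a, b], [c], 1)))
    # gamma(c)*gamma(-a - b + c)/(gamma(-a + c)*gamma(-b + c))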

Let me put this into perspective. There are (at least) the functions J, Y, I, K. The general products J_a J_b, Y_a Y_b, J_a Y_b can be expressed as g-functions, and similarly for I, K. Also many special products can be expressed. I had previously put in a few of the identities of the first kind. Now almost all of them can be derived from just the entries for single functions. (There are some problems with functions of the second kind, which tend to be singular and/or rapidly growing, so that they don’t really have Mellin transforms; deriving formulae for these is difficult in the current setup.) On the other hand, this can be somewhat slow; for this reason I only commented out the formulae instead of removing them. Here are a few timings. These are evidently not great, I’ll have to see what can be done. My guess is that hyperexpand() is relatively slow, but I haven’t looked into this further. [Note also that running this with the cache off is much slower, since the algorithm internally uses caching.]
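As an illustration of “derived from just the entries for single functions”, something like the following should now go through, if slowly (my example, not the author’s):

    from sympy import besselj, mellin_transform, sqrt, symbols

    x, s = symbols('x s', positive=True)
    a, b = symbols('a b', positive=True)

    # A product of two Bessel J functions, no longer needing its own
    # table entry; returns (transform, fundamental strip, conditions).
    print(mellin_transform(besselj(a, sqrt(x))*besselj(b, sqrt(x)), x, s))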

Finally, I improved the integration heuristics so as to be able to do some more integrals [with a few additional factors the last integral is a representation of besselj]. Again I don’t know what makes this so slow.

In closing, let’s look at some more fun definite integrals (I played around with each long enough to find a variation that can be done in closed form 😉 ). Again not much to say here (except that (9) shows a bug in the hyperexpand table, the minus sign must be e^{-i\pi}); the numerical computations are for comparison with Wolfram Alpha.


Status Update—Week 9

In Uncategorized on July 22, 2011 by nessgrh

This was one busy week again. I implemented polar numbers and changed over the integration and hyperexpand code to use them. This was more painful than I thought, but it seems to work now. Indeed the problems I mentioned before (other than matching) are gone now, as are the hacks. After this I cleaned up my branch to such an extent that I considered it ready for review. There are still some minor issues regarding numerical evaluation that I’m working to sort out, but this shouldn’t affect the review much.

Then I started working on adding lerchphi and polylogarithms to sympy. The goal of this is to incorporate them into hyperexpand(), so that a few more interesting series can be summed. This is good fun. Here are a few examples. As you can see, all the standard things that one expects to work do work. And expand_func() can be used to reduce lerchphi to polylogarithms. In fact it can also reduce to Hurwitz zeta functions in some cases, but that is a mess. However, it is correct (tested numerically) and in a specific sense even simpler. In any case it’s nice to have it, even if it is not used much :-).
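The example page is gone; two representative calls (reconstructed):

    from sympy import expand_func, lerchphi, polylog, symbols

    z, s = symbols('z s')

    print(expand_func(lerchphi(z, s, 1)))  # polylog(s, z)/z
    print(expand_func(polylog(1, z)))      # -log(1 - z)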

I also started extending hyperexpand to recognise Lerch phi … this is slightly non-trivial (compared to normal table extensions) because Lerch phi is actually not hypergeometric unless the parameter s is an integer, and even then the number of parameters of the hypergeometric function depends on s. Thus we need a special function to recognise such hypergeometric functions and generate formulae on the fly; this is what I am working on now. It will be finished on Monday (so I have something cool to show off next week again *g*).
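Once that machinery works, sums like the following should be recognised on the fly (my illustration of the goal):

    from sympy import hyper, hyperexpand
    from sympy.abc import z

    # 3F2([1, 1, 1], [2, 2]; z) has coefficients 1/(n + 1)**2, so it
    # sums to polylog(2, z)/z -- hypergeometric only because s = 2
    # happens to be a (small) integer.
    print(hyperexpand(hyper([1, 1, 1], [2, 2], z)))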

Finally it turns out that there are some subtle bugs in random numeric testing for hyperexpand. Since the code is in master now and we are about to release, this is fairly bad, of course. But luckily another GSoC student is setting up Jenkins and has configured it to run the relevant tests every five minutes. That should allow me to weed out all the bugs.


Deciphering Branch Behaviour

In Uncategorized on July 22, 2011 by nessgrh

Handling branches in computer algebra systems is indeed unbelievably subtle. Here is one slightly complicated example which comes out correctly, at least when looked at in the right way; this is encouraging. The following integral comes up in an ODE test: \int \frac{\mathrm{d}x}{x \sqrt{1 - x^2}} = i \arcsin{\frac{1}{x}}. The right hand side is as computed by the meijerint code and looks suspicious, not being invariant under complex conjugation. However, note that branched functions appear on both sides: \sqrt{1 - x^2} and \arcsin{\frac{1}{x}}. The left hand side is unbranched at the origin, whereas the right hand side is unbranched at infinity. The left hand side is branched at infinity, whereas the right hand side is not. (There is nothing unusual about this; there is no reason to expect indefinite integrals to retain branch points.)

Thus in order to understand what is going on, we first have to understand the extension of both functions to true polar numbers x, because this is what the meijer g code works with. Since inverse sine is surely nasty, let’s try to understand the square root expression. A clue comes from enabling debug output: the meijerint code decides \frac{1}{\sqrt{1-x^2}} = G(x^2 e^{i \pi}) = \frac{1}{\sqrt{\pi}} G_{1, 1}^{1, 1} \left(\begin{matrix}\frac{1}{2} \\ 0 \end{matrix}\middle| x^2 e^{i\pi} \right). From the definitions, for |x| < 1 we find G(x) = F(x), where F(x), which we shall define only for |x| < 1, is (1 + x)^{-\frac{1}{2}}, evaluated on the principal branch (indeed it is just a binomial series). Also from the definitions, we find that for |x| > 1, G(x) = \frac{1}{\sqrt{x}} F\left(\frac{1}{x}\right), where \frac{1}{\sqrt{x}} denotes the holomorphic function \mathcal{S} \rightarrow \mathbb{C} (which is, in particular, continuous, i.e. free of branch cuts). Finally we know that for |Arg(x)| < \pi, these two definitions must patch together continuously.

Note how the branching has been resolved crudely: for every polar number x (outside a set of measure zero which does not disconnect \mathcal{S}) the integrand has acquired a definite value, continuous on circles. However, there is a circular branch cut on every other sheet.

Now let’s look back at the integrand. The meijerint code interprets the integrand \frac{1}{x \sqrt{1 - x^2}} as \frac{G(e^{i\pi} x^2)}{x}. [It’s e^{i\pi} and not e^{-i\pi} since -1 “in the wild” means e^{i\pi} in the standard branch. Of course one can specify this by hand if it is not the desired choice.] Hence for |x| > 1, the integrand is \frac{F\left(\frac{-1}{x^2} \right)}{x \sqrt{e^{i \pi}x^2} } = \frac{F\left(\frac{-1}{x^2} \right)}{x^2 e^{i \pi/2}} = \frac{-i}{x^2 \sqrt{1 - \frac{1}{x^2}}} (Recall that in the first term the square root means the continuous function on \mathcal{S}, and so the second term is the same as the first. In the third term the square root denotes the principal branch, which is continuous throughout |x|>1 as well.)

Finally, for |x| < 1, the derivative of \arcsin{x} is \frac{1}{\sqrt{1 - x^2}}, again with the principal branch of the square root (for the same reason as before: continue analytically on circles from the real-valued function). Thus lo and behold, if we differentiate i\arcsin{\frac{1}{x}} for |x| > 1, we do get out the right sign. As a side note, if we replace -x^2 in the integrand by e^{-i\pi}x^2, we get the other sign.
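The sign claim in the last step is easy to check symbolically (my verification, with everything on principal branches as discussed):

    from sympy import I, asin, diff, simplify, sqrt, symbols

    x = symbols('x', positive=True)

    lhs = diff(I*asin(1/x), x)
    rhs = -I/(x**2*sqrt(1 - 1/x**2))
    print(simplify(lhs - rhs))  # 0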


Status Update—Week 8

In Uncategorized on July 16, 2011 by nessgrh

Introduction

So this week I finally extended the integration lookup tables to include trigonometric, hyperbolic and exponential functions, certain algebraic functions, the error function, the logarithm, and various Bessel functions. I also did some miscellaneous fixes and extensions which I might mention in passing. Furthermore I decided to create a “TODO later” list: basically a list of things that I really really do want to fix, but that will not go into my first integrals pull request. Since I have been working on this for so (comparatively) long, I feel I need to stabilise the branch. Indeed the only thing left to do before I submit it is a decent treatment of branched functions. After that I think progress can be incremental, and I will start working on the “TODO later” stuff in a gsoc-3 branch.

This post consists of three parts: the first shows results of this week’s work. The second describes some items on the “TODO later” list (to give an idea of what will be missing in the upcoming pull request). The third part describes my ideas for polar numbers in detail, which I will start implementing tomorrow (I did not actually do any work today; instead I went to the Calypso Water Park (near Ottawa). Hurray for flexible work hours.)

Visible Results of the Extended Tables

Basically the code can now solve the same sort of problems as before, just on many more classes of functions. One simple example is some indefinite integration. A few comments are in order. First of all, there is no use hiding that indefinite integration is hardly the strength of the algorithm. The examples show that it can be made to work in some situations, but basically any composition of functions or similar will stop it from working. (The same is true for definite integration, of course, but there nobody expects much more *g*.) We see in equations 4, 5, 7, 9 (sorry for the non-consecutive numbering) that integration of simple combinations of trigonometric and polynomial functions can be done, and can indeed be faster than existing code. Equations 12, 14, 15 show recursive Mellin transforms in action; of course this gets much slower. Equations 18, 19, 20 and 38 show integration of Bessel functions. Incidentally I could not verify the last by differentiation, so I will have to look into this again—before submitting the pull request, of course. Equations 40 and 41 illustrate integration of Bessel functions. Equation 42 shows a slight shortcoming: the integral can be expressed using Struve functions (I think), but they are just not in the table for expanding g-functions (nor indeed in sympy). Equation 43 shows a more serious limitation: the integral can be done using integration by parts, but the integrand is *not* a meijer g-function (its Mellin transform involves digamma functions), so this is not an integral the algorithm will be able to determine.

Let’s now look at some definite integration. There is not very much to say here, except that I show these integrals because they can be done. There are many more like them that can be done, and extremely many more like them that cannot be done. Some of these might look odd (23, 36); I will explain what is going on in the “TODO later” section.

The “TODO Later” list

One problem is that hyperexpand() sometimes is not very clever. For example in equation 28 of the definite integration examples, we see that apparently the integral cannot be done in named special functions. hyperexpand() retains the meijer g-function because the hypergeometric functions cannot be expanded; this is usually a good idea since a g-function expansion in hypergeometric functions can be very messy. However in this case the hypergeometric functions actually cancel! hyperexpand() should notice this and return the simplified result. However things are not quite so easy. Sometimes there is a choice involved in what hypergeometric functions to expand into, and only some choices cancel. Moreover, sometimes the hypergeometric functions do not cancel, but instead can be evaluated at special arguments. Here are a few examples. Equation 3 shows some non-obvious cancellation: the gammas have to be simplified, and then the symmetry of the arguments of the hypergeometric functions has to be used. Equation 5 shows an example that can be done because of the special argument; indeed 2F1 can always be summed at unity, using a theorem of Gauss. Equation 7 shows something more complicated still; I know it can be done but I am not sure how (maybe using a so-called quadratic transformation). One side-effect of this shortcoming is that currently I have had to put formulae for e.g. J_a(x) J_b(x) into the table, even though they should be derivable (at a time cost, of course). This is a bad thing since it leads to an explosion of entries.

Another thing that should be fixed is that the convergence conditions for the integral of two g-functions are sometimes just too weak. For example none of the following can currently be done: \int_0^\infty \sin{x} \cos{x} \frac{\mathrm{d}x}{\sqrt{x}}, \int_0^\infty e^{-x} \sinh{\frac{x}{2}} \mathrm{d}x, \int_0^\infty e^{-x} \sinh{x} \frac{\mathrm{d}x}{x\sqrt{x}}, \int_0^\infty e^x H(1-x) \mathrm{d}x. This also stops the recursive algorithm from working.

Then the tables can and should of course be extended further. Notably missing currently are inverse trigonometric functions and powers of trigonometric/hyperbolic functions (though these should be derivable).

As another point, the code to transform an integrand and limits into a known form is not currently very clever. For example \int_1^\infty \frac{\sin(zu)}{(u^2 - 1)^a} \mathrm{d}u can be done by rewriting it as \int_0^\infty H(u^2-1) \frac{\sin(zu)}{(u^2 - 1)^a} \mathrm{d}u, but the code cannot do this, yet.

On a related note, I should probably write a function to simplify Bessel functions; e.g. equation 36 of the definite integration examples is just a single Bessel I function (times a step function), but this is far from obvious. Importantly this needs proper treatment of polar numbers to get the branch factors right.

Introducing Polar Numbers

Finally, my new plan to treat branch cuts. I think there is no way around implementing proper polar numbers, in some way. Let’s first talk about the mathematics, and then about a possible implementation.

Let \mathcal{S} again denote the Riemann surface of the logarithm, and let \mathbb{C} denote the ordinary complex line. [We shall sometimes think of \mathcal{S} as an open subset of a topological space S which contains one additional point called “origin”; notably there is no complex structure near it.] We have the holomorphic map Exp: \mathbb{C} \rightarrow \mathcal{S}, which is indeed a biholomorphism. Its inverse is Log: \mathcal{S} \rightarrow \mathbb{C}. We also have a holomorphic projection map p: \mathcal{S} \rightarrow \mathbb{C}, Exp(z) \mapsto exp(z), where exp(z) denotes the ordinary exponential function from \mathbb{C} to itself. There is a section (one-sided inverse) of p related to the standard branch of the argument, which we shall denote q: \mathbb{C} \rightarrow S, exp(k + i \theta) \mapsto Exp(k + i \theta), 0 \in \mathbb{C} \mapsto 0 \in S, where k is real and -\pi < \theta \le \pi. This function is not continuous at the non-positive reals, but holomorphic away from them.

Multiplication in \mathcal{S} is defined in the obvious way. If f: D \rightarrow \mathcal{S} is a meromorphic (say) function, then p \circ f: D \rightarrow \mathbb{C} is also meromorphic. Similarly if g: \mathbb{C} \rightarrow D is any meromorphic (say) function, then so is g \circ p: \mathcal{S} \rightarrow D. This allows us to extend many more functions from \mathbb{C} to \mathcal{S}, and this is often done implicitly. Let’s mention two peculiarities: the sum of two elements in \mathcal{S} cannot be defined very satisfactorily (although of course their sum in \mathbb{C} is well-defined and well-behaved). Also if a \in \mathcal{S}, b \in \mathbb{C} we can define a^b = Exp(b Log(a)), and if instead b \in \mathcal{S} we can define a^b = Exp(p(b) Log(a)). With these definitions all the usual algebraic relations (like (xy)^z = x^z y^z) hold for all x, y, z \in \mathcal{S}, since Log is a genuine inverse of Exp, so that Log(xy) = Log(x) + Log(y) without any 2\pi i corrections.

Now about the implementation. First we need a function to create/represent elements of \mathcal{S}. I had thought about exp(x, polar=True), polar being a new assumption. While I still think introducing an assumption is a good thing, I’m not so sure about this notation any more. The advantage is that the object looks very much like exp, so much code (printing, simplification) knows what to do with it. The disadvantage is that the assumption is easily lost, e.g. obj.func(*obj.args) will lose it, and also code might just create new exp objects, not knowing that there can be assumptions around. So probably creating a new function exp_polar is better.

Next I want to add an assumption polar which can be given to symbols. It does not interact much with other assumptions, except that positive shall imply polar. Much simplification code (notably in expand_log, powdenest, powsimp, logcombine) can now be updated to look for the polar assumption instead of the cruder positive.

Finally, for the sake of usability, I propose the following functions. polarify(expr, subs=True/False) traverses the expression tree and turns any non-positive numeric constants (e.g. 1+I) into their image under q (namely sqrt(2)*exp_polar(I*pi/4)). It also converts instances of exp into exp_polar. If subs=True, also all symbols that are not polar will be substituted for polar dummies; thus polarify() behaves much like posify(). A function unpolarify(expr) essentially computes the image of expr under p, in as simple a form as possible. Thus it converts the outermost instances of exp_polar into exp, and recursively does the same thing for arguments of unbranched functions (e.g. powers with integral exponents, or things like sin(x)—here probably the function class should give a hint, similar to nargs). Then functions like expand, powdenest can accept a new argument polar=True/False which works like force, except that it calls polarify instead of posify. Finally for convenience Expr should get a method extract_branch_factor (a wrapper around extract_multiplicatively I imagine) that extracts factors of the form exp_polar(I*pi*n), n an integer, which allows branched functions to expose their branching behaviour.
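A sketch of the intended semantics, using the names just proposed (my illustration):

    from sympy import exp_polar, unpolarify, I, pi
    from sympy.abc import x

    print(exp_polar(2*pi*I) == 1)           # False: full turns are remembered
    print(unpolarify(exp_polar(2*pi*I)*x))  # x, the image under p
    print((exp_polar(4*pi*I)*x).extract_branch_factor())  # (x, 2)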

This is about it; I feel I have spent forever writing this post (it’s actually just 2.5 hours :D).


A question of the argument—or: Tricky branch cuts revisited

In Uncategorized on July 13, 2011 by nessgrh

This post is a continuation of my previous one on branch cuts. My usual status update will again come on Friday, and this time I actually have something to show off.

As before, the aim of this post is two-fold: both to clarify my thoughts, and to invite suggestions for solution.

It turns out my previous treatment of branched functions was a bit short-sighted (who would have guessed that, huh? :D). In order to make progress, I have for now been ignoring problems related to branches/arguments; but by now this is the last big problem before my branch can finally become “pull request ready” (more on that in Friday’s post). Let me first describe the problem(s) at hand. I have picked five (six) representative examples.

  1. IFT\left[\frac{1}{1 + 2\pi i k}\right](x), x > 0. (IFT stands for inverse Fourier transform) As explained previously, this comes out as (roughly) e^{-x} \left(\Gamma(0, e^{2 \pi i}) - \Gamma(0, 1) \right). The e^{2 \pi i} must not be simplified to 1 here. (A sympy transcription of this transform appears after this list.)
  2. IFT\left[ \frac{1}{1 + 2\pi i k} \right](x), x < 0. As also explained previously, the G-function algorithm cannot compute this without realising that exp(x) is unbranched, i.e. that the argument of this function may indeed be converted to (-\pi, \pi].
  3. MT\left[\frac{(b + \sqrt{b^2 + x^2})^a}{-\sqrt{b^2 + x^2}} \right], \text{where } b < 0. (MT stands for Mellin transform) This is curious. To understand what is going on, we have to realise that the function is actually represented as C F(x^2/b^2) for some constant C independent of x, and a function F(z). Thus b and x together determine the branch of the square root taken. Thus putting a negative b in the above expression yields \frac{(b - \sqrt{b^2 + x^2})^a}{-\sqrt{b^2 + x^2}}, where \sqrt{z} for once denotes the principal branch. This is surely (?) not what the user had in mind when entering the above expression (and btw we cannot compute the answer for this argument anyway).
  4. \sqrt{x^2 + t^2} + t matches \sqrt{x^2 + b^2} - b with t = -b. This is wrong for the same reason as the example above, but this time the problem is that matching code thinks (-1)^2 = 1.
  5. MT \left[e^{ax} \sin{ax} \right]. This Mellin transform only exists for a in the left half plane. The unsimplified result has a term (roughly) like (e^{2 \pi i}a)^s, except that it is more complicated so that powdenest() doesn’t put the s into the exponent. This only happens when calling simplify. The end result is that when we substitute the real \pi for the dummy, an important factor of e^{2 \pi i} is lost. (In case this explanation is too confusing, look at this session log. Crucially, the last equation has a factor of e^{-2\pi i s} right before the gamma function, whereas the first does not.)
  6. ILT\left[\frac{a^\nu (s + \sqrt{s^2-a^2})^{-\nu}}{\sqrt{s^2 - a^2}}\right]. (ILT stands for inverse Laplace transform) In this case a double minus sign is cancelled, for similar reasons as above. (Here is a log.)
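For concreteness, here is example 1 in sympy notation (my transcription; sympy’s inverse_fourier_transform uses the kernel e^{2\pi i k x}, matching the transform above, and classically the answer is e^{-x}H(x)):

    from sympy import I, inverse_fourier_transform, pi, symbols

    x, k = symbols('x k', real=True)

    print(inverse_fourier_transform(1/(1 + 2*pi*I*k), k, x))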

I think the following conclusions can be drawn from these examples:

  1. powdenest(force=True) is not a reliable way to propagate exponents and thus not sufficient to be sure that we can safely substitute \pi for a dummy. [It has other problems as well, e.g. powdenest(abs(x), force=True) == x, and that’s probably not even a bug.]
  2. It is (thus) not tenable to try to hide all branch computations from the user in the way I tried so far (i.e. to return a result with fully computed branches, where -1s have been cancelled etc).
  3. There needs to be a function similar to arg() and unbranched_argument() that can return reduced arguments if this is known to be correct.
  4. It would be desirable to have functions somewhat like powdenest(force=True) that do all simplifications valid on the Riemann surface of the logarithm. Note that force=True works by pretending all symbols are positive, which is not necessarily the same thing.

I’m looking for an unintrusive way to solve the above four problems.

I’m toying with the idea of adding a new assumption “polar” to sympy. I think almost no code needs to know about it, and it seems like it could solve my problems. I’m not very experienced with the sympy core, though, so I may be wrong. I imagine that this assumption can be set on various objects to make them behave like numbers on the Riemann surface of the logarithm. In particular exp(z, polar=True) stops exp(2*pi*I) == 1. We could also allow this on e.g. I or -1, so that exp(pi*I, polar=True) can return -1 with the polar=True flag set. Multiplying two such factors together will propagate back to exp. Also polar=True should stop exp from being expanded trigonometrically (since adding polar numbers does not really make sense, as far as I can see—adding shouldn’t be disallowed though, since it can be useful formal notation). Matching code could propagate this flag (I’m fuzzy on the details). Similarly powdenest etc could watch out for it.
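For reference, the behaviour the flag would have to switch off (a plain sympy session):

    from sympy import exp, pi, I

    print(exp(2*pi*I))  # 1 -- ordinary exp collapses full turns on construction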

This solves all but (3), for which I could just make a new function. My existing machinery for handling branch cuts should be adapted, obviously (actually most of it could go now). argument_dummy can go of course.

It might even be advantageous to have an assumption “period” which can have a numerical value (where the old behaviour means period=None, and the above behaviour means period=oo); but I think assumptions must be flags so this cannot work. If it would, it might allow for a unified treatment of (1-4), but that’s probably not worth it.


Status Update—Week 7

In Uncategorized on July 8, 2011 by nessgrh

I’m starting to get into a phase where results of my new code are not as impressive as one might wish. Basically I hit one obscure integral that doesn’t work, and then I spend hours hunting down some bugs to fix it. These bugs can be minor typos or rather deep conceptual problems (see the branch cuts post), but nonetheless they are not very user-visible.

In any case, this week I fixed the combsimp algorithm for gamma functions, I implemented most of the expansion algorithm for G functions, I added classes to represent unevaluated transforms (as suggested in a comment to my last blog post), and I just about started to extend the lookup tables. The last part turns out to be a real stress test for my code (since obviously we want to derive the formulas wherever possible, instead of putting them all into tables).

As I said before, there is not a lot to show off. Perhaps one word of clarification: there was code to expand meijer g-functions all along. It works by evaluating the defining integral(s) using the residue theorem, i.e. using the (generalised) Slater theorem. However, in some cases this does not work, because the series are properly non-hypergeometric. This can be quite annoying. Consider, for example G_{1, 2}^{2, 1}\left( y \atop {0, y} \middle| x \right) = e^{x} \Gamma\left(1 - y \right) \Gamma(y, x). This formula holds for all y that are not positive integers (that is, for all y such that the G-function on the left hand side makes sense). But for negative integers, Slater’s theorem does not apply! This is quite a shame, since obviously substituting them into the end result is no problem at all; there does not seem to be a limit involved. But this is not the case: there is some gamma function cancellation going on. In fact, whenever Slater’s theorem applies, we know that the G-function is a sum of terms of the form z^a F(z), where F(z) is unbranched at the origin. But \Gamma(-n, z) has logarithmic monodromy at the origin. Thus there really is a limiting process.
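Incidentally, the formula above is easy to check numerically (my sanity check, via mpmath; gammainc(y, x, inf) is the upper incomplete gamma function \Gamma(y, x)):

    from mpmath import mp, exp, gamma, gammainc, meijerg

    mp.dps = 30
    x, y = mp.mpf('0.7'), mp.mpf('0.3')
    lhs = meijerg([[y], []], [[0, y], []], x)
    rhs = exp(x)*gamma(1 - y)*gammainc(y, x, mp.inf)
    print(lhs - rhs)  # ~ 0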

Hence I spent about 1.5 days developing this new code. And then another half a day thinking about the branch cuts. Let’s see how much longer it is going to take until there is a satisfactory solution ^^.


Tricky branch cuts

In Uncategorized on July 7, 2011 by nessgrh

This post will elaborate on the recent problems I’m having with “multivalued” functions. My usual status update will come tomorrow. I shall try to be almost pedantically clear in my description, both in order to clear my own thoughts and to increase the chance of someone suggesting a way out. In what follows, \mathcal{S} will denote the Riemann surface of the logarithm (this only means infinitely many copies of the complex line, glued together appropriately so that \log{z} becomes an entire, single-valued function on \mathcal{S}). Zero is not an element of \mathcal{S}.

The problem we will study is how to compute the inverse Fourier transform f(t) = \int_{-\infty}^\infty e^{i t x} \frac{\mathrm{d}x}{i x + 1}. First of all it is easy to see that the integral converges only for real, non-zero t. Also using contour integration (close the contour in the upper half plane and pick up the residue of the pole at x = i) it is easily established that f(t) = 2\pi H(t) e^{-t}, where H(t) denotes the Heaviside step function.

How does the Meijer G-function algorithm compute this integral? First of all it is split up into two parts, I_1 = \int_0^\infty e^{i t x} \frac{\mathrm{d}x}{i x + 1} and I_2 = \int_{-\infty}^0 e^{i t x} \frac{\mathrm{d}x}{i x + 1}. Let’s concentrate on I_1 for now. The integral exists if the exponential has a decaying component, or if it is oscillatory. That is to say it converges for Im(t) \ge 0, t \ne 0. But let’s be a bit more specific: all meijer G functions are defined on \mathcal{S}, and may or may not descend to \mathbb{C}. Thus define I_1: \mathcal{S} \supset D \rightarrow \mathbb{C} to be our integral (where D denotes the region of convergence). Of course I_1(z) = I_1(e^{2\pi i} z) when the integral converges (note that this equation is not vacuous: e^{2\pi i} \ne 1 on \mathcal{S}). We thus find:

The integral I_1(t) converges if 2n \pi \le Arg(t) \le (2n+1) \pi, for some integer n.

Having established this, let’s see what the algorithm computes. I should explain the output. Note that debugging output is enabled. The first piecewise result is what would ordinarily be presented to the user: there is a closed expression, holding under various conditions. And there is the “otherwise” part of the piecewise, which just says that if the conditions do not hold, we don’t know the answer. Consider the conditions. unbranched_argument(t) means the argument of t \in \mathcal{S} (whereas arg(t) would mean some value between -\pi and \pi, and that function is not continuous either). Thus the conditions are just a funny way of saying 0 \le Arg(t) \le \pi. Now consider the claimed result. It looks a bit daunting, but this is because the functions in which it is written are defined on \mathbb{C}, not \mathcal{S}. The debugging output line “Result before branch substitutions …” shows what the result would be on \mathcal{S}. I have computed this in output [8] (notice that what looks like \pi here is a dummy, to stop evaluation of e^{\pi i} etc). Of course \exp(z) is an unbranched function on \mathbb{C}, so the first two factors are uninteresting. But the last factor is not: we see that the result is related to the upper incomplete gamma function. This is a branched function, i.e. it is properly defined on \mathcal{S} and it does not descend to \mathbb{C}. If t is a positive real number, then the incomplete gamma function is to be evaluated at argument -\pi. This is where the extra terms (like argument_period()) come from: on \mathbb{C} argument -\pi means negative reals. But with the branch cut conventions as implemented (in mpmath, effectively, but also enforced throughout sympy), negative reals mean argument +\pi. So the answer cannot just be \Gamma(0, -t), because this would end up on the wrong branch. Thus we conclude (or rather the algorithm did for us):

Define g: \mathcal{S} \rightarrow \mathbb{C} by g(t) = -i e^{-t} \Gamma(0, e^{-i \pi} t). Then for 0 \le Arg(t) \le \pi, I_1(t) = g(t).

Notice in particular that our original integral has two natural extensions to \mathcal{S}: the naive definition I_1(t), and the analytic continuation g(t). The algorithm has to return some subset of the first definition, and not the second. In a sense this is why the convergence condition is necessary (although technically it is there because the integral representation of the G-functions used to establish this formula diverges outside the region specified by the convergence conditions).

We could now do a similar analysis for I_2. The story is essentially the same, with the upper half plane replaced by the lower half plane (-\pi \le Arg(t) \le 0) throughout. So when can we compute the combined integral? We already noted that this should be the case exactly when t is real, and working in \mathbb{C} this is easy to see: the intersection of the (closed) upper and lower half planes is the real axis. But notice how this fails on \mathcal{S}. The intersection of 0 \le Arg(t) \le \pi (where we can evaluate the first integral) and -\pi \le Arg(t) \le 0 (where we can evaluate the second) is Arg(t) = 0, i.e. only positive t.

To further amplify the confusion, computing said Fourier transform with argument -t, where t is declared positive, works. This is because before “switching” to working on \mathcal{S}, the algorithm carelessly cancels some minus signs … which is fine for entire functions, of course.

Here ends this rather lengthy explanation. It remains to figure out how we can teach the algorithm that the integrals are periodic in argument 2\pi, instead of knowing them only on one half-sheet. I have some vague ideas for this, but input is appreciated. In any case, I’ll think about this for a while. For now there is enough other work to do.