Articles

Finding hypergeometric function expressions

In Uncategorized on May 7, 2011 by nessgrh

The core of the new code will be algorithms for expressing Hypergeometric and Meijer G functions in terms of known special functions. This is fairly complicated, so let me use this blog post to order my thoughts on the Hypergeometric case. The Meijer G case is fairly similar, there are just more indices to keep track of and more possible transformations, I’ll describe this in a later post.

Most of this post is based on the paper by Kelly Roach.

So first of all the Hypergeometric function is initially defined as a series: {}_p F_q\left({a_p \atop b_q} \middle| z \right) = \sum_{n=0}^\infty \frac{(a_1)_n \dots (a_p)_n}{(b_1)_n \dots (b_q)_n} \frac{z^n}{n!}, where a_p and b_q represent vectors of respectively p and q complex parameters, and (a)_n = a(a+1)\dots(a+n-1). This converges in some disc if p \le q+1 and none of the b_j is a negative integer (it also converges in some other cases if one of the a_i is a non-positive integer). Notice that the Hypergeometric function is symmetric in all numerator parameters (the a_p), and also in all denominator parameters (the b_q).

It turns out that there are certain differential operators that can change the a_p and p_q parameters by integers. If a sequence of such operators is known that converts the set of indices a_r^0 and b_s^0 into a_p and b_q, then we shall say the pair a_p, b_q is reachable from a_r^0, b_s^0. Our general strategy is thus as follows: given a set a_p, b_q of parameters, try to look up an origin a_r^0, b_s^0 for which we know an expression, and then apply the sequence of differential operators to the known expression to find an expression for the Hypergeometric function we are interested in.

Notation

In the following, the symbol a will always denote a numerator parameter and the symbol b will always denote a denominator parameter. The subscripts p, q, r, s denote vectors of that length, so e.g. a_p denotes a vector of p numerator parameters. The subscripts i and j denote “running indices”, so they should usually be used in conjuction with a “for all i“. E.g. a_i < 4 for all i. Uppercase subscripts I and J denote a chosen, fixed index. So for example a_I > 0 is true if the inequality holds for the one index I we are currently interested in.

Incrementing and decrementing indices

Suppose a_i \ne 0. Set A(a_i) = \frac{z}{a_i}\frac{\mathrm{d}}{dz}+1. It is then easy to show that A(a_i) {}_p F_q\left({a_p \atop b_q} \middle| z \right) = {}_p F_q\left({a_p + e_i \atop b_q} \middle| z \right), where e_i is the i-th unit vector. Similarly for b_j \ne 1 we set B(b_j) = \frac{z}{b_j-1} \frac{\mathrm{d}}{dz}+1 and find B(b_j) {}_p F_q\left({a_p \atop b_q} \middle| z \right) = {}_p F_q\left({a_p \atop b_q - e_i} \middle| z \right). Thus we can increment upper and decrement lower indices at will, as long as we don’t go through zero. The A(a_i) and B(b_j) are called shift operators.

It is also easy to show that \frac{\mathrm{d}}{dz} {}_p F_q\left({a_p \atop b_q} \middle| z \right) = \frac{a_1 \dots a_p}{b_1 \dots b_q} {}_p F_q\left({a_p + 1 \atop b_q + 1} \middle| z \right), where a_p + 1 is the vector a_1 + 1, a_2 + 1, \dots and similarly for b_q + 1. Combining this with the shift operators,  we arrive at one form of the Hypergeometric differential equation: \left[ \frac{\mathrm{d}}{dz} \prod_{j=1}^q B(b_j) - \frac{a_1 \dots a_p}{(b_1-1) \dots (b_q-1)} \prod_{i=1}^p A(a_i) \right] {}_p F_q\left({a_p \atop b_q} \middle| z \right) = 0. This holds if all shift operators are defined, i.e. if no a_i = 0 and no b_j = 1. Clearing denominators and multiplying through by z we arrive at the following equation: \left[ z\frac{\mathrm{d}}{dz} \prod_{j=1}^q \left(z\frac{\mathrm{d}}{dz} + b_j-1 \right) - z \prod_{i=1}^p \left( z\frac{\mathrm{d}}{\mathrm{d}z} + a_i \right) \right] {}_p F_q\left({a_p \atop b_q} \middle| z\right) = 0. Even though our derivation does not show it, it can be checked that this equation holds whenever the {}_p F_q is defined.

Notice that, under suitable conditions on a_I, b_J, each of the operators A(a_i), B(b_j) and z\frac{\mathrm{d}}{\mathrm{d}z} can be expressed in terms of A(a_I) or B(b_J). Our next aim is to write the Hypergeometric differential equation as follows: [X A(a_I) - b] {}_p F_q\left({a_p \atop b_q} \middle| z\right) = 0, for some operator X to be determined. If b \ne 0, then we can write this as \frac{-1}{b} X {}_p F_q\left({a_p + e_I \atop b_q} \middle| z\right) = {}_p F_q\left({a_p \atop b_q} \middle| z\right), and so \frac{-1}{b}X undoes the shifting of A(a_I), whence it will be called an inverse-shift operator.

Now A(a_I) exists if a_I \ne 0, and then z\frac{\mathrm{d}}{\mathrm{d}z} = a_I A(a_I) - a_I. Observe also that all the operators A(a_i), B(b_j) and z\frac{\mathrm{d}}{\mathrm{d}z} commute. We have \prod_{i=1}^p \left( z\frac{\mathrm{d}}{\mathrm{d}z} + a_i \right) = \left(\prod_{i=1, i \ne I}^p \left( z\frac{\mathrm{d}}{\mathrm{d}z} + a_i \right)\right) a_I A(a_I), so this gives us the first half of X. The other half does not have such a nice expression. We find  z\frac{\mathrm{d}}{dz} \prod_{j=1}^q \left(z\frac{\mathrm{d}}{dz} + b_j-1 \right) = \left(a_I A(a_I) - a_I\right) \prod_{j=1}^q \left(a_I A(a_I) - a_I + b_j - 1\right). Since the first half had no constant term, we infer b = -a_I\prod_{j=1}^q(b_j - 1 -a_I).

This tells us under which conditions we can “un-shift” A(a_I), namely when a_I \ne 0 and b \ne 0. Substituting a_I - 1 for a_I then tells us under what conditions we can decrement the index a_I. Doing a similar analysis for B(a_J), we arrive at the following rules:

  • An index a_I can be decremented if a_I \ne 1 and a_I \ne b_j for all b_j.
  • An index b_J can be incremented if b_J \ne -1 and b_J \ne a_i for all a_i.

Combined with the conditions (stated above) for the existence of shift operators, we have thus established the rules of the game!

Reduction of Order

Notice that, quite trivially, if a_I = b_J, we have {}_p F_q\left({a_p \atop b_q} \middle| z \right) = {}_{p-1} F_{q-1}\left({a_p^* \atop b_q^*} \middle| z \right), where a_p^* means a_p with a_I omitted, and similarly for b_q^*. We call this reduction of order.

In fact, we can do even better. If a_I - b_J \in \mathbb{Z}_{>0}, then it is easy to see that \frac{(a_I)_n}{(b_J)_n} is actually a polynomial in n. It is also easy to see that (z\frac{\mathrm{d}}{\mathrm{d}z})^k z^n = n^k z^n. Combining these two remarks we find:

  • If a_I - b_J \in \mathbb{Z}_{>0}, then there exists a polynomial p(n) = p_0 + p_1 n + \dots (of degree a_I - b_J) such that \frac{(a_I)_n}{(b_J)_n} = p(n) and {}_p F_q\left({a_p \atop b_q} \middle| z \right) = \left(p_0 + p_1 z\frac{\mathrm{d}}{\mathrm{d}z} + p_2 \left(z\frac{\mathrm{d}}{\mathrm{d}z}\right)^2 + \dots \right) {}_{p-1} F_{q-1}\left({a_p^* \atop b_q^*} \middle| z \right).

Thus any set of parameters a_p, b_q is reachable from a set of parameters c_r, d_s where c_i - d_j \in \mathbb{Z} implies c_i < d_j. Such a set of parameters c_r, d_s is called suitable. Our database of known formulae should only contain suitable origins. The reasons are twofold: firstly, working from suitable origins is easier, and secondly, a formula for a non-suitable origin can be deduced from a lower order formula, and we should put this one into the database instead.

Moving Around in the Parameter Space

It remains to investigate the following question: suppose a_p, b_q and a_p^0, b_q^0 are both suitable, and also a_i - a_i^0 \in \mathbb{Z}, b_j - b_j^0 \in \mathbb{Z}. When is a_p, b_q reachable from a_p^0, b_q^0? It is clear that we can treat all parameters independently that are incongruent mod 1. So assume that a_i and b_j are congruent to r mod 1, for all i and j. The same then follows for a_i^0 and b_j^0.

If r \ne 0, then any such a_p, b_q is reachable from any a_p^0, b_q^0. To see this notice that there exist constants c, c^0, congruent mod 1, such that a_i < c < b_j for all i and j, and similarly a_i^0 < c^0 < b_j^0. If n = c - c^0 > 0 then we first inverse-shift all the b_j^0 n times up, and then similarly shift shift up all the a_i^0 n times. If n < 0 then we first inverse-shift down the a_i^0 and then shift down the b_j^0. This reduces to the case c = c^0. But evidently we can now shift or inverse-shift around the a_i^0 arbitrarily so long as we keep them less than c, and similarly for the b_j^0 so long as we keep them bigger than c. Thus a_p, b_q is reachable from a_p^0, b_q^0.

If r = 0 then the problem is slightly more involved. WLOG no parameter is zero. We now have one additional complication: no parameter can ever move through zero. Hence a_p, b_q is reachable from a_p^0, b_q^0 if and only if the number of a_i < 0 equals the number of a_i^0 < 0, and similarly for the b_i and b_i^0. But in a suitable set of parameters, all b_j > 0! This is because the Hypergeometric function is undefined if one of the b_j is a non-positive integer and all a_i are smaller than the b_j. Hence the number of b_j \le 0 is always zero.

We can thus associate to every suitable set of parameters a_p, b_q, where no a_i = 0, the following invariants:

  • For every r \in [0, 1) the number \alpha_r of parameters a_i \equiv r \pmod{1}, and similarly the number \beta_r of parameters b_i \equiv r \pmod{1}.
  • The number \gamma of integers a_i with a_i < 0.

The above reasoning shows that a_p, b_q is reachable from a_p^0, b_q^0 if and only if the invariants \alpha_r, \beta_r, \gamma all agree. Thus in particular “being reachable from” is a symmetric relation on suitable parameters without zeros.

Applying the Operators

If all goes well then for a given set of parameters we find an origin in our database for which we have a nice formula. We now have to apply (potentially) many differential operators to it. If we do this blindly then the result will be very messy. This is because with Hypergeometric type functions, the derivative is usually expressed as a sum of two contiguous functions. Hence if we compute N derivatives, then the answer will involve 2N contiguous functions! This is clearly undesirable. In fact we know from the Hypergeometric differential equation that we need at most \max(p, q+1) contiguous functions to express all derivatives.

Hence instead of differentiating blindly, we will work with a \mathbb{C}(z)-module basis: for an origin a_r^0, b_s^0 we either store (for particularly pretty answers) or compute a set of N functions (typically N = \max(r, s+1)) with the property that the derivative of any of them is a \mathbb{C}(z)-linear combination of them. In formulae, we store a vector B of N functions, a matrix M and a vector C (the latter two with entries in \mathbb{C}(z)), with the following properties:

  • {}_r F_s\left({a_r^0 \atop b_s^0} \middle| z \right) = C B
  • \frac{\mathrm{d}}{\mathrm{d}z} B = M B.

Then we can compute as many derivatives as we want and we will always end up with \mathbb{C}(z)-linear combination of at most N special functions.

As hinted above, B, M and C can either all be stored (for particularly pretty answers) or computed from a single {}_p F_q formula.

Advertisements

6 Responses to “Finding hypergeometric function expressions”

  1. I’m a little confused what you mean by a_I and a_J with capital subscripts as opposed to lowercase subscripts.

  2. For whatever reason, 2N (2N) is rendering as 3- (3-) in the last paragraph (but the mouseover text says “2N”).

    So did I read it right that everything has to differ by integers? In other words, if you have 3/4 in your result, then you only need to look for items in the table that are congruent to 3/4 mod 1 for that index?

    This looks good. Very interesting. Now I’d like to read about how you plan to implement. Based on this, I’d say that just the MeijerG class should be non-trivial to implement (if it includes all these things as methods), so I’d like to see ideas how you plan to design it.

    • I think that latex thing is a WordPress bug. If I create a new post on my blog with the text from the first sentence and preview it, it says “For whatever reason, 2N (2N) is rendering as 3- (2N) in the last paragraph (but the mouseover text says “2N”).” Weird.

  3. I already commented on this, but it must have gotten lost. I am unclear on the difference between a_I and a_i in your notation (how are the capital indices different?).

  4. For some reason wordpress wants me to moderate every comment. Got to tweak the settings somehow…

    I will write about implementation in the next post(s).

    As for capitalised indices, there is not much of a difference. I’m mentally fixing the I and/or J, and then ask “can I increment index I on a”. I’ll try to make this clearer.

    • OK, that’s what I thought it was (lowercase implies for all), but it wasn’t clear.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: