Appendix D
Complex Analysis and the Central Limit Theorem

Contents

1 Complex Analysis and the Central Limit Theorem
 1.1 Warnings from real analysis
 1.2 Complex Analysis and Topology Definitions
 1.3 Complex analysis and moment generating functions
 1.4 Exercises

 

One of the greatest challenges in a course is determining what level to pitch it. This is perhaps most apparent in deciding what level of detail to give for proofs. For us, the most important result is, as the name suggests, the Central Limit Theorem. The purpose of this chapter is to quickly introduce you to a subject which is beautiful and important in its own right, Complex Analysis, and see how it connects to Probability and the Central Limit Theorem.

Chapter 1
Complex Analysis and the Central Limit Theorem

In Chapter 20 we gave a proof of the Central Limit Theorem using generating functions; unfortunately that proof isn’t complete as it assumed some results from Complex Analysis. Moreover, we had to assume the moment generating function existed, which isn’t always true.

We tried again in Chapter 21; we proved the Central Limit Theorem by using Fourier analysis. Instead of using the moment generating function, which can fail to even exist, this time we used the Fourier transform (also called the characteristic function), which has the very nice and useful property of actually existing! Unfortunately, here too we needed to appeal to some results from Complex Analysis.

This leaves us in a quandary, where we have a few options.

1.
We can just accept as true some results from Complex Analysis and move on.
2.
We can try and find yet another proof, this time one that doesn’t need Complex Analysis.
3.
We can drop everything and take a crash course in Complex Analysis.

This chapter is for those who like the third option. We’ll explain some of the key ideas of complex analysis; in particular, we’ll show why it’s such a different subject from real analysis. Obviously, it helps to have seen real analysis, but if you’re comfortable with Taylor series and basic results on convergence you’ll be fine.

It turns out that assuming a function of a real variable is differentiable doesn’t mean too much, but assume a function of a complex variable is differentiable and all of a sudden doors are opening everywhere with additional, powerful facts that must be true. Obviously this chapter can’t replace an entire course, nor is that our goal. We want to show you some of the key ideas of this beautiful subject, and hopefully when you finish reading you’ll have a better sense of why the black-box results from Complex Analysis (Theorems 20.5.3 and 20.5.4) are true.

This chapter is meant to supplement our discussions on moment generating functions and proofs of the Central Limit Theorem. We thus assume the reader is familiar with the notation and concepts from Chapters 19 through 21.

1.1 Warnings from real analysis

The following example is one of my favorites from real analysis. It indicates why real analysis is hard, almost surely much harder than you might expect. Consider the function \(g:ℝ\to ℝ\) given by \begin {equation} g(x) \ = \ \begin {cases} e^{-1/x^2} & \text {if } x \neq 0\\ 0 & \text {otherwise.} \end {cases} \tag{D.1} \end {equation} Using the definition of the derivative and L’Hopital’s rule, we can show that \(g\) is infinitely differentiable, and all of its derivatives at the origin vanish. For example,

\begin {eqnarray*} g'(0) & \ = \ & \lim _{h\to 0} \frac {e^{-1/h^2} - 0}{h} \nonumber \\ & = & \lim _{h\to 0} \frac {1/h}{e^{1/h^2}} \nonumber \\ &=& \lim _{k \to \infty } \frac {k}{e^{k^2}} \nonumber \\ &=& \lim _{k\to \infty } \frac {1}{2k e^{k^2}} \ = \ 0, \end {eqnarray*}

where we used L’Hopital’s rule in the last step (\(\lim _{k\to \infty } A(k)/B(k) = \lim _{k\to \infty } A'(k)/B'(k)\) if \(\lim _{k\to \infty } A(k) = \lim _{k\to \infty } B(k) = \infty \)). (We replaced \(h\) with \(1/k\) as this allows us to re-express the quantities above in a familiar form, one where we can apply L’Hopital’s rule.) A similar analysis shows that the \(n\)th derivative vanishes at the origin for all \(n\), i.e., \(g^{(n)}(0) = 0\) for all positive integers \(n\). If we consider the Taylor series for \(g\) about 0, we find \[ g(x) \ = \ g(0) + g'(0)x + \frac {g''(0) x^2}{2!} + \cdots \ = \ \sum _{n=0}^\infty \frac {g^{(n)}(0) x^n}{n!} \ = \ 0; \] however, clearly \(g(x) \neq 0\) if \(x \neq 0\). We are thus in the ridiculous case where the Taylor series (which converges for all \(x\)!) only agrees with the function when \(x=0\). This isn’t that impressive, as the Taylor series is forced to agree with the original function at 0, as both are just \(g(0)\).

We can learn a lot from the above example. The first is that it’s possible for a Taylor series to converge for all \(x\), but only agree with the function at one point! It’s not too impressive to agree at just one point, as by construction the Taylor series has to agree at that point of expansion. The second, which is far more important, is that a Taylor series does not uniquely determine a function! For example, both \(\sin x\) and \(\sin x + g(x)\) (with \(g(x)\) the function from equation (D.1)) have the same Taylor series about \(x=0\).
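As a quick numerical illustration of how flat \(g\) is at the origin, here is a minimal Python sketch (the helper name `g` is chosen here purely for illustration). It suggests, but of course doesn’t prove, that \(g(x)\) is positive for \(x \neq 0\) yet shrinks faster than a fixed power of \(x\) near 0, which is what forces the Taylor coefficients to vanish.

```python
import math

def g(x: float) -> float:
    """The function from (D.1): exp(-1/x^2) away from 0, and 0 at 0."""
    if x == 0.0:
        return 0.0
    return math.exp(-1.0 / (x * x))

# g is strictly positive away from the origin, but the ratio g(x) / x^10
# still shrinks rapidly as x approaches 0.
for x in (0.5, 0.2, 0.1):
    print(f"x = {x}: g(x) = {g(x):.3e}, g(x)/x^10 = {g(x) / x**10:.3e}")
```

The exponent 10 is an arbitrary choice; any other fixed power should show the same qualitative behavior once \(x\) is small enough.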

The reason this is so important for us is that we want to understand when a moment generating function uniquely determines a probability distribution. If our distribution was discrete, there was no problem (Theorem 19.6.5). For continuous distributions, however, it’s much harder, as we saw in equation (19.6.5) where we met two densities that had the same moments.

Apparently, we must impose some additional conditions for continuous random variables. For discrete random variables, it was enough to know all the moments; this doesn’t suffice for continuous random variables. What should those conditions be?

Recall that if we have a random variable \(X\) with density \(f_X\), its \(k\)th moment, denoted by \(\mu _k'\), is defined by \[ \mu _k' \ = \ \int _{-\infty }^\infty x^k f_X(x) dx. \] Let’s consider again the pair of functions in equation (19.6.5). A nice calculus exercise shows that \(\mu _k' = e^{k^2/2}\). This means that the moment generating function is \[ M_X(t) \ = \ \sum _{k=0}^\infty \frac {\mu _k' t^k}{k!} \ = \ \sum _{k=0}^\infty \frac {e^{k^2/2} t^k}{k!}. \] For what \(t\) does this series converge? Amazingly, this series converges only when \(t=0\)! To see this, it suffices to show that the terms do not tend to zero. As \(k! \le k^k\), for any fixed \(t \neq 0\) we have \({|t|}^k/k! \ge ({|t|}/k)^k\); moreover, \(e^{k^2/2} = (e^{k/2})^k\), so the absolute value of the \(k\)th term is at least \((e^{k/2} {|t|} / k)^k\). As \(e^{k/2}/k \to \infty \), for any \(t \neq 0\) this does not tend to zero, and thus the moment generating function has a radius of convergence of zero!
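The divergence can be seen numerically. The sketch below (Python; the function name `log_abs_term` and the sample value \(t = 10^{-3}\) are our own choices for illustration) computes the logarithm of the absolute value of the \(k\)th term, working in logs to avoid overflow. Even for this tiny \(t\) the terms dip at first and then grow without bound.

```python
import math

def log_abs_term(k: int, t: float) -> float:
    """log of |e^{k^2/2} t^k / k!|, computed in logs to avoid overflow."""
    return k * k / 2.0 + k * math.log(abs(t)) - math.lgamma(k + 1)

t = 1e-3  # a small, nonzero value of t (an arbitrary choice)
for k in (1, 10, 20, 50, 100):
    print(k, round(log_abs_term(k, t), 1))
```

The \(k^2/2\) in the exponent eventually dominates both \(k \log {|t|}\) and \(\log k!\), no matter how small \({|t|}\) is.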

This leads us to the following conjecture: If the moment generating function converges for \({|t|} < \delta \) for some \(\delta > 0\), then it uniquely determines a density. We’ll explore this conjecture below.

1.2 Complex Analysis and Topology Definitions

Our purpose here is to give a flavor of what kind of inputs are needed to ensure that a moment generating function uniquely determines a probability density. We first collect some definitions, and then state some useful results from complex analysis.

Definition 1.2.1 (Complex variable, complex function) Any complex number \(z\) can be written as \(z = x + iy\), with \(x\) and \(y\) real and \(i = \sqrt {-1}\). We denote the set of all complex numbers by \(ℂ\). A complex function is a map \(f\) from \(ℂ\) to \(ℂ\); in other words \(f(z) \in ℂ\). Frequently one writes \(x = \Re (z)\) for the real part, \(y = \Im (z)\) for the imaginary part, and \(f(z) = u(x,y) + iv(x,y)\) with \(u\) and \(v\) functions from \(ℝ^2\) to \(ℝ\).

There are many ways to write complex numbers. The most common is the definition above; however, a polar coordinate approach is sometimes useful. One of the most remarkable relations in all of mathematics is \begin {equation*} e^{i\theta }\ = \ \cos \theta + i \sin \theta . \end {equation*} There are several ways to see this, depending on how much math you want to assume. One way is to use the Taylor series expansions for the exponential, sine and cosine functions. This gives another way of writing complex numbers; instead of \(1 + i\) we could write \(\sqrt {2} \exp (i\pi /4)\). A particularly interesting choice of \(\theta \) is \(\pi \), which gives \(e^{i\pi } = -1\), a beautiful formula involving many of the most important constants in mathematics!
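These identities are easy to sanity-check numerically. The following is a small sketch using Python’s standard `cmath` module; the helper `polar_form` and the test angle are hypothetical choices made only for this illustration.

```python
import cmath
import math

def polar_form(r: float, theta: float) -> complex:
    """Return r * e^{i theta} using the complex exponential."""
    return r * cmath.exp(1j * theta)

theta = 0.7  # an arbitrary test angle
lhs = cmath.exp(1j * theta)
rhs = complex(math.cos(theta), math.sin(theta))
print(abs(lhs - rhs))                         # essentially 0
print(polar_form(math.sqrt(2), math.pi / 4))  # close to 1 + 1j
print(polar_form(1.0, math.pi))               # close to -1
```

Floating point arithmetic means the outputs agree only up to rounding error, so this is a check and not a proof.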

Noting \(i^2=-1\), it isn’t too hard to show that

\begin {eqnarray*} (a+ib) + (x+iy) & \ = \ & (a+x) + i(b+y)\nonumber \\ (a+ib) \cdot (x+iy) &=& (ax-by) + i(ay+bx). \end {eqnarray*}

The complex conjugate of \(z=x+iy\) is \(\overline {z} := x - iy\), and we define the absolute value (or the modulus or magnitude) of \(z\) to be \(\sqrt {z\overline {z}}\), and denote this by \(|z|\). This is real valued, and equals \(\sqrt {x^2+y^2}\). If we were to write \(z\) as a vector, it would be \(z = (x,y)\); note that in this case we see that \(|z|\) equals the length of the corresponding vector.

We can write almost anything as an example of a complex function; one possible function is \(f(z) = z^2 + |z|\). The question is when is such a function differentiable in \(z\), and what does that differentiability entail. Actually, before we answer this we first need to state what it means for a complex function to be differentiable!

Definition 1.2.2 (Differentiable) We say a complex function \(f\) is (complex) differentiable at \(z_0\) if it’s differentiable with respect to the complex variable \(z\), which means \[\lim_{h \to 0} \frac {f(z_0+h) - f(z_0)}{h} \] exists, where \(h\) tends to zero along any path in the complex plane. If the limit exists we write \(f'(z_0)\) for the limit. If \(f\) is differentiable, then \(f(x+iy) = u(x,y)+iv(x,y)\) satisfies the Cauchy-Riemann equations: \[ f'(z) \ = \ \frac {\partial u}{\partial x} + i \frac {\partial v}{\partial x} \ = \ -i \frac {\partial u}{\partial y} + \frac {\partial v}{\partial y} \] (one direction is easy, arising from sending \(h\to 0\) along the paths \(\widetilde {h}\) and \(i\widetilde {h}\), with \(\widetilde {h} \in ℝ\)).


Here’s a quick hint to see why differentiability implies the Cauchy-Riemann equations – try and fill in the details. Since the derivative exists at \(z_0\), the key limit is independent of the path we take to the point \(x_0 + iy_0\). Consider the path \(x + iy_0\) with \(x\to x_0\), and the path \(x_0 + i y\) with \(y\to y_0\), and use results from multivariable calculus on partial derivatives.
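To get a feel for the Cauchy-Riemann equations, one can estimate the partial derivatives with finite differences. The sketch below is only a numerical experiment under our own choices (the function name `cauchy_riemann_residual`, the step size, and the test point are all assumptions); it measures how far \(u_x = v_y\) and \(u_y = -v_x\) are from holding.

```python
def cauchy_riemann_residual(f, z: complex, eps: float = 1e-6) -> float:
    """Estimate |u_x - v_y| + |u_y + v_x| at z with central differences."""
    dx = (f(z + eps) - f(z - eps)) / (2 * eps)            # u_x + i v_x
    dy = (f(z + 1j * eps) - f(z - 1j * eps)) / (2 * eps)  # u_y + i v_y
    return abs(dx.real - dy.imag) + abs(dy.real + dx.imag)

z0 = 1.0 + 2.0j  # an arbitrary test point
print(cauchy_riemann_residual(lambda z: z * z, z0))          # near 0
print(cauchy_riemann_residual(lambda z: z.conjugate(), z0))  # near 2
```

A residual near zero is consistent with complex differentiability at that point, while a residual bounded away from zero rules it out.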

Let’s explore a bit and see which functions are complex differentiable. We let \(h = h_1+ih_2\) below, with \(h\to 0 + 0i\). If \(f(z) = z\) then \begin {equation*} \lim_{h\to 0} \frac {f(z+h)-f(z)}{h} \ = \ \lim _{h\to 0} \frac {z+h-z}{h} \ = \ \lim _{h\to 0} 1 \ = \ 1; \end {equation*} thus the function is complex differentiable and the derivative is 1. If \(f(z) = z^2\) then

\begin {eqnarray*} \lim_{h\to 0} \frac {f(z+h) - f(z)}{h} & \ = \ & \lim _{h\to 0} \frac {(z+h)^2 - z^2}{h} \nonumber \\ &=& \lim _{h\to 0} \frac {z^2+2zh + h^2 - z^2}{h} \nonumber \\ &=& \lim _{h\to 0} \frac {2zh+h^2}{h} \nonumber \\ &=& \lim _{h\to 0} (2z+h) \nonumber \\ & = & \lim _{h\to 0} 2z + \lim _{h\to 0} h \nonumber \\ &=& 2z + 0 \ = \ 2z.\end {eqnarray*}

We’re using the following properties of complex numbers: \(h/h = 1\) and \(2zh+h^2 = (2z+h)h\). Note how similar this is to the real valued analogue, \(f(x) = x^2\). If \(f(z) = \overline {z}\) then \begin {equation*} \lim_{h\to 0} \frac {f(z+h)-f(z)}{h} \ = \ \lim _{h\to 0} \frac {\overline {z+h} - \overline {z}}{h}. \end {equation*} Unlike the other limits, this one isn’t immediately clear. Let’s write \(z = x+iy\), \(h = h_1 + ih_2\) (and of course \(\overline {z} = x-iy\), \(\overline {h} = h_1-ih_2\)). The limit is \begin {equation*} \lim_{h\to 0} \frac {x-iy + h_1-ih_2 - (x - iy)}{h_1+ih_2} \ = \ \lim _{h\to 0} \frac {h_1-ih_2}{h_1+ih_2}. \end {equation*} This limit does not exist; depending on how \(h\to 0\) we obtain different answers. For example, if \(h_2 = 0\) (traveling along the \(x\)-axis) the limit is just \(\lim _{h\to 0} h_1/h_1 = 1\), while if \(h_1 = 0\) (traveling along the \(y\)-axis) the limit is just \(\lim _{h\to 0} (-ih_2)/(ih_2) = -1\). Thus this function isn’t complex differentiable anywhere, even though it’s a fairly straightforward function to define.
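The path dependence is easy to observe on a computer. In the following sketch (Python; the names `difference_quotient`, `z0`, and the three sample directions are our own illustrative choices) we evaluate the difference quotient for a small increment \(h\) pointing in different directions. For \(z^2\) all directions give roughly \(2z\), while for \(\overline {z}\) the answer depends on the direction.

```python
def difference_quotient(f, z: complex, h: complex) -> complex:
    """(f(z + h) - f(z)) / h for a small complex increment h."""
    return (f(z + h) - f(z)) / h

z0 = 1.0 + 2.0j   # an arbitrary base point
t = 1e-6          # size of the increment
directions = {"real axis": 1, "imaginary axis": 1j, "diagonal": 1 + 1j}

for name, d in directions.items():
    q_square = difference_quotient(lambda z: z * z, z0, t * d)
    q_conj = difference_quotient(lambda z: z.conjugate(), z0, t * d)
    print(f"{name}: z^2 -> {q_square:.4f}, conj -> {q_conj:.4f}")
```

Along the diagonal the quotient for \(\overline {z}\) is \((1-i)/(1+i) = -i\), a third value distinct from \(1\) and \(-1\).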

If we continue to argue along these lines, we find that a function is complex differentiable if the \(x\) and \(y\) dependence is in a very special form, namely everything is a function of \(z=x+iy\). In other words, we don’t allow our function to depend on \(\overline {z} = x - iy\). If we could depend on both, we could isolate out \(x\) (which is \((z+\overline {z})/2\)) and \(y\) (which is \((z-\overline {z})/(2i)\)). We can begin to see why being complex differentiable once implies that we’re complex differentiable infinitely often, namely because of the very special dependence on \(x\) and \(y\). Also, on the real line there are really only two ways to approach a point: from the left, or from the right. In the complex plane, the situation is strikingly different. There are so many ways we can move in two dimensions, and each path must give the same answer if we’re to be complex differentiable. This is why differentiability means far more for a complex variable than for a real variable.

To state the needed results from Complex Analysis, we also require some terminology from Point Set Topology. In particular, many of the theorems below deal with open sets. We briefly review their definition and give some examples.

Definition 1.2.3 (Open set, closed set) A subset \(U\) of \(ℂ\) is an open set if for any \(z_0 \in U\) there’s a \(\delta > 0\) such that whenever \({|z-z_0|} < \delta \) then \(z\in U\) (note \(\delta \) is allowed to depend on \(z_0\)). A set \(C\) is closed if its complement, \(ℂ\setminus C\), is open.

The following are examples of open sets in \(ℂ\).

1.
\(U_1 = \{z: |z| < r\}\) for any \(r > 0\). This is usually called the open ball of radius \(r\) centered at the origin.
2.
\(U_2 = \{z: \Re (z) > 0\}\). To see this is open, if \(z_0 \in U_2\) then we can write \(z_0 = x_0 + i y_0\), with \(x_0 > 0\). Letting \(\delta = x_0/2\), for \(z = x+iy\) we see that if \(|z-z_0| < \delta \) then \(|x-x_0| < x_0/2\), which implies \(x > x_0/2 > 0\); \(U_2\) is often called the open right half-plane.

For examples of closed sets, consider the following.

1.
\(C_1 = \{z: |z| \le r\}\). Note that if we take \(z_0\) to be any point on the boundary, then the ball of radius \(\delta \) centered at \(z_0\) will contain points more than \(r\) units from the origin, and thus \(C_1\) isn’t open. A little work shows, however, that \(C_1\) is closed (in fact, \(C_1\) is called the closed ball of radius \(r\) about the origin). We prove it’s closed by showing its complement is open. What we need to do is show that, given any point in the complement, there’s a small ball about that point entirely contained in the complement. I urge you to draw a picture for the following argument. If \(z_0 \in ℂ\setminus C_1\) then \(|z_0| > r\) (as otherwise it would be inside \(C_1\)). If we take \(0 < \delta < \frac {|z_0| - r}2\), then whenever \(|z-z_0| < \delta \) the triangle inequality gives \(|z| \ge |z_0| - |z-z_0| > |z_0| - \delta > r\), so \(z \in ℂ\setminus C_1\). Thus \(ℂ\setminus C_1\) is open, so \(C_1\) is closed.
2.
\(C_2 = \{z: \Re (z) \ge 0\}\). To see this set isn’t open, consider any \(z_0 = iy\) with \(y \in ℝ\). A similar calculation as the one we did for \(U_2\) or \(C_1\) shows \(C_2\) is closed.

For a set that is neither open nor closed, consider \(S = U_1 \cup C_2\).  

We now state two of the most important properties a complex function could have. One of the most important results in the subject is that these two seemingly very different properties are actually equivalent!

Definition 1.2.4 (Holomorphic, analytic) Let \(U\) be an open subset of \(ℂ\), and let \(f\) be a complex function. We say \(f\) is holomorphic on \(U\) if \(f\) is differentiable at every point \(z \in U\), and we say \(f\) is analytic on \(U\) if \(f\) has a series expansion that converges and agrees with \(f\) on \(U\). This means that for any \(z_0 \in U\), for \(z\) close to \(z_0\) we can choose \(a_n\)’s such that \[ f(z) \ = \ \sum _{n=0}^\infty a_n (z-z_0)^n. \]

As alluded to above, saying a function of a complex variable is differentiable turns out to imply far more than saying a function of a real variable is differentiable, as the following theorem shows us.

Theorem 1.2.5 Let \(f\) be a complex function and \(U\) an open set. Then \(f\) is holomorphic on \(U\) if and only if \(f\) is analytic on \(U\), and the series expansion for \(f\) is its Taylor series.

The above theorem is amazing; its result seems too good to be true. Namely, as soon as we know \(f\) is differentiable once, it’s infinitely (complex) differentiable and \(f\) agrees with its Taylor series expansion! This is very different than what happens in the case of functions of a real variable. For instance, the function \begin {equation} h(x)\ =\ x^3 \sin (1/x) \tag{D.2} \end {equation} is differentiable once and only once at \(x=0\), and while the function \(g(x)\) from (D.1) is infinitely differentiable, the Taylor series expansion only agrees with \(g(x)\) at \(x=0\). Complex analysis is a very different subject than real analysis!

The next theorem provides a very nice condition for when a function is identically zero. It involves the notion of a limit or accumulation point, which we define first.

Definition 1.2.6 (Limit or accumulation point) We say \(z\) is a limit (or an accumulation) point of a sequence \(\{z_n\}_{n=0}^\infty \) if there exists a subsequence \(\{z_{n_k}\}_{k=0}^\infty \) converging to \(z\).

Let’s do some examples to clarify the definitions.

1.
If \(z_n = 1/n\), then \(0\) is a limit point.
2.
If \(z_n = \cos (\pi n)\) then there are two limit points, namely \(1\) and \(-1\). (If \(z_n = \cos (n)\) then every point in \([-1,1]\) is a limit point of the sequence, though this is harder to show.)
3.
If \(z_n = (1 + (-1)^n)^n + 1/n\) for \(n \ge 1\), then \(0\) is a limit point. We can see this by taking the subsequence \(\{z_1,z_3,z_5,z_7,\dots \}\); note the subsequence \(\{z_2,z_4,z_6,\dots \}\) diverges to infinity.
4.
Let \(z_n\) denote the number of distinct prime factors of \(n\). Then every positive integer is a limit point! For example, let’s show \(5\) is a limit point. The first five primes are 2, 3, 5, 7 and 11; consider \(N = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 = 2310\). Consider the subsequence \(\{z_N, z_{N^2}, z_{N^3}, z_{N^4}, \dots \}\); as \(N^k\) has exactly 5 distinct prime factors for each \(k\), \(5\) is a limit point.
5.
If \(z_n = n^2\) then there are no limit points, as \(\lim _{n\to \infty } z_n = \infty \).
6.
Let \(z_0\) be any odd, positive integer, and set\[ z_{n+1} \ = \ \begin {cases} 3 z_n + 1 & \text {if $z_n$ is odd}\\ z_n/2 &\text {if $z_n$ is even.} \end {cases} \] It’s conjectured that 1 is always a limit point (and if some \(z_m = 1\), then the next few terms have to be \(4, 2, 1, 4, 2, 1, 4, 2, 1, \dots \), and hence the sequence cycles). This is the famous \(3x+1\) problem. Kakutani called it a conspiracy to slow down American mathematics because of the amount of time people spent on this; Erdős said mathematics isn’t yet ready for such problems. See [Lag1, Lag2, Lag3] for some nice expositions, but be warned that this problem can be addictive!
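Two of the examples above are pleasant to explore by computer. The sketch below (Python; the function names `distinct_prime_factors` and `collatz_orbit` and the starting value 7 are our own choices for illustration) checks that \(N^k\) has exactly 5 distinct prime factors for the first few \(k\), and follows one orbit of the \(3x+1\) map. Of course, checking one starting value says nothing about the conjecture in general.

```python
def distinct_prime_factors(n: int) -> int:
    """Count the distinct primes dividing n (trial division; n >= 1)."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

def collatz_orbit(z0: int, steps: int) -> list:
    """Return [z_0, z_1, ..., z_steps] for the 3x+1 map."""
    orbit = [z0]
    for _ in range(steps):
        z = orbit[-1]
        orbit.append(3 * z + 1 if z % 2 == 1 else z // 2)
    return orbit

N = 2 * 3 * 5 * 7 * 11
print([distinct_prime_factors(N**k) for k in range(1, 5)])  # [5, 5, 5, 5]
print(collatz_orbit(7, 19))  # reaches 1 after 16 steps, then 4, 2, 1
```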

 

We can now state the theorem which, for us, is the most important result from Complex Analysis. It’s the basis of the black box results.

Theorem 1.2.7 Let \(f\) be an analytic function on a connected open set \(U\), with infinitely many distinct zeros \(z_1, z_2, z_3, \dots \). If \(\lim _{n\to \infty } z_n \in U\), then \(f\) is identically zero on \(U\). In other words, if a function is zero along a sequence of distinct points in \(U\) whose accumulation point is also in \(U\), then that function is identically zero in \(U\).

Note the above is very different than what happens in real analysis. Consider again the function from (D.2), \[ h(x) \ = \ x^3 \sin (1/x), \] with \(h(0) = 0\). This function is continuous and differentiable. It’s zero whenever \(x = 1/(\pi n)\) with \(n\) a nonzero integer. If we let \(z_n = 1/(\pi n)\), we see this sequence has \(0\) as a limit point, and our function is also zero at \(0\) (see Figure 1.1).

Figure 1.1: Plot of \(x^3 \sin (1/x)\).

It’s clear, however, that this function is not identically zero. Yet again, we see a stark difference between real and complex valued functions. As a nice exercise, show that \(x^3 \sin (1/x)\) is not complex differentiable. It will help if you recall \(e^{i\theta } = \cos \theta + i\sin \theta \), or \(\sin \theta = (e^{i\theta } - e^{-i\theta })/(2i)\).
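A short numerical check makes the contrast concrete. The following sketch (Python; the helper name `h` mirrors the function in (D.2), and the sample points are our own choices) confirms that the function vanishes, up to rounding, along the points \(1/(\pi n)\), which accumulate at \(0\), while being clearly nonzero elsewhere.

```python
import math

def h(x: float) -> float:
    """x^3 sin(1/x) for x != 0, extended by h(0) = 0."""
    return 0.0 if x == 0.0 else x**3 * math.sin(1.0 / x)

# h vanishes (up to rounding) at x = 1/(pi n), and these points tend to 0 ...
zeros = [1.0 / (math.pi * n) for n in range(1, 6)]
print([abs(h(x)) < 1e-12 for x in zeros])
# ... yet h is not identically zero: h(2/pi) = (2/pi)^3 sin(pi/2).
print(h(2.0 / math.pi))
```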

1.3 Complex analysis and moment generating functions

We conclude our technical digression by stating a few more very useful facts. The proof of these requires properties of the Laplace transform, which is defined by \((\mathcal {L}f)(s) = \int _0^\infty e^{-sx} f(x)dx\). The reason the Laplace transform plays such an important role in the theory is apparent when we recall the definition of the moment generating function of a random variable \(X\) with density \(f\): \[ M_X(t) = 𝔼 [e^{tX}] = \int _{-\infty }^\infty e^{tx} f(x)dx; \] in other words, the moment generating function is the Laplace transform of the density evaluated at \(s=-t\) (this is exact when the density is supported on \([0,\infty )\); in general the integral runs over all of \(ℝ\), which is the two-sided Laplace transform).
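As a concrete instance of this relationship, take the standard exponential density \(f(x) = e^{-x}\) on \([0,\infty )\) (an example chosen here for illustration). Its Laplace transform is \(1/(1+s)\), so the moment generating function should be \(1/(1-t)\) for \(t < 1\). The sketch below estimates the integral with a simple midpoint rule; the function names, the truncation point, and the number of subintervals are all assumptions of this illustration.

```python
import math

def mgf_numeric(density, t: float, upper: float = 60.0, n: int = 60000) -> float:
    """Midpoint-rule estimate of the integral of e^{t x} density(x) over [0, upper]."""
    step = upper / n
    return step * sum(
        math.exp(t * x) * density(x)
        for x in (step * (j + 0.5) for j in range(n))
    )

def exp_density(x: float) -> float:
    """Density of the standard exponential distribution on [0, infinity)."""
    return math.exp(-x)

t = 0.5
print(mgf_numeric(exp_density, t))  # close to 1 / (1 - t) = 2
```

Truncating the integral at a finite upper limit is harmless here only because the integrand decays exponentially; for \(t \ge 1\) the integral diverges and the moment generating function does not exist.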

Remember that if \(F_X\) and \(G_Y\) are the cumulative distribution functions of the random variables \(X\) and \(Y\) with densities \(f\) and \(g\), then

\begin {eqnarray*} F_X(x) & \ = \ & \int _{-\infty }^x f(t) dt \nonumber \\ G_Y(y) &=& \int _{-\infty }^y g(v)dv. \end {eqnarray*}

We remind the reader of the two important results we assumed in the text (Theorems 20.5.3 and 20.5.4), which we restate below. After stating them we discuss their proofs.

Theorem 1.3.1 Assume the moment generating functions \(M_X(t)\) and \(M_Y(t)\) exist in a neighborhood of zero (i.e., there’s some \(\delta > 0\) such that both functions exist for \({|t|} < \delta \)). If \(M_X(t) = M_Y(t)\) in this neighborhood, then \(F_X(u) = G_Y(u)\) for all \(u\). As the densities are the derivatives of the cumulative distribution functions, we have \(f=g\).

Theorem 1.3.2 Let \(\{X_i\}_{i \in I}\) be a sequence of random variables with moment generating functions \(M_{X_i}(t)\). Assume there’s a \(\delta > 0\) such that when \({|t|} < \delta \) we have \(\lim _{i\to \infty } M_{X_i}(t) = M_X(t)\) for some moment generating function \(M_X(t)\), and all moment generating functions converge for \({|t|} < \delta \). Then there exists a unique cumulative distribution function \(F_X\) whose moments are determined from \(M_X(t)\), and for all \(x\) where \(F_X(x)\) is continuous, \(\lim _{i\to \infty } F_{X_i}(x) = F_X(x)\).

The proofs of these theorems follow from results in complex analysis, specifically the Laplace and Fourier inversion formulas. To give an example as to how the results from complex analysis allow us to prove results such as these, we give most of the details in the proof of the next theorem. We deliberately do not try to prove the following result in as great generality as possible!

Theorem 1.3.3 Let \(X\) and \(Y\) be two continuous random variables on \([0,\infty )\) with continuous densities \(f\) and \(g\), all of whose moments are finite and agree. Suppose further that:

1.
There is some \(C > 0\) such that for all \(c \le C\), \(e^{(c+1)t} f(e^t)\) and \(e^{(c+1)t} g(e^t)\) are Schwartz functions (see Definition 21.1.3). This isn’t a terribly restrictive assumption; \(f\) and \(g\) need to have decay in order for all moments to exist and be finite. As we’re evaluating \(f\) and \(g\) at \(e^t\) and not \(t\), there’s enormous decay here. The meat of the assumption is that \(f\) and \(g\) are infinitely differentiable and their derivatives decay.
2.
The (not necessarily integral) moments \[ \mu _{r_n}'(f) \ = \ \int _{0}^\infty x^{r_n} f(x)dx \ \ \ {\rm and} \ \ \ \mu _{r_n}'(g) \ = \ \int _0^\infty x^{r_n} g(x)dx \] agree for some sequence of non-negative real numbers \(\{r_n\}_{n=0}^\infty \) which has a finite accumulation point (i.e., \(\lim _{n\to \infty } r_n = r < \infty \)).

Then \(f=g\) (in other words, knowing all these moments uniquely determines the probability density).

Proof: We sketch the proof, which is long and sadly a bit technical. Remember the purpose of this proof is to highlight why our needed results from Complex Analysis are true. Feel free to skim or skip the proof, but we urge you to read the example at the end of this section, where we return to the two densities that are causing us so much heartache. Let \(h(x) = f(x) - g(x)\), and define \[ A(z)\ =\ \int _0^\infty x^z h(x)dx. \] Note that \(A(z)\) exists for all \(z\) with real part non-negative. To see this, let \(\Re (z)\) denote the real part of \(z\), and let \(k\) be the unique non-negative integer with \(k \le \Re (z) < k+1\). Then \(x^{{\Re z}} \le x^k + x^{k+1}\), and

\begin {eqnarray*} {|A(z)|} & \ \le \ & \int _0^\infty x^{{\Re (z)}} \left [{|f(x)|}+{|g(x)|}\right ]dx \\ & \ \le \ & \int _0^\infty (x^k + x^{k+1}) f(x)dx + \int _0^\infty (x^k+x^{k+1}) g(x)dx \ = \ 2\mu _k' + 2\mu _{k+1}'. \end {eqnarray*}

Results from analysis now imply that \(A(z)\) exists for all \(z\) with \(\Re (z) \ge 0\). The key point is that \(A\) is also differentiable. Interchanging the derivative and the integration (which can be justified; see Theorem ??), we find \[ A'(z) \ = \ \int _0^\infty x^z (\log x) h(x) dx. \] To show that \(A'(z)\) exists, we just need to show this integral is well-defined. There are only two potential problems with the integral, namely when \(x\to \infty \) and when \(x\to 0\). For \(x\) large, \(|x^z \log x| \le x^{\Re (z)+1}\) and thus the rapid decay of \(h\) gives \(\left |\int _1^\infty x^z (\log x) h(x)dx \right | < \infty \). For \(x\) near \(0\), \(h(x)\) looks like \(h(0)\) plus a small error (remember we’re assuming \(f\) and \(g\) are continuous); thus there’s a \(C\) so that \(|h(x)| \le C\) for \(0 \le x \le 1\). Note that \(|x^z| = x^{\Re (z)} \le 1\) for \(0 < x \le 1\), so

\begin {eqnarray*} \lim_{\epsilon \to 0} \left |\int _{\epsilon }^1 x^z (\log x) h(x)dx \right | & \ \le \ & \lim _{\epsilon \to 0} \int _{\epsilon }^1 1 \cdot (-\log x) \cdot C dx. \end {eqnarray*}

The anti-derivative of \(\log x\) is \(x\log x - x\), and \(\lim _{\epsilon \to 0} (\epsilon \log \epsilon - \epsilon ) = 0\). This is enough to prove that this integral is bounded, and thus from results in analysis we get \(A'(z)\) exists.
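The boundedness near \(0\) can be checked numerically. In the sketch below (Python; both function names and the sample values of \(\epsilon \) are our own choices) we evaluate \(\int _\epsilon ^1 (-\log x)dx = 1 - \epsilon + \epsilon \log \epsilon \) from the anti-derivative and cross-check one value with a midpoint rule. The values increase toward 1 as \(\epsilon \to 0\), consistent with the integral being bounded.

```python
import math

def log_integral_closed_form(eps: float) -> float:
    """Integral of -log(x) over [eps, 1], from the antiderivative x - x log x."""
    return 1.0 - eps + eps * math.log(eps)

def log_integral_midpoint(eps: float, n: int = 100000) -> float:
    """Midpoint-rule estimate of the same integral, as a cross-check."""
    step = (1.0 - eps) / n
    return step * sum(-math.log(eps + step * (j + 0.5)) for j in range(n))

for eps in (1e-1, 1e-2, 1e-4, 1e-8):
    print(eps, log_integral_closed_form(eps))  # values increase toward 1
print(log_integral_midpoint(1e-2))
```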

We (finally!) use our results from complex analysis. As \(A\) is differentiable once, it’s infinitely differentiable and it equals its Taylor series for \(z\) with \(\Re (z) > 0\). Therefore \(A\) is an analytic function which is zero for a sequence of \(z_n\)’s with an accumulation point, and thus it’s identically zero. This is spectacular – initially we only knew \(A(z)\) was zero if \(z\) was a positive integer or if \(z\) was in the sequence \(\{r_n\}\); we now know it’s zero for all \(z\) with \(\Re (z) > 0\). This remarkable conclusion comes from complex analysis; it’s here that we use it.

We change variables, and replace \(x\) with \(e^t\) and \(dx\) with \(e^tdt\). The range of integration is now \(-\infty \) to \(\infty \), and we set \(\mathfrak {h}(t)dt = h(e^t)e^tdt\). We now have \[ A(z) \ = \ \int _{-\infty }^\infty e^{tz} \mathfrak {h}(t)dt \ = \ 0. \] Choosing \(z = c + 2\pi i y\) with \(c\) less than the \(C\) from our hypotheses gives \[ A(c+2\pi i y) \ = \ \int _{-\infty }^\infty e^{2\pi i ty} \left [e^{ct} \mathfrak {h}(t)\right ]dt \ = \ 0. \] Our assumptions imply that \(e^{ct}\mathfrak {h}(t)\) is a Schwartz function, and thus it has a unique inverse Fourier transform. As we know this transform is zero, it implies that \(e^{ct} \mathfrak {h}(t) = 0\), or \(h(x) = 0\), or \(f(x) = g(x)\). \(\Box \)

We needed the analysis at the end on the inverse Fourier transform as our goal is to show that \(f(x) = g(x)\), not that \(A(z) = 0\). It seems absurd that \(A(z)\) could identically vanish without \(f=g\), but we must rigorously show this.

What if we lessen our restrictions on \(f\) and \(g\); perhaps one of them isn’t continuous?

Perhaps there’s a unique continuous probability distribution attached to a given sequence of moments such as in the above theorem, but if we allow non-continuous distributions there could be additional possibilities. This topic is beyond the scope of this book, requiring more advanced results from analysis; however, we wanted to point out where the dangers lie, where we need to be careful.

After proving Theorem 1.3.3, it’s natural to go back to the two densities that are causing so much trouble, namely (see (??))

\begin {eqnarray*} f_1(x) & \ = \ & \frac 1{\sqrt {2\pi x^2}}\ e^{-(\log ^2 x) / 2} \nonumber \\ f_2(x) & = & f_1(x) \left [1 + \sin (2\pi \log x)\right ]. \end {eqnarray*}

We know these two densities have the same integral moments (their \(k\)th moments are \(e^{k^2/2}\) for \(k\) a non-negative integer). These functions have the correct decay; note \[ e^{(c+1)t} f_1(e^t) \ = \ e^{(c+1)t} \cdot \frac {e^{-t^2/2}}{\sqrt {2\pi } e^{t}}, \] which decays fast enough for any \(c\) to satisfy the assumptions of Theorem 1.3.3. As these two densities are not the same, some condition must be violated. The only condition left to check is whether or not we have a sequence of numbers \(\{r_n\}_{n=0}^\infty \) with an accumulation point \(r>0\) such that the \(r_n\)th moments agree. Using more results from Complex Analysis (specifically, contour integration), we can calculate the \((a+ib)\)th moments. We find

\[(a+ib)^\text{th}\ {\rm moment\ of\ } f_1\ {\rm is}\ \ \ e^{(a+ib)^2/2}\]

and

\[(a+ib)^\text{th}\ {\rm moment\ of\ } f_2\ {\rm is} \ \ \ e^{(a+ib)^2/2} +\frac {i}{2} \left (e^{(a+i(b-2\pi ))^2/2}-e^{(a+i (b+2 \pi ))^2/2}\right ).\]

While these moments agree for \(b=0\) and \(a\) a positive integer, there’s no sequence of real moments having an accumulation point where they agree. To see this, note that when \(b=0\) the \(a\)th moment of \(f_2\) is \begin {equation*}e^{a^2/2} + \frac {i}{2} e^{(a - 2 i \pi )^2/2} \left (1 - e^{4 i a \pi }\right ), \end {equation*} and the second term is never zero unless \(a\) is a half-integer (i.e., \(a = k/2\) for some integer \(k\)). In fact, the reason we wrote (??) as we did was to highlight the fact that it’s only zero when \(a\) is a half-integer. Exponentials of real or complex numbers are never zero, and thus the only way this term can vanish is if \(1 = e^{4ia\pi }\). Recalling that \(e^{i\theta } = \cos \theta + i \sin \theta \), we see that the vanishing of this term is equivalent to \(1 - \cos (4\pi a) - i \sin (4\pi a) = 0\); the only way this can happen is if \(a = k/2\) for some integer \(k\), in which case the cosine term is 1 and the sine term is 0. As the half-integers have no finite accumulation point, the second condition of Theorem 1.3.3 fails.
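The closed forms above can be probed numerically. The following sketch (Python; all function names, the integration window, and the step size are assumptions of this illustration) evaluates the two moment formulas at a few real values of \(a\), and independently estimates the extra sine contribution \(\int _0^\infty x^a f_1(x) \sin (2\pi \log x)dx\) by a Riemann sum after the substitution \(x = e^t\). The gap between the moments should be negligible at half-integers and small but visibly nonzero elsewhere.

```python
import cmath
import math

TWO_PI_I = 2j * math.pi

def moment_f1(s: complex) -> complex:
    """Closed form e^{s^2/2} for the s-th moment of f_1."""
    return cmath.exp(s * s / 2)

def moment_f2(s: complex) -> complex:
    """Closed form for the s-th moment of f_2, as in the display above."""
    return moment_f1(s) + 0.5j * (
        cmath.exp((s - TWO_PI_I) ** 2 / 2) - cmath.exp((s + TWO_PI_I) ** 2 / 2)
    )

def sine_term_quadrature(a: float, half_width: float = 14.0, step: float = 0.01) -> float:
    """Riemann-sum estimate of the integral of x^a f_1(x) sin(2 pi log x),
    after substituting x = e^t (an independent check of the closed form)."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for j in range(n + 1):
        t = a - half_width + j * step  # the integrand is centred near t = a
        total += math.exp(a * t - t * t / 2) * math.sin(2 * math.pi * t)
    return total * step / math.sqrt(2 * math.pi)

for a in (0.25, 0.5, 1.0, 1.5):
    gap = moment_f2(a) - moment_f1(a)
    print(a, abs(gap))
print(sine_term_quadrature(0.25))
```

The gap is tiny even when nonzero (it carries a factor of \(e^{-2\pi ^2}\)), which is one reason these two densities are so hard to tell apart from their moments.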

1.4 Exercises

Problem 1.4.1 Let \(f(x) = x^3 \sin (1/x)\) for \(x \neq 0\) and set \(f(0) = 0\). (a) Show that \(f\) is differentiable once when viewed as a function of a real variable, but that it is not differentiable twice. (b) Show that \(f\) is not differentiable when viewed as a function of a complex variable \(z\); it might be useful to note that \(\sin u = (e^{iu} - e^{-iu})/(2i)\).

Problem 1.4.2 If we’re told that all the moments of \(f\) are finite and \(f\) is infinitely differentiable, must there be some \(C\) such that for all \(c < C\) we have \(e^{(c+1)t} f(e^t)\) is a Schwartz function?