Moment Generating Functions
The moment generating function (MGF) encodes the entire distribution of a random variable into a single analytic function. Its power lies in three facts: it recovers all moments via differentiation, it converts convolutions into products, and it uniquely identifies distributions. These properties together give an elegant proof of the Central Limit Theorem.
Definition and Moment Recovery
Definition. The moment generating function of a random variable $X$ is
$$M_X(t) = E\!\left[e^{tX}\right], \quad t \in \mathbb{R},$$
whenever this expectation is finite in some open interval $(-h, h)$ around $t = 0$.
For a discrete $X$: $M_X(t) = \sum_x e^{tx} p(x)$. For a continuous $X$: $M_X(t) = \int_{-\infty}^\infty e^{tx} f(x)\,dx$.
Theorem (Moment Recovery). If $M_X(t)$ exists in a neighborhood of $0$, then for all $k \geq 1$,
$$E[X^k] = M_X^{(k)}(0),$$
where $M_X^{(k)}$ denotes the $k$-th derivative.
Proof. Expand the exponential as a power series and interchange expectation with the infinite sum (justified by dominated convergence when the MGF is finite near $0$); the resulting power series can then be differentiated term by term:
$$M_X(t) = E\!\left[\sum_{k=0}^\infty \frac{(tX)^k}{k!}\right] = \sum_{k=0}^\infty \frac{E[X^k]}{k!} t^k.$$
Differentiating $k$ times and evaluating at $t = 0$ extracts the coefficient $E[X^k]$. $\square$
In particular, $M_X'(0) = E[X]$ and $M_X''(0) = E[X^2]$, so $\text{Var}(X) = M_X''(0) - (M_X'(0))^2$.
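To see moment recovery in action, the differentiation can be done symbolically. Here is a minimal sketch using sympy, taking as given the standard Exponential$(\lambda)$ MGF $M_X(t) = \lambda/(\lambda - t)$ for $t < \lambda$ (a known closed form, not derived above); the parameter names are illustrative.

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lam', positive=True)

# Exponential(lam) MGF, finite on t < lam (standard closed form, assumed here)
M = lam / (lam - t)

mean = sp.diff(M, t, 1).subs(t, 0)           # M'(0) = E[X]
second = sp.diff(M, t, 2).subs(t, 0)         # M''(0) = E[X^2]
var = sp.simplify(second - mean**2)          # Var(X) = M''(0) - M'(0)^2

print(mean, var)   # 1/lam, lam**(-2)
```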
MGF of Sums of Independent Variables
Theorem. If $X$ and $Y$ are independent, then
$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t).$$
Proof. By independence, $e^{t(X+Y)} = e^{tX} e^{tY}$ and expectation factors:
$$M_{X+Y}(t) = E\!\left[e^{t(X+Y)}\right] = E\!\left[e^{tX}\right] E\!\left[e^{tY}\right] = M_X(t) M_Y(t). \qquad \square$$
By induction, if $X_1, \ldots, X_n$ are independent, $M_{X_1 + \cdots + X_n}(t) = \prod_{i=1}^n M_{X_i}(t)$.
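The factorization can also be checked empirically by Monte Carlo; a sketch with distributions and evaluation point chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 10**6, 0.5                 # sample size and evaluation point (illustrative)

x = rng.normal(0.0, 1.0, n)       # X ~ N(0, 1)
y = rng.exponential(1.0, n)       # Y ~ Exp(1), independent of X

lhs = np.mean(np.exp(t * (x + y)))                      # empirical M_{X+Y}(t)
rhs = np.mean(np.exp(t * x)) * np.mean(np.exp(t * y))   # empirical M_X(t) * M_Y(t)
print(lhs, rhs)   # agree up to Monte Carlo error
```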
Uniqueness Theorem. If two random variables have the same MGF on an open interval containing 0, they have the same distribution.
This is the MGF’s most important theoretical property: it fully characterizes the distribution.
MGF Derivations for Common Distributions
Bernoulli$(p)$
$$M_X(t) = E[e^{tX}] = (1-p)e^{0} + p e^{t} = 1 - p + pe^t.$$
Binomial$(n, p)$
Writing $X = \sum_{i=1}^n X_i$ with $X_i \sim \text{Bernoulli}(p)$ i.i.d.:
$$M_X(t) = \prod_{i=1}^n M_{X_i}(t) = (1 - p + pe^t)^n.$$
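The product form can be checked against the defining sum over the Binomial PMF; a sketch with arbitrary illustrative parameters:

```python
import numpy as np
from scipy.stats import binom

n, p, t = 10, 0.3, 0.7                               # illustrative parameters

k = np.arange(n + 1)
direct = np.sum(np.exp(t * k) * binom.pmf(k, n, p))  # sum_k e^{tk} P(X = k)
closed = (1 - p + p * np.exp(t)) ** n                # product-of-Bernoullis form
print(direct, closed)                                # equal up to floating point
```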
Poisson$(\lambda)$
$$M_X(t) = \sum_{k=0}^\infty e^{tk} \frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^\infty \frac{(\lambda e^t)^k}{k!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}.$$
Variance of Poisson via MGF. We have $M_X'(t) = \lambda e^t \cdot e^{\lambda(e^t-1)}$, so $M_X'(0) = \lambda$. For the second moment: $M_X''(t) = (\lambda e^t + \lambda^2 e^{2t})e^{\lambda(e^t - 1)}$, hence $M_X''(0) = \lambda + \lambda^2$. Therefore $\text{Var}(X) = (\lambda + \lambda^2) - \lambda^2 = \lambda$, confirming the well-known result that the Poisson mean equals its variance.
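The same differentiation can be delegated to a computer algebra system; a minimal sympy sketch:

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lam', positive=True)

M = sp.exp(lam * (sp.exp(t) - 1))        # Poisson(lam) MGF

m1 = sp.diff(M, t, 1).subs(t, 0)         # E[X] = lam
m2 = sp.diff(M, t, 2).subs(t, 0)         # E[X^2] = lam + lam^2
print(sp.simplify(m2 - m1**2))           # Var(X) = lam
```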
Normal$(\mu, \sigma^2)$
Theorem. If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $M_X(t) = \exp\!\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)$.
Proof. Complete the square in the exponent:
$$M_X(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^\infty e^{tx} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)dx.$$
Write $tx - \frac{(x-\mu)^2}{2\sigma^2} = -\frac{(x - (\mu + \sigma^2 t))^2}{2\sigma^2} + \mu t + \frac{\sigma^2 t^2}{2}$. The remaining integral is $\int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-(\mu+\sigma^2 t))^2/(2\sigma^2)}dx = 1$, giving $M_X(t) = e^{\mu t + \sigma^2 t^2/2}$. $\square$
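The closed form can be verified by numerical integration of the defining integral; a sketch with illustrative parameter values:

```python
import numpy as np
from scipy.integrate import quad

mu, sigma, t = 1.0, 2.0, 0.3             # illustrative parameters

def integrand(x):
    # e^{tx} times the N(mu, sigma^2) density
    density = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return np.exp(t * x) * density

numeric, _ = quad(integrand, -np.inf, np.inf)
closed = np.exp(mu * t + 0.5 * sigma**2 * t**2)
print(numeric, closed)                   # agree to quad's tolerance
```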
Cumulants and the Cumulant Generating Function
Definition. The cumulant generating function (CGF) is $K_X(t) = \log M_X(t)$.
Expanding $K_X(t) = \sum_{k=1}^\infty \kappa_k t^k / k!$, the coefficients $\kappa_k = K_X^{(k)}(0)$ are the cumulants.
Key cumulants:
- $\kappa_1 = E[X]$ (the mean)
- $\kappa_2 = \text{Var}(X)$ (the variance)
- $\kappa_3 = E[(X-\mu)^3]$ (the skewness numerator)
For the Normal$(\mu, \sigma^2)$: $K_X(t) = \mu t + \frac{1}{2}\sigma^2 t^2$, so all cumulants of order $\geq 3$ vanish. This characterizes the Normal distribution.
For the Poisson$(\lambda)$: $K_X(t) = \lambda(e^t - 1)$, so $K_X^{(k)}(0) = \lambda$ for all $k \geq 1$: all cumulants equal $\lambda$.
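Both cumulant claims are easy to confirm symbolically; a sympy sketch differentiating the Poisson CGF:

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lam', positive=True)

K = lam * (sp.exp(t) - 1)                       # Poisson(lam) CGF
for k in range(1, 5):
    print(k, sp.diff(K, t, k).subs(t, 0))       # kappa_k = lam for every k
```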
Characteristic Functions
Not every distribution admits a finite MGF; heavy-tailed distributions such as the Cauchy have $M_X(t) = \infty$ for every $t \neq 0$. The characteristic function avoids this issue.
Definition. $\varphi_X(t) = E[e^{itX}]$ for $t \in \mathbb{R}$, where $i = \sqrt{-1}$.
Since $|e^{itX}| = 1$, the expectation is always finite. Characteristic functions are Fourier transforms of the density (or PMF). The uniqueness theorem holds, and $\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t)$ for independent $X, Y$.
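The contrast with the MGF shows up clearly for the standard Cauchy, whose MGF is infinite for every $t \neq 0$ but whose characteristic function is $e^{-|t|}$ (a known closed form, taken as given here). A Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(10**6)     # heavy-tailed: MGF infinite for all t != 0

for t in [0.5, 1.0, 2.0]:
    empirical = np.mean(np.exp(1j * t * x))       # E[e^{itX}] is always finite
    print(t, empirical.real, np.exp(-abs(t)))     # vs the closed form e^{-|t|}
```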
Sketch Proof of the CLT via Characteristic Functions
Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Set $S_n = (X_1 + \cdots + X_n - n\mu)/(\sigma\sqrt{n})$. We want $\varphi_{S_n}(t) \to e^{-t^2/2}$ (the standard Normal’s characteristic function).
The characteristic function of a single centered, scaled term $Y_i = (X_i - \mu)/(\sigma\sqrt{n})$ is
$$\varphi_{Y_i}(t) = 1 - \frac{t^2}{2n} + O(n^{-3/2}),$$
using the Taylor expansion $e^{iu} = 1 + iu - u^2/2 + \cdots$ together with $E[Y_i] = 0$ and $E[Y_i^2] = 1/n$. (The $O(n^{-3/2})$ remainder assumes a finite third moment; with only finite variance the error is $o(1/n)$, which still suffices.)
Since $S_n = \sum_{i=1}^n Y_i$ with independent terms,
$$\varphi_{S_n}(t) = \left(1 - \frac{t^2}{2n} + O(n^{-3/2})\right)^n \xrightarrow{n\to\infty} e^{-t^2/2}.$$
By the continuity theorem for characteristic functions, this convergence implies $S_n \xrightarrow{d} \mathcal{N}(0,1)$.
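The convergence can be watched numerically: simulate standardized sums and compare their empirical characteristic function with $e^{-t^2/2}$. A sketch using Exp(1) summands (mean and variance both 1; all choices illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10**5
mu, sigma = 1.0, 1.0                                   # Exp(1) has mean 1, variance 1

x = rng.exponential(1.0, (reps, n))                    # rows of i.i.d. X_1, ..., X_n
s = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))    # standardized sums S_n

for t in [0.5, 1.0, 2.0]:
    print(t, np.mean(np.exp(1j * t * s)).real, np.exp(-t**2 / 2))
```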
Examples
Finding the Variance of a Poisson Sum
Suppose $X_1, \ldots, X_n \sim \text{Poisson}(\lambda)$ are i.i.d. and $S = X_1 + \cdots + X_n$. The MGF of $S$ is
$$M_S(t) = \prod_{i=1}^n e^{\lambda(e^t - 1)} = e^{n\lambda(e^t - 1)},$$
so $S \sim \text{Poisson}(n\lambda)$ and $\text{Var}(S) = n\lambda$. This also follows from additivity of variance for independent variables, but the MGF approach simultaneously identifies the full distribution.
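A quick simulation confirms both the moments and the distributional identification (parameters illustrative):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
n, lam, reps = 5, 2.0, 10**6

s = rng.poisson(lam, (reps, n)).sum(axis=1)      # S = X_1 + ... + X_n
print(s.mean(), s.var(), n * lam)                # mean and variance both near n*lam
for k in [5, 10, 15]:
    print(k, np.mean(s == k), poisson.pmf(k, n * lam))   # empirical vs Poisson(n*lam)
```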
MGF of a Sample Mean
For $\bar{X} = S/n$, $M_{\bar{X}}(t) = M_S(t/n) = e^{n\lambda(e^{t/n}-1)}$. As $n \to \infty$, using $e^{t/n} \approx 1 + t/n + t^2/(2n^2)$, we get $M_{\bar{X}}(t) \approx e^{\lambda t + \lambda t^2/(2n)} \to e^{\lambda t}$, the MGF of the constant $\lambda$. This is the Law of Large Numbers from the MGF perspective: $\bar{X} \to \lambda$ in distribution, and since the limit is a constant this implies convergence in probability (the weak law); the strong law upgrades this to almost sure convergence.
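The concentration of $\bar{X}$ around $\lambda$ is visible in simulation; a sketch (sample sizes and $\lambda$ illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 3.0
for n in [10, 100, 1000]:
    xbar = rng.poisson(lam, (2000, n)).mean(axis=1)   # 2000 sample means of size n
    print(n, xbar.mean(), xbar.std())                 # std shrinks like sqrt(lam/n)
```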