Martingales & Optional Stopping
Filtrations and Adapted Processes
To reason about processes that evolve over time, we need a precise notion of “information available up to time $n$.”
Definition. A filtration on a probability space $(\Omega, \mathcal{F}, P)$ is an increasing sequence of $\sigma$-algebras $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots \subseteq \mathcal{F}$. Intuitively, $\mathcal{F}_n$ encodes everything observable by time $n$.
Definition. A sequence of random variables $(X_n)_{n \geq 0}$ is adapted to $(\mathcal{F}_n)$ if $X_n$ is $\mathcal{F}_n$-measurable for each $n$; that is, the value of $X_n$ is determined by the information in $\mathcal{F}_n$.
The natural filtration generated by $(X_n)$ is $\mathcal{F}_n = \sigma(X_0, X_1, \ldots, X_n)$, the smallest $\sigma$-algebra making $X_0, \ldots, X_n$ measurable.
Martingales
Definition. An adapted integrable sequence $(X_n)$ is a martingale with respect to $(\mathcal{F}_n)$ if
$$E[X_{n+1} \mid \mathcal{F}_n] = X_n \quad \text{for all } n \geq 0.$$
It is a sub-martingale if $E[X_{n+1} \mid \mathcal{F}_n] \geq X_n$, and a super-martingale if $E[X_{n+1} \mid \mathcal{F}_n] \leq X_n$.
The martingale condition says that the best prediction of $X_{n+1}$ given all past information is simply the current value. This models a fair game: knowing the history gives no edge.
Example 1: Symmetric random walk. Let $\xi_1, \xi_2, \ldots$ be i.i.d. with $P(\xi_i = +1) = P(\xi_i = -1) = 1/2$ and set $S_n = \sum_{i=1}^n \xi_i$, $S_0 = 0$. Then $E[S_{n+1} \mid \mathcal{F}_n] = S_n + E[\xi_{n+1}] = S_n$, so $(S_n)$ is a martingale.
Example 2: Zero-mean i.i.d. sums. More generally, if $\xi_1, \xi_2, \ldots$ are i.i.d. and integrable with $E[\xi_i] = 0$, then $S_n = \sum_{i=1}^n \xi_i$ is a martingale.
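As a sanity check on Examples 1 and 2, here is a short illustrative Python simulation (not part of the development above): the martingale property implies $E[S_n] = E[S_0] = 0$ for every $n$, so a Monte Carlo average of $S_n$ should hover near zero.

```python
import random

random.seed(0)

def walk(n_steps):
    """One path of the symmetric random walk S_n = xi_1 + ... + xi_n."""
    s = 0
    for _ in range(n_steps):
        s += random.choice((-1, 1))
    return s

# E[S_{n+1} | F_n] = S_n implies E[S_n] = E[S_0] = 0 for every n,
# so the Monte Carlo average of S_n should be close to zero.
n_paths, n_steps = 20000, 50
mean_S = sum(walk(n_steps) for _ in range(n_paths)) / n_paths
print(f"E[S_{n_steps}] ~ {mean_S:.3f}")  # close to 0
```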
Example 3: Likelihood ratios. If $P$ and $Q$ are probability measures with $Q \ll P$ and $p_n(x)$, $q_n(x)$ are their densities for the first $n$ observations, then $M_n = q_n(X_1,\ldots,X_n)/p_n(X_1,\ldots,X_n)$ is a martingale under $P$. This underpins sequential hypothesis testing.
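A concrete instance of Example 3 (an illustrative sketch; the specific Gaussian model is an assumption, not taken from the text): let $P$ be $N(0,1)$ and $Q$ be $N(\theta,1)$, so $M_n = \exp(\theta \sum_i X_i - n\theta^2/2)$. The martingale property under $P$ gives $E_P[M_n] = 1$ for every $n$, which a simulation can confirm.

```python
import math
import random

random.seed(1)

theta, n, n_paths = 0.5, 10, 50000
# Under P the data are N(0,1); under Q they are N(theta, 1).  The
# likelihood ratio M_n = exp(theta * sum(x_i) - n * theta^2 / 2) is a
# martingale under P, so E_P[M_n] = 1 for every n.
total = 0.0
for _ in range(n_paths):
    s = sum(random.gauss(0, 1) for _ in range(n))
    total += math.exp(theta * s - n * theta * theta / 2)
mean_M = total / n_paths
print(f"E_P[M_n] ~ {mean_M:.3f}")  # close to 1
```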
Doob’s Optional Stopping Theorem
A stopping time $\tau$ is a random variable taking values in $\{0, 1, 2, \ldots\} \cup \{\infty\}$ such that $\{\tau \leq n\} \in \mathcal{F}_n$ for all $n$. Intuitively, the decision to stop by time $n$ depends only on information available by time $n$.
Theorem (Doob’s Optional Stopping). Let $(X_n)$ be a martingale and $\tau$ a stopping time. Then $E[X_\tau] = E[X_0]$ provided any one of the following conditions holds:
- $\tau$ is bounded: $\tau \leq N$ for some fixed $N < \infty$.
- $\tau < \infty$ almost surely and the stopped process is uniformly bounded: $|X_{n \wedge \tau}| \leq C$ for some constant $C$.
- $E[\tau] < \infty$ and the increments are conditionally bounded: $E[|X_{n+1} - X_n| \mid \mathcal{F}_n] \leq C$ for some constant $C$.
Proof sketch for bounded $\tau$. Since $\tau \leq N$, we can write $X_\tau = X_0 + \sum_{k=0}^{N-1} (X_{k+1} - X_k) \mathbf{1}_{\tau > k}$. Taking expectations and using the martingale property with the fact that $\{\tau > k\} \in \mathcal{F}_k$:
$$E[X_\tau] = E[X_0] + \sum_{k=0}^{N-1} E\bigl[(X_{k+1} - X_k)\mathbf{1}_{\tau > k}\bigr].$$
Now $E[(X_{k+1} - X_k)\mathbf{1}_{\tau > k}] = E\bigl[\mathbf{1}_{\tau > k}\, E[X_{k+1} - X_k \mid \mathcal{F}_k]\bigr] = 0$ by the martingale property. Hence $E[X_\tau] = E[X_0]$. $\square$
The conditions matter: without them the theorem fails. For example, let $S_n$ be a simple random walk and $\tau = \inf\{n : S_n = 1\}$ (which has $P(\tau < \infty) = 1$ but $E[\tau] = \infty$). Then $E[S_\tau] = 1 \neq 0 = E[S_0]$.
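A simulation makes the failure vivid (an illustrative sketch, with a truncation horizon $T$ chosen for demonstration): the truncated time $\tau \wedge T$ is bounded, so $E[S_{\tau \wedge T}] = 0$ holds, even though almost every path eventually stops at $S_\tau = 1$.

```python
import random

random.seed(2)

# tau = first time the walk hits +1.  P(tau < inf) = 1 yet E[tau] = inf,
# and S_tau = 1, so E[S_tau] = 1 != 0: optional stopping fails.  The
# truncated time tau ^ T *is* bounded, and E[S_(tau ^ T)] = 0 holds.
T, n_paths = 1000, 20000
stopped_vals = []
for _ in range(n_paths):
    s, t = 0, 0
    while s != 1 and t < T:
        s += random.choice((-1, 1))
        t += 1
    stopped_vals.append(s)

mean_truncated = sum(stopped_vals) / n_paths
hit_frac = sum(v == 1 for v in stopped_vals) / n_paths
print(f"E[S_(tau^T)] ~ {mean_truncated:.3f}  (theory: 0)")
print(f"fraction of paths hitting +1 by time T: {hit_frac:.3f}")
```

The non-hitting paths carry large negative values of $S_T$, which is exactly what balances the $+1$ contributions and keeps the truncated expectation at zero.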
Gambler’s Ruin via Martingales
A gambler starts with $k$ dollars and plays a fair coin-flip game, winning or losing one dollar each round. The game ends when the gambler reaches 0 (ruin) or $N$ (goal). Let $\tau$ be this stopping time and $p$ the probability of reaching $N$.
Since $(S_n)$ is a martingale with bounded increments, and $E[\tau] < \infty$ (in fact $E[\tau] = k(N-k)$, by a classical argument), the optional stopping theorem gives:
$$E[S_\tau] = E[S_0] = k.$$
But $S_\tau = N$ with probability $p$ and $S_\tau = 0$ with probability $1 - p$, so $Np = k$, giving
$$p = \frac{k}{N}.$$
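A Monte Carlo check of $p = k/N$ (an illustrative sketch; the values $k = 3$, $N = 10$ are arbitrary choices):

```python
import random

random.seed(3)

def reach_goal(k, N):
    """Play the fair game from k dollars until hitting 0 or N."""
    s = k
    while 0 < s < N:
        s += random.choice((-1, 1))
    return s == N

k, N, n_games = 3, 10, 20000
p_hat = sum(reach_goal(k, N) for _ in range(n_games)) / n_games
print(f"P(reach {N}) ~ {p_hat:.3f}  (theory: {k / N})")
```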
For the biased case with $P(\text{win}) = q \neq 1/2$, the process $(r^{S_n})$ where $r = (1-q)/q$ is a martingale, since $E[r^{\xi_i}] = q r + (1-q) r^{-1} = 1$. Optional stopping then yields the ruin probability $1 - p = \frac{r^k - r^N}{1 - r^N}$, equivalently $p = \frac{1 - r^k}{1 - r^N}$.
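The biased formula can be checked the same way (an illustrative sketch; $q = 0.4$, $k = 3$, $N = 10$ are arbitrary choices):

```python
import random

random.seed(4)

q, k, N = 0.4, 3, 10             # win probability q, start k, goal N
r = (1 - q) / q                  # r = 1.5 for q = 0.4
ruin_theory = (r**k - r**N) / (1 - r**N)

# Play the biased game repeatedly and estimate the ruin probability.
n_games, ruined = 20000, 0
for _ in range(n_games):
    s = k
    while 0 < s < N:
        s += 1 if random.random() < q else -1
    ruined += (s == 0)
ruin_hat = ruined / n_games
print(f"ruin ~ {ruin_hat:.3f}  (theory: {ruin_theory:.3f})")
```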
Doob’s Maximal Inequality
Theorem. Let $(X_n)_{0 \leq n \leq N}$ be a non-negative sub-martingale. Then for any $\lambda > 0$,
$$P\left(\max_{0 \leq k \leq N} X_k \geq \lambda\right) \leq \frac{E[X_N]}{\lambda}.$$
Proof sketch. Let $\tau = \inf\{k : X_k \geq \lambda\} \wedge N$. By optional stopping for sub-martingales, $E[X_N] \geq E[X_\tau]$. On the event $\{\max_{0 \leq k \leq N} X_k \geq \lambda\}$ we have $X_\tau \geq \lambda$, and $X_\tau \geq 0$ elsewhere, so $E[X_\tau] \geq \lambda \cdot P(\max_{0 \leq k \leq N} X_k \geq \lambda)$. $\square$
For a martingale $(X_n)$, apply this to the sub-martingale $(X_n^+)$, obtaining $P(\max_{k \leq N} X_k \geq \lambda) \leq E[X_N^+]/\lambda$.
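An empirical check of the maximal inequality (an illustrative sketch; $|S_n|$ is a non-negative sub-martingale because $x \mapsto |x|$ is convex, and $N = 100$, $\lambda = 15$ are arbitrary choices):

```python
import random

random.seed(5)

# |S_n| is a non-negative sub-martingale (convex image of a martingale),
# so Doob's inequality gives P(max_{k<=N} |S_k| >= lam) <= E[|S_N|] / lam.
N, lam, n_paths = 100, 15, 20000
exceed, abs_end = 0, 0.0
for _ in range(n_paths):
    s, m = 0, 0
    for _ in range(N):
        s += random.choice((-1, 1))
        m = max(m, abs(s))
    exceed += (m >= lam)
    abs_end += abs(s)

lhs = exceed / n_paths          # estimate of P(max |S_k| >= lam)
rhs = (abs_end / n_paths) / lam # estimate of E[|S_N|] / lam
print(f"P(max >= {lam}) ~ {lhs:.3f} <= {rhs:.3f}")
```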
Martingale Convergence Theorem
Theorem. Let $(X_n)$ be a martingale with $\sup_n E[|X_n|] < \infty$. Then $X_n \to X_\infty$ almost surely for some integrable $X_\infty$.
The proof uses the upcrossing inequality: the number of times the sequence crosses from below $a$ to above $b$ ($a < b$) is bounded in expectation by $E[(X_n - a)^+]/(b-a)$. A sequence that does not converge would have infinitely many upcrossings, contradicting this bound.
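A classic illustration of the convergence theorem (a sketch with arbitrary factor values): take $M_n = \prod_{i=1}^n U_i$ with i.i.d. $U_i \in \{1.5, 0.5\}$, each with mean 1. This is a non-negative martingale with $\sup_n E[|M_n|] = 1$, so it converges almost surely; since $E[\log U_i] < 0$, the limit is 0 even though $E[M_n] = 1$ for all $n$.

```python
import random

random.seed(6)

# M_n = product of i.i.d. factors in {1.5, 0.5}, each with mean 1: a
# non-negative martingale with sup E|M_n| = 1, hence a.s. convergent.
# E[log factor] = (log 1.5 + log 0.5) / 2 < 0, so the a.s. limit is 0.
n, n_paths = 500, 1000
limits = []
for _ in range(n_paths):
    m = 1.0
    for _ in range(n):
        m *= random.choice((1.5, 0.5))
    limits.append(m)

frac_small = sum(x < 1e-6 for x in limits) / n_paths
print(f"fraction of paths with M_500 < 1e-6: {frac_small:.3f}")
```

This also shows that $L^1$ convergence can fail: the limit 0 has expectation 0, not 1.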
Examples
Optional stopping in sequential tests. The sequential probability ratio test (SPRT) for testing $H_0: \theta = \theta_0$ vs $H_1: \theta = \theta_1$ uses the log-likelihood ratio $\Lambda_n = \sum_{i=1}^n \log\frac{p(X_i|\theta_1)}{p(X_i|\theta_0)}$. The stopping rule is $\tau = \inf\{n : \Lambda_n \notin (a, b)\}$ for thresholds $a < 0 < b$. Under $H_0$, $M_n = e^{\Lambda_n}$ is a martingale, and optional stopping provides non-asymptotic error bounds: $P_{H_0}(\Lambda_\tau \geq b) \leq e^{-b}$.
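A minimal SPRT simulation (an illustrative sketch; the Gaussian model $H_0: N(0,1)$ vs $H_1: N(1,1)$ and thresholds $a = -2$, $b = 2$ are assumptions, not from the text). In this model the per-observation log-likelihood ratio is $X_i - 1/2$, and the simulated type-I error should sit below the martingale bound $e^{-b}$.

```python
import math
import random

random.seed(7)

# SPRT for H0: N(0,1) vs H1: N(1,1).  The per-observation increment of
# the log-likelihood ratio is x - 1/2.  Under H0, exp(Lambda_n) is a
# martingale, and optional stopping gives P_H0(Lambda_tau >= b) <= e^-b.
a, b, n_tests = -2.0, 2.0, 20000
reject = 0
for _ in range(n_tests):
    lam = 0.0
    while a < lam < b:
        lam += random.gauss(0, 1) - 0.5   # data drawn under H0
    reject += (lam >= b)
alpha_hat = reject / n_tests
print(f"type-I error ~ {alpha_hat:.4f} <= e^-b = {math.exp(-b):.4f}")
```

The empirical error is typically somewhat below $e^{-b}$ because the bound ignores the overshoot of $\Lambda_\tau$ past $b$.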
Online learning regret bounds. In exponentially weighted prediction (Hedge / weighted majority), define expert weights $w_i^{(t)}$ and the potential $\Phi_t = \sum_i w_i^{(t)}$. The multiplicative update forces $\Phi_t$ to shrink at a controlled rate regardless of the losses the adversary chooses, while $\Phi_T$ is always at least the best expert's weight. Comparing the two yields a regret bound of $O(\sqrt{T \log N})$ for $N$ experts over $T$ rounds, a fundamental result in online learning whose potential-function argument has the same flavor as the martingale machinery above.
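A self-contained Hedge implementation (an illustrative sketch; the random losses stand in for an adversary, and the learning rate $\eta = \sqrt{8 \ln N / T}$ with regret bound $\sqrt{(T \ln N)/2}$ is one standard parameterization, stated here as an assumption). The bound is a worst-case guarantee, so it must hold for any loss sequence, including this simulated one.

```python
import math
import random

random.seed(8)

# Hedge: keep a weight per expert, predict with the normalized weights,
# then shrink each weight by exp(-eta * loss).  With eta chosen as
# sqrt(8 ln N / T), the regret is at most sqrt(T ln N / 2).
N, T = 10, 2000
eta = math.sqrt(8 * math.log(N) / T)
w = [1.0] * N
alg_loss, expert_loss = 0.0, [0.0] * N
for _ in range(T):
    losses = [random.random() for _ in range(N)]  # losses in [0, 1]
    total = sum(w)
    alg_loss += sum(wi * li for wi, li in zip(w, losses)) / total
    for i in range(N):
        expert_loss[i] += losses[i]
        w[i] *= math.exp(-eta * losses[i])

regret = alg_loss - min(expert_loss)
bound = math.sqrt(T * math.log(N) / 2)
print(f"regret ~ {regret:.1f} <= bound {bound:.1f}")
```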