Helpful context:


A gambler walks into a casino with a strategy they are convinced is foolproof. The strategy works like this: bet 1 dollar. If you lose, bet 2 dollars next time. If you lose again, bet 4 dollars. Keep doubling until you win. When you finally win, you recover all your previous losses plus 1 dollar in profit. Then start over.

This is called the “martingale strategy,” and the logic seems airtight. Every time you eventually win a single bet, you net 1 dollar. You will eventually win - surely? - so you will eventually profit.

There are casinos that have been operating profitably for centuries. What is wrong with the strategy?

The answer involves one of the most beautiful and practically useful ideas in probability theory: the mathematical notion of a martingale, and a theorem that tells you exactly when “eventually winning” guarantees you nothing.


A Fair Game

Before the definition, think about what “fair game” means mathematically.

Suppose you play a sequence of coin flips, winning 1 dollar on heads and losing 1 dollar on tails. At each point in time, if you know everything that has happened so far, your expected future fortune is exactly your current fortune. Not higher (the casino does not spot you money), not lower (the game is fair). Your expected fortune tomorrow, given today’s fortune, equals today’s fortune.

This is the essence of a martingale: a sequence of random variables where your best prediction of the future is the present. No information from the past can improve on this prediction, because the process has no drift - no trend up or down.


The Definition

Let $X_0, X_1, X_2, \ldots$ be a sequence of random variables. We need a way to say “given everything up to time $n$.” In discrete time, this is simply conditioning on $X_0, X_1, \ldots, X_n$.

Definition. The sequence $(X_n)_{n \geq 0}$ is a martingale if:

  1. $E[|X_n|] < \infty$ for all $n$ (the values are integrable), and
  2. $E[X_{n+1} \mid X_0, X_1, \ldots, X_n] = X_n$ for all $n$.

Condition 2 is the key one. It says: given the entire history up to time $n$, the best forecast of $X_{n+1}$ is simply the current value $X_n$. The sequence has no predictable drift.

Two related concepts:

  • A supermartingale satisfies $E[X_{n+1} \mid X_0, \ldots, X_n] \leq X_n$: expected to decrease or stay flat. A game that favors the house is a supermartingale for the gambler’s wealth.
  • A submartingale satisfies $E[X_{n+1} \mid X_0, \ldots, X_n] \geq X_n$: expected to increase or stay flat. A game that favors the gambler (or any system with drift upward) is a submartingale.

Three Canonical Examples

Simple random walk. Let $\xi_1, \xi_2, \ldots$ be independent, each $+1$ or $-1$ with equal probability. Define $S_n = \xi_1 + \xi_2 + \cdots + \xi_n$ (the total displacement after $n$ steps), with $S_0 = 0$. Then:

$$E[S_{n+1} \mid S_0, \ldots, S_n] = S_n + E[\xi_{n+1}] = S_n + 0 = S_n.$$

The random walk is a martingale. This models a fair coin-flip game: your running total has no expected drift in either direction.

Gambler’s fortune. A gambler starts with $x$ dollars and plays a fair game, winning or losing 1 dollar per round. Their fortune $X_n$ at time $n$ is the simple random walk started at $x$. It is a martingale.

Doob’s martingale. Let $Y$ be any integrable random variable, and let $X_1, X_2, \ldots$ be any sequence of random variables. Define:

$$M_n = E[Y \mid X_1, X_2, \ldots, X_n].$$

Then $(M_n)$ is a martingale. This is called Doob’s martingale, and it is remarkably general. It says: if you are trying to predict some fixed quantity $Y$, and you are accumulating information over time, your sequence of best predictions is always a martingale. The predictions update as you learn more, but on average they neither systematically overshoot nor undershoot - they are honest forecasts.

This construction is fundamental in Bayesian statistics: your posterior mean at time $n$, after observing data $X_1, \ldots, X_n$, is a martingale in $n$.


Stopping Times

To talk about the doubling strategy - or any strategy involving a decision about when to quit - we need to formalize the notion of a stopping rule.

A stopping time $\tau$ is a random time to stop, with the property that the decision to stop at time $n$ depends only on what has happened up to time $n$, not on future values. Formally: the event $\{\tau = n\}$ is determined by $X_0, X_1, \ldots, X_n$.

Examples:

  • “Stop at time 100” - a deterministic stopping time.
  • “Stop the first time $X_n > 10$” - a valid stopping time: you can tell it has happened by observing the sequence.
  • “Stop one step before the maximum” - not a valid stopping time: you would need to know the future.

The doubling strategy defines a stopping time $\tau = $ “first time you win a bet.” You can tell when this has happened: you just observe that you won. So $\tau$ is a valid stopping time.

Now the key question: if $(X_n)$ is a martingale and $\tau$ is a stopping time, is $E[X_\tau] = E[X_0]$?


The Optional Stopping Theorem

Theorem (Optional Stopping). Let $(X_n)$ be a martingale and $\tau$ a stopping time. If any of the following conditions hold, then $E[X_\tau] = E[X_0]$:

  1. $\tau$ is bounded: $\tau \leq N$ for some fixed $N$.
  2. $E[\tau] < \infty$ and the increments $|X_{n+1} - X_n|$ are bounded by a constant.
  3. $E[\tau] < \infty$ and $E[|X_\tau|] < \infty$ and $E[|X_{n+1} - X_n| \mid X_0, \ldots, X_n]$ is bounded.

The conclusion $E[X_\tau] = E[X_0]$ says: no stopping strategy can change the expected value of a martingale. However clever your rule for when to quit, if the game is fair, your expected fortune when you stop equals your fortune when you started.

This is the mathematical statement that there is no free lunch in a fair game.


The Doubling Strategy, Dissected

The doubling strategy defines a martingale (the gambler’s net position) and a stopping time (stop at the first win). Why does the gambler not profit?

Check the OST conditions. With probability 1, the gambler eventually wins, so $\tau < \infty$ almost surely. But:

$$E[\tau] = \sum_{k=1}^\infty k \cdot (1/2)^k = 2 < \infty.$$

Actually $E[\tau]$ is finite. The issue is the second condition: the increments are not bounded. On the $k$-th consecutive loss, the bet size is $2^{k-1}$ dollars. If the gambler loses $k$ times in a row before winning, the total loss before the win is $1 + 2 + 4 + \cdots + 2^{k-1} = 2^k - 1$ dollars, recovered by a win of $2^{k-1}$ - netting 1 dollar. But to execute this strategy, the gambler needs $2^k - 1$ dollars in reserve to absorb $k$ consecutive losses.

As $k \to \infty$, this reserve requirement grows without bound. The strategy requires infinite capital to be guaranteed to succeed. In practice, casinos have table limits (no bet above a certain amount), and gamblers have finite bankrolls. Once either limit is hit, the strategy breaks down.

The OST fails here because condition 2 is violated: the increments $|X_{n+1} - X_n|$ are the bet sizes, which grow unboundedly. Without bounded increments, the optional stopping theorem does not apply, and you cannot conclude $E[X_\tau] = X_0$.

In fact, for a gambler with finite capital $C$, the doubling strategy fails: with positive probability they exhaust their bankroll before winning, and the expected gain is zero - as the OST would predict once we properly account for the bankruptcy constraint.


Applications

Hitting probabilities for random walks. Let $(S_n)$ be a simple random walk started at $S_0 = 0$, and let $\tau_{a,b}$ be the first time $S_n$ hits $+a$ or $-b$ (where $a, b > 0$ are positive integers). Since $\tau_{a,b}$ is bounded-increment and one can show $E[\tau_{a,b}] < \infty$, the OST applies:

$$E[S_{\tau_{a,b}}] = E[S_0] = 0.$$

Let $p$ be the probability of hitting $+a$ before $-b$. Then:

$$E[S_{\tau_{a,b}}] = p \cdot a + (1-p) \cdot (-b) = 0 \implies p = \frac{b}{a+b}.$$

The probability of reaching $+a$ before $-b$ is exactly $b/(a+b)$. Starting closer to $+a$ (i.e., small $a$) makes it more likely to hit $+a$ first. This is a clean, formula-free proof that uses only the martingale property.

Wald’s identity. For a random walk $S_n = \xi_1 + \cdots + \xi_n$ with i.i.d. steps $\xi_i$ satisfying $E[\xi_i] = \mu$ and a stopping time $\tau$ with $E[\tau] < \infty$:

$$E[S_\tau] = \mu \cdot E[\tau].$$

The expected total displacement equals the expected step size times the expected number of steps. This looks obvious, but the content is that it holds even when $\tau$ is random and correlated with the steps - because the martingale structure prevents any systematic exploitation. Wald’s identity is used in sequential analysis (designing tests that stop as soon as enough evidence has accumulated) and in queuing theory.

Algorithm analysis. Many randomized algorithms' running times are analyzed via martingale arguments. A classic example: in the analysis of Quicksort, one shows that a certain potential function related to the number of inversions is a supermartingale, allowing tight bounds on expected running time. Randomized data structures, hashing, and online algorithms frequently use variants of the optional stopping theorem to bound expected costs.


Martingales in Machine Learning

Martingale arguments appear throughout the theoretical underpinnings of modern machine learning.

In online learning, the regret of an algorithm (the difference between its cumulative loss and the best fixed strategy in hindsight) is often analyzed using martingale inequalities. Azuma’s inequality, a concentration bound for martingales with bounded differences, is the workhorse: it says that if each step of a martingale changes by at most $c_k$, then

$$P(|X_n - X_0| \geq t) \leq 2\exp\left(\frac{-2t^2}{\sum_{k=1}^n c_k^2}\right).$$

This is the analogue of the Chernoff bound for dependent random variables that form a martingale sequence.

In multi-armed bandit problems, the cumulative reward of an algorithm is often a submartingale (expected to increase), and regret bounds are derived from this structure.

In stochastic optimization (SGD and its variants), the iterates do not form a martingale, but the noise in the gradient estimate does - and concentration bounds for this noise martingale underpin convergence analyses.


Discomfort check. The formal framework for martingales involves a filtration: a sequence $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots$ of sigma-algebras representing “all information available up to time $n$.” In discrete time with a finite number of random variables, $\mathcal{F}n$ is simply the sigma-algebra generated by $X_0, X_1, \ldots, X_n$ - the collection of all events you can determine from those variables. The condition $E[X{n+1} \mid \mathcal{F}n] = X_n$ is just the formal version of “given everything up to time $n$, the best prediction of $X{n+1}$ is $X_n$.” In continuous time (Brownian motion, stochastic differential equations), the filtration machinery becomes essential and non-trivial, which is one of the reasons measure-theoretic probability is foundational to modern stochastic processes. If you encounter terms like “adapted process” or “predictable process,” they are making precise which random variables are determined by which sigma-algebras.


Summary

Concept Meaning
Martingale $E[X_{n+1} \mid X_0, \ldots, X_n] = X_n$ - no predictable drift
Supermartingale Expected to decrease (unfavorable game)
Submartingale Expected to increase (favorable game)
Stopping time A random time determined only by past information
Optional Stopping Theorem $E[X_\tau] = E[X_0]$ under regularity conditions
Doubling strategy failure OST conditions fail: unbounded increments need infinite capital
Hitting probability $P(\text{hit } {+a} \text{ before } {-b}) = b/(a+b)$ for simple RW
Azuma’s inequality Concentration bound for bounded-increment martingales

A martingale is the mathematical formalization of a fair game. The Optional Stopping Theorem says that no clever stopping strategy can turn a fair game into a profitable one - unless it violates regularity conditions, which in practice means requiring infinite capital, unbounded time, or both. The doubling strategy is a vivid illustration: the guarantee of eventual profit is real, but it requires resources that no finite player possesses.


Read next: