Catalan Numbers - The Surprisingly Ubiquitous Count of Valid Arrangements
Helpful context:
- Counting & Choice - How to Count Without Counting Everything
- Recurrence Relations - Equations That Define Themselves
Suppose you want to fully parenthesize a product of 4 factors, say $abcd$. You have to insert enough parentheses to make the order of multiplication completely explicit. There turn out to be exactly 5 ways:
$$((ab)c)d, \quad (a(bc))d, \quad (ab)(cd), \quad a((bc)d), \quad a(b(cd)).$$
Now try 5 factors. The count jumps to 14. Six factors: 42. The sequence 1, 1, 2, 5, 14, 42, 132, 429 is the sequence of Catalan numbers, named after the Belgian mathematician Eugène Catalan (1814-1894). These numbers count parenthesizations, but they also count - and this is the surprise - a large family of completely different-looking combinatorial objects: triangulations of polygons, paths that stay below a diagonal, full binary trees, valid bracket sequences. All of these are in bijection with one another, and all are counted by the same formula.
The Catalan numbers are one of the most pervasive sequences in combinatorics. Stanley’s book on enumerative combinatorics lists over 200 distinct combinatorial interpretations of $C_n$. The power of the theory is not just that the formula works, but that the bijections between interpretations are themselves mathematically illuminating. Understanding why parenthesizations and lattice paths give the same count is more valuable than knowing the formula.
Five Interpretations
Let $C_n$ denote the $n$-th Catalan number ($C_0 = 1$). The following sets all have the same size $C_n$:
-
Parenthesizations: the number of ways to fully parenthesize a product of $n+1$ factors.
-
Full binary trees: the number of full binary trees (every internal node has exactly 2 children) with $n+1$ leaves (equivalently, $n$ internal nodes).
-
Monotone lattice paths: the number of monotone paths from $(0,0)$ to $(n,n)$ using unit steps right (R) and up (U) that never go strictly above the diagonal $y = x$.
-
Balanced parentheses: the number of strings of $n$ opening and $n$ closing parentheses that are valid (every prefix has at least as many opens as closes).
-
Polygon triangulations: the number of ways to triangulate a convex $(n+2)$-gon by drawing non-crossing diagonals.
For example, $C_3 = 5$:
- 5 parenthesizations of $abcd$
- 5 full binary trees with 4 leaves
- 5 paths from $(0,0)$ to $(3,3)$ not going above $y = x$
- 5 valid strings of 3 open and 3 close brackets
- 5 triangulations of a convex pentagon
The bijections between these interpretations are instructive. A full binary tree with $n+1$ leaves corresponds to a parenthesization of $n+1$ factors (each internal node represents a multiplication). A valid bracket sequence corresponds to a lattice path (open bracket = step right, close bracket = step up). These bijections make precise the sense in which all five interpretations are “the same” combinatorial structure.
The Recurrence
The Catalan numbers satisfy the recurrence
$$C_0 = 1, \qquad C_n = \sum_{i=0}^{n-1} C_i \cdot C_{n-1-i} \quad \text{for } n \geq 1.$$
Derivation (binary trees). Every full binary tree with $n+1$ leaves has a root whose left subtree has $i+1$ leaves and right subtree has $n-i$ leaves, for some $0 \leq i \leq n-1$. The number of left subtrees is $C_i$ and the number of right subtrees is $C_{n-1-i}$. Summing over all splits:
$$C_n = \sum_{i=0}^{n-1} C_i \cdot C_{n-1-i}.$$
This convolution structure is what makes generating functions the natural tool for computing $C_n$.
The first several values, computed from the recurrence:
| $n$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| $C_n$ | 1 | 1 | 2 | 5 | 14 | 42 | 132 | 429 |
The Closed Form via Lattice Paths
The lattice path interpretation gives the cleanest derivation of the closed form.
Theorem. $C_n = \dfrac{1}{n+1}\dbinom{2n}{n}.$
Proof (ballot problem / reflection argument).
The total number of monotone paths from $(0,0)$ to $(n,n)$ is $\binom{2n}{n}$ - we choose which $n$ of the $2n$ steps are rightward.
We want to count paths that never go strictly above the diagonal $y = x$ (i.e., the number of up steps never exceeds the number of right steps at any point). Call such paths good. The bad paths are those that, at some point, have strictly more up steps than right steps.
We use the reflection argument of Désiré André (1887). Given any bad path, find the first step where the path crosses above $y = x$ (where up steps first exceed right steps). Reflect the portion of the path before that step across the line $y = x + 1$. This maps the initial segment (up to the first violation) to a path starting at $(0,0)$ reflected to $(-1, 1)$. After reflection, a step R becomes a step U and vice versa.
More carefully: a bad path from $(0,0)$ to $(n,n)$ must touch the line $y = x + 1$ at some point. Let $P$ be the first such point. Reflect the portion of the path from $(0,0)$ to $P$ across the line $y = x + 1$. This reflected prefix starts at the reflection of $(0,0)$ across $y = x+1$, which is $(-1, 1)$. The full reflected path ends at $(n,n)$ and started at $(-1, 1)$.
This reflection gives a bijection between bad paths from $(0,0)$ to $(n,n)$ and all paths from $(-1, 1)$ to $(n,n)$. The number of such paths is $\binom{2n}{n-1}$: we need $n+1$ right steps and $n-1$ up steps, total $2n$ steps, choosing $n-1$ of them to go up.
Therefore the number of good paths is
$$C_n = \binom{2n}{n} - \binom{2n}{n-1} = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n} = \frac{1}{n+1}\binom{2n}{n}. \quad \square$$
The equality $\binom{2n}{n} - \binom{2n}{n-1} = \frac{1}{n+1}\binom{2n}{n}$ can be verified directly:
$$\binom{2n}{n-1} = \frac{(2n)!}{(n+1)!(n-1)!} = \frac{n}{n+1} \cdot \frac{(2n)!}{n! \cdot n!} = \frac{n}{n+1}\binom{2n}{n}.$$
So $C_n = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n} = \frac{1}{n+1}\binom{2n}{n}$. This can also be written
$$C_n = \frac{(2n)!}{(n+1)! n!}.$$
The Generating Function
Let $C(x) = \sum_{n=0}^{\infty} C_n x^n$ be the ordinary generating function of the Catalan numbers.
Claim. $C(x)$ satisfies the functional equation $C(x) = 1 + x C(x)^2$.
Proof. The recurrence $C_n = \sum_{i=0}^{n-1} C_i C_{n-1-i}$ says that $C_n$ (for $n \geq 1$) is the coefficient of $x^{n-1}$ in $C(x)^2$. Equivalently, $C(x) - 1 = x C(x)^2$, i.e., $C(x) = 1 + x C(x)^2$. $\square$
This is a quadratic equation in $C(x)$. Solving:
$$x C(x)^2 - C(x) + 1 = 0 \implies C(x) = \frac{1 \pm \sqrt{1 - 4x}}{2x}.$$
The branch $C(x) = \frac{1 + \sqrt{1-4x}}{2x}$ blows up at $x = 0$, so we take the minus sign:
$$C(x) = \frac{1 - \sqrt{1 - 4x}}{2x}.$$
This can be verified by expanding $\sqrt{1-4x}$ via the binomial series:
$$\sqrt{1-4x} = \sum_{n=0}^{\infty} \binom{1/2}{n}(-4x)^n = 1 - 2\sum_{n=1}^{\infty} \frac{1}{n+1}\binom{2n}{n} x^n$$
and substituting back recovers $C_n = \frac{1}{n+1}\binom{2n}{n}$.
Asymptotics
By Stirling’s approximation ($n! \sim \sqrt{2\pi n}(n/e)^n$):
$$C_n = \frac{1}{n+1}\binom{2n}{n} \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}.$$
The Catalan numbers grow like $4^n$ with a polynomial correction of $n^{-3/2}$. Specifically:
$$C_n \sim \frac{4^n}{n^{3/2} \sqrt{\pi}} \quad \text{as } n \to \infty.$$
This means the Catalan numbers grow much faster than exponentially in a practical sense: $C_{20} = 6{,}564{,}120{,}420$ and $C_{30}$ is already in the trillions. The factorial denominator in the formula provides only polynomial suppression.
Applications in Computer Science
Binary search trees. The number of distinct BSTs that can be built from $n$ distinct keys $1 < 2 < \cdots < n$ is $C_n$. Each such BST corresponds to a full binary tree structure: choose which key is the root (say key $k$), then the left subtree contains keys $1, \ldots, k-1$ and the right contains $k+1, \ldots, n$, giving the Catalan recurrence.
Matrix chain multiplication. The number of ways to parenthesize a product of $n+1$ matrices is $C_n$. Different parenthesizations correspond to different orders of matrix multiplication, with wildly different operation counts. The dynamic programming solution to the matrix chain problem implicitly exploits the Catalan structure.
Stack-sortable permutations. A permutation of $\{1, \ldots, n\}$ is stack-sortable if it can be sorted to $1, 2, \ldots, n$ by pushing elements onto a stack and popping them in order. The number of stack-sortable permutations of $n$ elements is $C_n$.
Parse trees. For an ambiguous-free grammar with binary productions, the number of distinct parse trees for a string of $n+1$ terminals is $C_n$. The Catalan structure encodes the possible branching structures of the parse.
Summary
| Formula | Expression |
|---|---|
| Recurrence | $C_0 = 1$, $C_n = \sum_{i=0}^{n-1} C_i C_{n-1-i}$ |
| Closed form | $C_n = \frac{1}{n+1}\binom{2n}{n}$ |
| Generating function | $C(x) = \frac{1 - \sqrt{1-4x}}{2x}$ |
| Asymptotic | $C_n \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}$ |
Read next: