Permutations & Combinations - Order and Selection, Made Precise // Megha Bose

Helpful context:

Counting & Choice - How to Count Without Counting Everything

In 1708, a French mathematician named Pierre Rémond de Montmort published a book on games of chance. In it, he posed a puzzle about a card game called treize - thirteen. Players turn over cards one at a time, calling out “one, two, three…” as they go. If a card’s face value ever matches the number being called, the dealer wins. The question: what is the probability the dealer never wins - that no card ever lands in its matching position?

Montmort worked out the answer. It is approximately 36.8%, and this probability barely changes whether you play with 10 cards or 10 million. The exact value it converges to is $1/e$ - Euler’s number, emerging from a problem that has nothing to do with calculus.

This post builds up to that result. Along the way: arrangements where some items are identical, circular seating problems, Pascal’s triangle (which is far older than Pascal), and the theorem that connects combinatorics to polynomial algebra.

Arrangements With Repeated Elements

With $n$ distinct items, there are $n!$ ways to order them. But what if some items are identical?

Take the word MISSISSIPPI - 11 letters, but M appears once, I appears 4 times, S appears 4 times, and P appears twice. How many distinct arrangements exist?

If all 11 letters were unique, the answer would be $11!$. But the four I’s are interchangeable: swapping any two of them leaves the word unchanged. So $11!$ overcounts every distinct arrangement by $4!$ (the number of ways to permute the four I’s among themselves). The same overcounting applies independently to the S’s (factor of $4!$) and the P’s (factor of $2!$). Dividing out all of it:

$$\frac{11!}{1!\cdot 4!\cdot 4!\cdot 2!} = 34{,}650$$

This generalizes. If you have $n$ items where type $i$ appears $n_i$ times (with $n_1 + n_2 + \cdots + n_k = n$), the number of distinct arrangements is the multinomial coefficient:

$$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1! n_2! \cdots n_k!}$$

Why dividing works exactly. Every distinct arrangement is overcounted by the same factor - $n_1! \cdot n_2! \cdots n_k!$ - because each type’s identical copies can be permuted among themselves in $n_i!$ ways without changing the arrangement. Since the overcounting is identical for every distinct arrangement, a single division by the product of all $n_i!$ removes it uniformly.

Circular Arrangements

Seat 5 people at a round table. If the seats were labeled 1 through 5, there would be $5! = 120$ arrangements. But at a round table, labels do not exist - what matters is who sits next to whom. Rotating everyone one seat clockwise produces the same relative arrangement.

Each distinct circular arrangement corresponds to exactly 5 labeled arrangements (one per rotation). So the number of truly distinct seatings is $5!/5 = 4! = 24$.

In general: $n$ people around a circle gives $(n-1)!$ distinct arrangements. The clean argument is to fix one person’s position to eliminate the rotational ambiguity, then arrange the remaining $n-1$ freely.

For a necklace - where flipping is also considered the same (a necklace and its mirror image are identical) - divide by 2 again: $\frac{(n-1)!}{2}$ distinct necklaces. Chemists encounter the same situation with cyclic molecules: rotating or flipping the ring does not change the compound.

Derangements: Nothing in Its Place

A derangement is an arrangement of $n$ objects where nothing lands in its original position. In Montmort’s card game, a derangement is a deal where no card ever matches its called number.

Small cases. With $n = 2$ items $\{1, 2\}$, there are $2! = 2$ arrangements: $(1,2)$ and $(2,1)$. Only $(2,1)$ is a derangement - the one where both items are displaced. With $n = 3$, there are $3! = 6$ arrangements; only $(2,3,1)$ and $(3,1,2)$ are derangements. The counts are:

$$D_1 = 0,\quad D_2 = 1,\quad D_3 = 2,\quad D_4 = 9,\quad D_5 = 44$$

The formula. Counting derangements uses a technique called inclusion-exclusion - the next post explores it fully, but the core idea is simple: when you want to count arrangements that avoid a set of forbidden conditions, you alternate between overcounting and undercounting until the errors cancel.

Here, the forbidden conditions are: item 1 is in its original place, item 2 is in its original place, …, item $n$ is in its original place. We want arrangements that avoid all of them.

Start with all $n!$ arrangements. Subtract those where at least one item is fixed: there are $n$ ways to choose which item to fix, and $(n-1)!$ arrangements for the rest, giving $\binom{n}{1}(n-1)!$ to subtract. But now we have over-subtracted - arrangements where two items are simultaneously fixed were subtracted twice. Add them back: $\binom{n}{2}(n-2)!$. Then subtract arrangements with three fixed items: $\binom{n}{3}(n-3)!$. Continue alternating:

$$D_n = n! - \binom{n}{1}(n-1)! + \binom{n}{2}(n-2)! - \binom{n}{3}(n-3)! + \cdots$$

Each term $\binom{n}{k}(n-k)!$ simplifies to $n!/k!$, so:

$$D_n = n!\left(1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^n}{n!}\right)$$

The surprising limit. The sum $\sum_{k=0}^{n}(-1)^k/k!$ is the partial sum of the Taylor series for $e^{-1} = e^x$ evaluated at $x = -1$. The partial sums converge extremely fast: by $n = 10$ the approximation is accurate to 5 decimal places. So:

$$D_n \approx \frac{n!}{e}, \qquad P(\text{random arrangement is a derangement}) \to \frac{1}{e} \approx 0.368$$

Regardless of whether the party has 10 guests or 10 million: about 37% of random hat shuffles result in nobody getting their own hat back.

Why $e$ appears in a counting problem. Euler’s number enters here through its Taylor series: $e^{-1} = \sum_{k=0}^{\infty} (-1)^k/k!$. The inclusion-exclusion formula for derangements produces exactly this alternating series. It is a reminder that $e$ is not purely an analytic object - it has a combinatorial interpretation too, and it surfaces whenever you count arrangements by the principle of alternating overcounting and undercounting.

Where the recursion $D_n = (n-1)(D_{n-1} + D_{n-2})$ comes from.

Think about building a derangement of $n$ elements by choosing where element $n$ goes - it must go to some position $j \in \{1, \ldots, n-1\}$. Once $n$ occupies position $j$, whatever element was sitting in position $j$ gets pushed to position $n$. There are $n-1$ choices for $j$.

Now the question is: what was the state of the other $n-1$ elements before $n$ was inserted?

First attempt (incomplete). Suppose the other $n-1$ elements were already in a derangement of their own - each in a position that is not their own. Pushing the element at position $j$ over to position $n$ keeps it deranged (since $j \neq n$ for any element in $[n-1]$), and $n$ in position $j \neq n$ is also deranged. So any derangement of $n-1$ elements, combined with any of $n-1$ choices for $j$, produces a valid derangement of $n$. This gives $(n-1) \cdot D_{n-1}$ derangements.

The logical flaw. A natural but incorrect conclusion would be $D_n = (n-1)D_{n-1}$. This is wrong because it misses a whole family of derangements - the ones where the element sitting in position $j$ was correctly placed before $n$ arrived.

The missed case. Suppose element $j$ is sitting in its own position $j$ before the swap (so the other $n-2$ elements form a derangement among themselves). After placing $n$ in position $j$ and pushing $j$ to position $n$:

Element $n$ is in position $j \neq n$. ✓
Element $j$ is in position $n \neq j$. ✓
The other $n-2$ elements are still deranged. ✓

This is a perfectly valid derangement of $n$, and it cannot arise from the first case (there, element $j$ was deranged before the swap, not correctly placed). So these are genuinely new derangements. For each of the $n-1$ choices of $j$, there are $D_{n-2}$ derangements of the remaining $n-2$ elements - contributing $(n-1) \cdot D_{n-2}$.

Why can’t more than one element be correctly placed before the swap? If two elements $j$ and $k$ are both in their own positions before $n$ is inserted, then after $n$ takes position $j$ and $j$ moves to position $n$, element $k$ remains in position $k$ - giving $\sigma(k) = k$, which breaks the derangement. So exactly zero or one elements can be correctly placed in the starting configuration.

Adding both cases:

$$D_n = (n-1) D_{n-1} + (n-1) D_{n-2} = (n-1)(D_{n-1} + D_{n-2})$$

Pascal’s Triangle

Pascal’s triangle was not invented by Pascal. The same structure appears in Indian mathematics (Pingala, around 300 BCE, counting rhythmic patterns in Sanskrit poetry), in Chinese mathematics (Yang Hui, 13th century), and in Persian mathematics (Omar Khayyam, 11th century). Pascal systematized it and drew out its connections to probability and algebra in 1654 - which is why his name stuck.

The rule is simple: every interior entry equals the sum of the two entries directly above it. The entry in row $n$, position $k$ is $\binom{n}{k}$.

The addition rule is Pascal’s identity:

$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$$

Why it holds. Fix one element $x$ from a set of $n$. Every $k$-element subset either contains $x$ (choose the remaining $k-1$ from the other $n-1$ elements: $\binom{n-1}{k-1}$ ways) or does not (choose all $k$ from the other $n-1$ elements: $\binom{n-1}{k}$ ways). Every subset falls into exactly one case.

Row sums. Each row sums to $2^n$: every element is either included in a subset or excluded, independently, giving $2^n$ total subsets. Row $n$ of the triangle lists all the $\binom{n}{k}$ values, one for each subset size, so they sum to the total.

Shallow diagonals. Summing a contiguous segment along any diagonal of the triangle gives the entry diagonally below and to the right - the “hockey stick identity”: $\sum_{i=0}^{r} \binom{n+i}{i} = \binom{n+r+1}{r}$.

The Binomial Theorem

Expanding $(x + y)^n$ means multiplying $n$ copies of $(x + y)$ together. Each factor contributes either $x$ or $y$. The term $x^k y^{n-k}$ arises from every selection of $k$ factors that contribute $x$ and $n-k$ factors that contribute $y$. The number of such selections is $\binom{n}{k}$, which is why binomial coefficients have that name.

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$

Setting $x = y = 1$ recovers the row-sum identity $2^n = \sum_{k=0}^n \binom{n}{k}$.

Setting $x = 1$, $y = -1$ gives $0 = \sum_{k=0}^n (-1)^k \binom{n}{k}$: the alternating row sums always cancel. This means in every row, the sum of even-position entries equals the sum of odd-position entries.

Vandermonde’s Identity

Split a group of $m + n$ people into subgroup $A$ (size $m$) and subgroup $B$ (size $n$). Choosing $r$ people from the combined group is the same as choosing $k$ from $A$ and $r-k$ from $B$, for each possible value of $k$:

$$\binom{m+n}{r} = \sum_{k=0}^{r} \binom{m}{k}\binom{n}{r-k}$$

A particularly clean special case: $\binom{2n}{n} = \sum_{k=0}^n \binom{n}{k}^2$. The number of ways to choose $n$ people from a group of $2n$ equals the sum of squares of row $n$ in Pascal’s triangle.

CS Perspective: Generating Permutations

Fisher-Yates shuffle. To produce a uniformly random arrangement of $n$ elements in $O(n)$ time:

for i from n-1 down to 1:
    j = random integer in [0, i]
    swap A[i] and A[j]

At each step $i$, element $A[i]$ is chosen uniformly from the positions not yet finalized. The result is uniform over all $n!$ arrangements. A naive alternative - repeatedly picking random positions until all are filled - runs in $O(n^2)$ time or produces non-uniform distributions if done carelessly.

Lexicographic generation. All $n!$ arrangements can be enumerated in lexicographic order in $O(1)$ amortized time per arrangement:

Find the rightmost position $i$ where $\sigma(i) < \sigma(i+1)$.
Find the smallest $\sigma(j) > \sigma(i)$ to the right of $i$.
Swap $\sigma(i)$ and $\sigma(j)$, then reverse the suffix after position $i$.

Derangement sampling. To pick a random derangement, generate random arrangements and reject any that have a fixed point. Since $\approx 36.8%$ of random arrangements are derangements, each attempt succeeds with probability $\approx 1/e$, and the expected number of attempts is $e \approx 2.72$. Despite the rejection, this is $O(n)$ expected time.

Read next:

Inclusion-Exclusion - Overcounting Your Way to the Right Answer