Counting & Choice - How to Count Without Counting Everything
Helpful context:
Counting is deceptively hard. The question “how many ways can you do X?” sounds elementary, but answering it precisely requires a small but powerful toolkit. This post builds that toolkit from first principles.
The Two Fundamental Principles
Multiplication Principle. If a procedure consists of $k$ sequential steps, where step $i$ can be performed in $n_i$ ways regardless of how earlier steps were performed, then the total number of ways is $n_1 \cdot n_2 \cdots n_k$.
Example. A password of length 3 using lowercase letters: $26 \cdot 26 \cdot 26 = 26^3 = 17{,}576$ possibilities.
Addition Principle. If a procedure can be performed in mutually exclusive ways $A_1, A_2, \ldots, A_k$, then the total is $|A_1| + |A_2| + \cdots + |A_k|$.
This is just set union for disjoint sets: $|A_1 \cup \cdots \cup A_k| = \sum_{i=1}^k |A_i|$ when the $A_i$ are pairwise disjoint.
Ordered Selection Without Replacement: Permutations
A permutation of $k$ items chosen from $n$ distinct items is an ordered arrangement. The count is:
$$P(n, k) = \frac{n!}{(n-k)!} = n(n-1)(n-2)\cdots(n-k+1)$$
Why. First position: $n$ choices. Second: $n-1$ (one item used). Third: $n-2$. Continuing, the $k$-th position has $n-k+1$ choices. Multiplying these together and writing compactly gives $n!/(n-k)!$.
Example. 10 athletes; gold, silver, bronze to three different people. Podium arrangements: $P(10, 3) = 10 \times 9 \times 8 = 720$.
Special case: $P(n, n) = n!$ - arranging all $n$ items.
Unordered Selection Without Replacement: Combinations
A combination of $k$ items from $n$ is an unordered subset - a group where the internal order does not matter.
Why is the formula $\binom{n}{k} = \frac{n!}{k!(n-k)!}$? And why divide by $k!$ specifically?
Suppose you want to choose a 3-person committee from 10 people. No roles - just membership. How many committees exist?
If order mattered (say, picking president then VP then secretary), the answer would be $P(10, 3) = 10 \times 9 \times 8 = 720$.
But the committee $\{$Alice, Bob, Carol$\}$ is the same group regardless of the order you list them. How many times does a single committee appear in those 720 ordered arrangements? Exactly once for each way to order its 3 members:
| Ordering | Sequence |
|---|---|
| 1 | Alice, Bob, Carol |
| 2 | Alice, Carol, Bob |
| 3 | Bob, Alice, Carol |
| 4 | Bob, Carol, Alice |
| 5 | Carol, Alice, Bob |
| 6 | Carol, Bob, Alice |
That is $3! = 6$ orderings - not 5, not 7. Why exactly 6? Because picking an ordering of 3 people means: 3 choices for first slot, 2 for second, 1 for third, giving $3 \times 2 \times 1 = 3! = 6$. This holds for any group of 3 distinct people.
So the 720 ordered arrangements count each committee exactly 6 times. The number of committees is $720 / 6 = 120$.
Generalizing: $P(n,k)$ counts every $k$-element subset exactly $k!$ times (once per ordering of that subset). Dividing removes the overcounting:
$$\binom{n}{k} = \frac{P(n,k)}{k!} = \frac{n!}{k!(n-k)!}$$
Why dividing by $k!$ works perfectly. The key is that every subset of size $k$ has exactly the same number of internal orderings: $k!$. Not approximately - exactly. So dividing $P(n,k)$ by $k!$ removes the same factor from every subset simultaneously, leaving each counted exactly once.
Symmetry. $\binom{n}{k} = \binom{n}{n-k}$. Choosing which $k$ items to include is the same as choosing which $n-k$ items to exclude. The algebra confirms it: $\frac{n!}{k!(n-k)!} = \frac{n!}{(n-k)!k!}$.
The Four Selection Cases
Two binary choices - ordered vs. unordered, with vs. without replacement - give four cases:
| Without replacement | With replacement | |
|---|---|---|
| Ordered | $P(n,k) = n! / (n-k)!$ | $n^k$ |
| Unordered | $\binom{n}{k} = n! / (k!(n-k)!)$ | $\binom{n+k-1}{k}$ |
Each cell has a different character and requires a different intuition. The bottom-right (unordered with replacement) is the most subtle.
Ordered Selection With Replacement: $n^k$
At each of $k$ steps, pick any of $n$ options freely (the item goes back, so it remains available next time). The multiplication principle gives $n^k$ total sequences.
Example. A 4-digit PIN over digits 0-9: $10^4 = 10{,}000$ possibilities.
Example. Tossing a coin 8 times: $2^8 = 256$ possible outcome sequences.
This formula appears whenever: items can repeat, and the order of choices matters. It is also the answer to “how many functions are there from a $k$-element set to an $n$-element set?” - each function assigns one of $n$ values to each of $k$ inputs, independently.
Unordered Selection With Replacement: Stars and Bars
Suppose you choose $k$ items from $n$ types, repetition allowed, and you care only about how many of each type you get - not the order. This counts multisets.
Example. An ice cream shop has 4 flavors. You want 3 scoops; the same flavor can appear multiple times; you don’t care about scoop order. How many distinct selections exist?
The bijection. Represent a selection as a string of $k$ stars (items chosen) and $n-1$ bars (dividers between types). Stars to the left of the first bar count toward type 1, stars between bar 1 and bar 2 count toward type 2, and so on.
The full string has $k + (n-1) = n+k-1$ positions. We choose which $k$ positions are stars (or equivalently which $n-1$ are bars). The number of arrangements is:
$$\binom{n+k-1}{k}$$
For 4 flavors and 3 scoops: $\binom{6}{3} = 20$ distinct selections.
Why $n-1$ bars, not $n$ bars? With $n$ types, you need $n-1$ dividers to create $n$ sections - just as 3 fences create 4 paddocks. The first section needs no left-fence (it starts at the beginning), and the last section needs no right-fence.
How this method came about. The combinatorial content is old, but the visual metaphor is surprisingly recent. Euler (18th century) worked extensively on partitions - counting ways to write an integer as a sum of others - and on generating functions. The problem of counting non-negative integer solutions to $x_1 + \cdots + x_n = k$ was implicitly present in that work, but Euler didn’t frame it as a counting-by-arrangement problem.
The bijective style of argument - “put the objects in one-to-one correspondence with something you already know how to count” - became a standard technique through 19th-century combinatorics (Cauchy, Cayley). The specific bijection for multisets belongs to that tradition, but was written out algebraically rather than drawn.
The stars-and-bars visual language as a named, teachable method is largely 20th century. William Feller’s An Introduction to Probability Theory and Its Applications (1950) is often credited with popularizing the device of literally drawing ★★|★| on a page so students could see the bijection directly rather than follow an algebraic argument. Before that, the same proof existed but was harder to internalize.
What makes the method elegant is that it makes the bijection obvious rather than clever. Once you draw the picture, you see immediately that every arrangement of $k$ stars and $n-1$ bars encodes exactly one multiset, and vice versa - and you already know how to count binary arrangements. The proof writes itself. That is the hallmark of a good bijective proof: the correspondence is so natural that the argument disappears and only the picture remains.
A Panorama of Equivalent Problems
The four formulas appear under many different names and problem descriptions. The table below shows the most important equivalences.
| Formula | Counting problem | Integer solutions equivalent | Example |
|---|---|---|---|
| $n^k$ | Ordered sequences of length $k$, $n$ options each step | — | 4-digit PIN over 0-9: $10^4$ |
| $P(n,k)$ | Ordered selection of $k$ from $n$ distinct items, no repeat | — | Podium (gold/silver/bronze) from 10: $10 \times 9 \times 8$ |
| $\dbinom{n}{k}$ | Unordered selection of $k$ from $n$, no repeat | Ways to choose $k$ elements of a set | 5-card hand from 52: $\binom{52}{5}$ |
| $\dbinom{n+k-1}{k}$ | Unordered selection of $k$ from $n$ types, repetition ok | Non-negative solutions to $x_1 + \cdots + x_n = k$ | 3 scoops from 4 flavors: $\binom{6}{3}$ |
| $\dbinom{k-1}{n-1}$ | Distribute $k$ identical items into $n$ non-empty bins | Positive solutions to $x_1 + \cdots + x_n = k$ | Split 8 cookies among 3 kids, each gets $\geq 1$: $\binom{7}{2}$ |
The integer-solutions perspective
The last two rows are especially useful because they connect to a type of problem that looks completely different on the surface: counting integer solutions to an equation.
Non-negative solutions: $\binom{n+k-1}{k}$. How many ways can you write $k$ as an ordered sum of $n$ non-negative integers?
$$x_1 + x_2 + \cdots + x_n = k, \quad x_i \geq 0$$
Example. How many ways can you score a total of 5 in 3 rounds of a game where each round score is a non-negative integer? This is the same as distributing 5 identical points among 3 rounds: $\binom{5+3-1}{5} = \binom{7}{5} = 21$.
Positive solutions: $\binom{k-1}{n-1}$. How many ways can you write $k$ as an ordered sum of $n$ positive integers?
$$x_1 + x_2 + \cdots + x_n = k, \quad x_i \geq 1$$
Pre-place one item in each bin (satisfying the $x_i \geq 1$ constraint), leaving $k-n$ items to distribute freely among $n$ bins. Those $k-n$ items can go anywhere, giving a non-negative problem with total $k-n$ and $n$ bins - but there is a cleaner direct argument: line up $k$ stars; the $n$ parts are created by choosing $n-1$ of the $k-1$ gaps between consecutive stars to insert a bar. This gives $\binom{k-1}{n-1}$.
Example. How many ways can you split 8 identical cookies among 3 children so each child gets at least 1? $\binom{8-1}{3-1} = \binom{7}{2} = 21$.
How to tell which formula to use. Ask two questions in order:
- Does order matter? If yes: are repeats allowed? Yes → $n^k$. No → $P(n,k)$.
- If order does not matter: are repeats allowed? No → $\binom{n}{k}$. Yes → $\binom{n+k-1}{k}$.
- If the problem involves distributing identical objects into non-empty bins (or positive integer sums): $\binom{k-1}{n-1}$.
The Pigeonhole Principle
Theorem (Pigeonhole). If $n$ items are placed into $m$ containers with $n > m$, then at least one container holds at least $\lceil n/m \rceil$ items.
Proof. If every container held at most $\lfloor n/m \rfloor < n/m$ items, the total would be less than $n$, contradicting that all $n$ items are placed.
The theorem sounds trivial. Its power is not in the mechanics - it is that it proves existence without constructing anything. You never find the pair; you prove one must exist.
Example: you cannot compress all files. Any lossless compression algorithm takes a file and produces a shorter one. But there are $2^{8n}$ possible files of size $n$ bytes. All files shorter than $n$ bytes total fewer than $2^{8n}$ possible values. By pigeonhole, at least two different input files must map to the same compressed output - meaning decompression cannot tell them apart and the algorithm fails on at least one input. No lossless algorithm compresses everything. Pigeonhole proves this in one sentence, without naming a specific algorithm or a specific file.
Inclusion-Exclusion: A Preview
When events are not disjoint, the addition principle overcounts. The fix is inclusion-exclusion:
$$|A \cup B| = |A| + |B| - |A \cap B|$$
More generally for three sets:
$$|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|$$
The general formula is developed in the Inclusion-Exclusion - Overcounting Your Way to the Right Answer post.
The Birthday Paradox
How many people need to be in a room before it is more likely than not that two of them share a birthday? Most people guess somewhere in the hundreds. The answer is 23.
Why the intuition fails. The naive thinking goes: there are 365 days, 23 people - each person covers roughly 23/365 ≈ 6% of the year, so a match seems unlikely. But this asks the wrong question. You are not asking whether one specific person matches someone else. You are asking whether any pair shares a birthday. With 23 people there are $\binom{23}{2} = 253$ pairs - and you only need one of those 253 to hit. 253 chances at a 1-in-365 shot is a very different prospect from 23 chances. The opportunities are far more numerous than the naive count suggests.
The calculation. It is easier to compute $P(\text{no match})$ and subtract. Person 1 picks any day. Person 2 must avoid that day: probability $364/365$. Person 3 must avoid both previous days: $363/365$. Continuing:
$$P(\text{no match among } k \text{ people}) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365-k+1}{365}$$
At $k = 23$ this product falls just below $0.5$, so a shared birthday is more likely than not. At $k = 57$ the probability of a match exceeds 99%.
Intuition for the threshold. The number of pairs grows as $k^2/2$: with 10 people you have 45 pairs, with 23 you have 253, with 30 you have 435. Each pair is a fresh chance at a match against 365 days. Once the number of pairs is in the same ballpark as 365, a match becomes more likely than not. $\binom{23}{2} = 253$ is already two-thirds of the way to 365 - well into that zone. The key insight: collision opportunities grow as $k^2$, not $k$. Doubling the group size quadruples the pairs.
The birthday attack in cryptography. This same mathematics threatens hash functions. A cryptographic hash produces an $n$-bit digest. A collision means two different inputs hash to the same value. Naively you might think finding a collision requires trying $2^n$ inputs - the full space. But the birthday bound says collisions become likely after only $\approx \sqrt{2^n} = 2^{n/2}$ attempts. For a 128-bit hash, that is $2^{64}$ operations - not $2^{128}$. This is why modern hash functions like SHA-256 use 256 bits: to keep the birthday-attack cost at $2^{128}$, which remains infeasible.
The same argument applies to hash tables. If you insert $m$ keys into a table of size $n$, collisions become likely once $m \approx \sqrt{n}$ - far sooner than filling the table. A table that is only 0.1% full can already have a 50% chance of containing a collision if that 0.1% is roughly $\sqrt{n}$ entries.
Read next: