Space Complexity - When Memory, Not Time, Is the Bottleneck
Helpful context:
- P, NP, and the Hardest Problems - The Question That Keeps Theorists Up at Night
- Turing Machines - The Minimal Model of Everything a Computer Can Do
Sorting $n$ numbers takes $O(n \log n)$ time. But how much memory does it need? Merge sort needs $O(n)$ extra space - it copies half the array at each merge step. Heapsort needs $O(1)$ - it works entirely in place. These are very different machines. One could run on a tiny embedded device with barely any RAM; the other can’t. Time complexity asks “how long does this take?” Space complexity asks “how much memory does this require?” and the two questions don’t always agree on what’s hard.
This post is about what we can do when we constrain memory instead of time, and the surprisingly clean theorems that result.
What We’re Measuring
On a Turing machine, we work with a read-only input tape and separate work tapes. Space complexity counts the work tape cells used - not the input. This lets us meaningfully talk about sublinear space, where a machine reads a long input but stores only a tiny summary of what it has seen.
A language $L$ is in $\text{DSPACE}(s(n))$ if there is a deterministic TM deciding $L$ using at most $O(s(n))$ work tape cells on inputs of length $n$. Replace deterministic with nondeterministic and you get $\text{NSPACE}(s(n))$.
The two most important space complexity classes:
$$L = \text{DSPACE}(\log n), \qquad \text{PSPACE} = \bigcup_{k \geq 1} \text{DSPACE}(n^k)$$
Log space is tight - you can store a constant number of pointers into the input, but not a copy of it. Polynomial space is generous - it includes everything in P and NP, and probably more.
The Space Hierarchy Theorem
The cleanest theorem in complexity theory: more space strictly equals more power.
Theorem (Space Hierarchy). If $s_1(n) = o(s_2(n))$ and $s_2(n) \geq \log n$, then
$$\text{DSPACE}(s_1(n)) \subsetneq \text{DSPACE}(s_2(n)).$$
The proof is a diagonalization: build a machine that runs in exactly $s_2(n)$ space and disagrees with every $o(s_2(n))$-space machine on some input. Space is reusable - the same cells can be overwritten - so counting arguments are exact in a way they aren’t for time.
The punchline: $\text{SPACE}(n) \subsetneq \text{SPACE}(n^2)$. We can prove strict separations for space. Compare this to time, where we don’t know whether $P = NP$.
PSPACE and Its Complete Problem
PSPACE contains all of P and all of NP. A polynomial-time computation visits at most $t(n)$ cells, so $\text{DTIME}(t(n)) \subseteq \text{DSPACE}(t(n))$. But PSPACE may contain much more than NP - problems that require exploring an exponentially large search space but can reuse memory along the way.
The canonical PSPACE-complete problem is TQBF (True Quantified Boolean Formulas): given a Boolean formula with all variables bound by alternating $\forall$ and $\exists$ quantifiers, is it true?
$$\exists x_1 ; \forall x_2 ; \exists x_3 ; \cdots ; \varphi(x_1, x_2, x_3, \ldots)$$
TQBF is to PSPACE as SAT is to NP. The alternating quantifiers mirror a two-player game: one player chooses the $\exists$ variables, the adversary chooses the $\forall$ variables. Evaluating TQBF is equivalent to determining optimal play in a two-player game.
This gives PSPACE-completeness a concrete meaning: games. Generalized chess (on an $n \times n$ board), generalized Go, Othello - they are all PSPACE-complete or EXPTIME-complete. The computational difficulty of perfect play traces back to alternating quantifiers.
Logspace: L and NL
At the low end sits $L = \text{DSPACE}(\log n)$: problems solvable with memory for only a constant number of pointers. DFS on a graph needs $O(n)$ space for the stack, so it isn’t in $L$. But many important problems are.
$NL = \text{NSPACE}(\log n)$ adds nondeterminism. The canonical NL-complete problem is directed reachability (STCON): given a directed graph and vertices $s$ and $t$, is there a path from $s$ to $t$?
STCON is in NL because you can nondeterministically walk from $s$, storing only the current vertex and a step counter - $O(\log n)$ bits total. And every NL problem reduces to STCON: given any NL machine, build its configuration graph. Reachability from the start configuration to an accepting configuration is exactly acceptance.
Undirected reachability is in L. This is a landmark result: Reingold (2008) showed that even symmetric reachability - is there a path in an undirected graph? - can be decided in $O(\log n)$ space. The algorithm constructs an expander-based walk without storing a DFS stack.
Savitch’s Theorem
For time complexity, we don’t know whether nondeterminism helps. For space, we have a definitive answer: nondeterminism buys at most a quadratic increase.
Theorem (Savitch, 1970). For $s(n) \geq \log n$:
$$\text{NSPACE}(s(n)) \subseteq \text{DSPACE}(s(n)^2).$$
In particular, $\text{NPSPACE} = \text{PSPACE}$.
The idea is to reduce nondeterministic space to reachability. Define the predicate $\text{REACH}(C, C', t)$ = “can configuration $C$ reach $C'$ in at most $t$ steps?” We compute this recursively:
$$\text{REACH}(C, C', 2^k) = \exists C_m : \text{REACH}(C, C_m, 2^{k-1}) \wedge \text{REACH}(C_m, C', 2^{k-1}).$$
A deterministic machine tries all possible midpoints $C_m$, recurses to depth $O(s(n))$, reusing the same work tape cells at each level. Total space: $O(s(n)) \times O(s(n)) = O(s(n)^2)$.
The key insight: space is reusable. When you recurse on the second half of the path, you can overwrite the memory you used for the first half. This is why nondeterminism’s advantage collapses much more in space than in time.
NL = co-NL
For NP, we don’t know whether $NP = \text{co-}NP$. For NL, we do.
Theorem (Immerman and Szelepcsényi, 1988). $NL = \text{co-}NL$.
This means: the complement of every NL problem is also in NL. In particular, non-reachability (proving there is no path from $s$ to $t$) is in NL - even though you’d think you’d have to check all possible paths.
The trick is counting. A nondeterministic machine can count the exact number of vertices reachable from $s$ in $\leq i$ steps, using only $O(\log n)$ space, by an inductive argument. Once you know the count $r_i$ from the previous round, you can verify completeness: does the current round find exactly $r_{i+1}$ reachable vertices? If a vertex is missing from your reachable set and the count is still right, you’ve proved it’s unreachable.
This is one of the most surprising results in complexity theory. Nondeterminism lets you guess positive witnesses. The counting technique turns it into a tool for certifying negative facts too.
Discomfort check. PSPACE contains both P and NP. So does knowing PSPACE tell us anything about P vs NP?
Not directly. We have the chain $P \subseteq NP \subseteq \text{PSPACE} \subseteq \text{EXPTIME}$, and we know $P \neq \text{EXPTIME}$ by the time hierarchy theorem. So at least one inclusion in that chain is strict - but we don’t know which. It’s consistent that $P = NP = \text{PSPACE}$, and it’s consistent that all four are different. The space hierarchy theorem tells us $\text{PSPACE} \neq \text{EXPSPACE}$, but says nothing about the P vs NP boundary.
Summary
| Class | Definition | Representative problem | Notes |
|---|---|---|---|
| $L$ | $\text{DSPACE}(\log n)$ | Graph connectivity (undirected) | Reingold 2008 |
| $NL$ | $\text{NSPACE}(\log n)$ | Directed reachability (STCON) | NL = co-NL |
| $P$ | Poly-time, deterministic | Sorting, shortest paths | $P \subseteq NL$? Open |
| $NP$ | Poly-time, nondeterministic | SAT, 3-coloring | $NP \subseteq \text{PSPACE}$ |
| $\text{PSPACE}$ | Poly-space | TQBF, generalized games | NPSPACE = PSPACE (Savitch) |
| $\text{EXPSPACE}$ | Exp-space | Some formal language problems | $\text{PSPACE} \subsetneq \text{EXPSPACE}$ |
The big structural lesson: space hierarchy theorems are easy to prove (space is reusable, diagonalization is clean), while time hierarchy theorems require more care. Nondeterminism matters far less for space than for time - Savitch’s theorem gives us a square, not an exponential. And NL = co-NL tells us that even asymmetric-looking problems have symmetric power when memory is the bottleneck.
Read next: