Prerequisite: Turing Machines

Computational Complexity

Undecidability established the absolute limits of computation: some problems have no algorithmic solution. Complexity theory asks a sharper question: among the problems that are decidable, which ones are efficiently solvable?

We measure efficiency by the number of steps a Turing machine takes as a function of input length. A TM $M$ runs in time $f(n)$ if on every input of length $n$, it halts within $f(n)$ steps. This gives us a complexity measure robust to the choice of reasonable computational model (multi-tape TMs, RAMs, etc.) - all are polynomially equivalent.

The Class P

$$\mathbf{P} = \bigcup_{k \geq 0} \text{TIME}(n^k)$$

$\mathbf{P}$ is the class of languages decidable in polynomial time on a deterministic TM. Polynomial time is our formal proxy for “efficient” computation for several reasons:

  • Polynomial time is closed under composition: running a polynomial algorithm as a subroutine a polynomial number of times still yields polynomial time.
  • Polynomial time is model-independent: the same class is captured by TMs, RAMs, and real hardware (with polynomial simulation overhead).
  • In practice, polynomial algorithms are almost always fast enough for large inputs, while exponential algorithms are almost always too slow.

Examples of problems in $\mathbf{P}$: shortest paths (Dijkstra, $O(m \log n)$), sorting ($O(n \log n)$), primality testing (AKS, $O(\log^6 n)$), linear programming (ellipsoid method, polynomial).

The Class NP

$$\mathbf{NP} = \bigcup_{k \geq 0} \text{NTIME}(n^k)$$

$\mathbf{NP}$ is the class of languages decidable in polynomial time on a nondeterministic TM. Equivalently (and more operationally useful):

Verifier definition. $L \in \mathbf{NP}$ iff there exists a polynomial-time TM $V$ (a verifier) and a polynomial $p$ such that: $$L = {w : \exists c \text{ with } |c| \leq p(|w|) \text{ and } V(w, c) \text{ accepts}}$$

The string $c$ is called a certificate or witness. $\mathbf{NP}$ is the class of problems where solutions are polynomially verifiable.

Examples:

  • SAT. $\phi \in$ SAT iff a satisfying assignment exists. Given the assignment, verification is linear time.
  • Hamiltonian path. Given graph $G$, does it contain a Hamiltonian path? Certificate: the path itself. Verification: check each vertex appears once and consecutive vertices are adjacent.
  • Graph $k$-coloring. Does $G$ have a proper $k$-coloring? Certificate: the coloring. Verification: check each edge has differently colored endpoints.
  • Clique. Does $G$ contain a clique of size $k$? Certificate: the clique. Verification: check all $\binom{k}{2}$ edges exist.

P vs. NP

The most celebrated open problem in mathematics and computer science:

$$\mathbf{P} \stackrel{?}{=} \mathbf{NP}$$

We know $\mathbf{P} \subseteq \mathbf{NP}$ (any deterministic computation is a special case of nondeterministic computation). The question is whether the containment is proper.

If $\mathbf{P} = \mathbf{NP}$: every problem whose solution can be verified in polynomial time can also be solved in polynomial time. Consequences: current public-key cryptography collapses (factoring, discrete log are in $\mathbf{NP}$); mathematical proofs could be found automatically for any theorem with a polynomial verifier; protein folding and drug design become tractable; essentially all combinatorial optimization problems become easy.

If $\mathbf{P} \neq \mathbf{NP}$: there are problems that are easy to verify but genuinely hard to solve. This is the world virtually all researchers believe we live in.

Why Most Researchers Believe P $\neq$ NP

No proof exists either way, but the evidence for $\mathbf{P} \neq \mathbf{NP}$ is substantial:

Empirical. Thousands of NP-complete problems have been studied for over 50 years. Despite enormous effort, no polynomial algorithm has been found for any of them. In contrast, most problems known to be in $\mathbf{P}$ were found to have polynomial algorithms quickly once studied.

Algebraization barrier. Baker, Gill, and Solovay (1975) showed there exist oracles $A$ with $\mathbf{P}^A = \mathbf{NP}^A$ and oracles $B$ with $\mathbf{P}^B \neq \mathbf{NP}^B$. This means any proof technique that “relativizes” (works the same with oracle access) cannot resolve P vs. NP. Most classical techniques relativize, ruling them out.

Natural proofs barrier. Razborov and Rudich (1994) showed that a large class of proof techniques for circuit lower bounds (“natural proofs”) cannot prove superpolynomial lower bounds against circuits with pseudorandom properties - unless one-way functions do not exist. Ruling out a whole class of proof strategies.

The Cook-Levin Theorem

Theorem (Cook 1971, Levin 1973). SAT is NP-complete.

Proof sketch. SAT is clearly in NP. We must show every $L \in \mathbf{NP}$ reduces to SAT in polynomial time. Let $V$ be a polynomial-time verifier for $L$, running in time $p(n)$. On input $w$ of length $n$, we encode the entire computation of $V$ on $(w, c)$ as a Boolean formula $\phi_{w}$ over $O(p(n)^2)$ variables representing:

  • The tableau: $\text{cell}[i][j][s]$ = “at time $i$, tape position $j$ contains symbol $s$”
  • The state: $\text{state}[i][q]$ = “at time $i$, $V$ is in state $q$”
  • The head: $\text{head}[i][j]$ = “at time $i$, the head is at position $j$”

Clauses enforce:

  1. The start configuration matches $w$.
  2. Each step respects $V$’s transition function.
  3. The final state is $q_\text{accept}$.

The formula $\phi_w$ is satisfiable iff some certificate $c$ makes $V(w,c)$ accept, iff $w \in L$. The formula has polynomial size and is computable in polynomial time. Since every NP language reduces to SAT, SAT is NP-hard.

Co-NP and the Polynomial Hierarchy

Co-NP is the class of complements of NP languages: $L \in \mathbf{co\text{-}NP}$ iff $\overline{L} \in \mathbf{NP}$.

If $\mathbf{P} = \mathbf{NP}$, then $\mathbf{NP} = \mathbf{co\text{-}NP}$ (since $\mathbf{P}$ is closed under complement). Conversely, if $\mathbf{NP} \neq \mathbf{co\text{-}NP}$, then $\mathbf{P} \neq \mathbf{NP}$.

$\mathbf{NP} \cap \mathbf{co\text{-}NP}$: Problems here have both short proofs of membership and short proofs of non-membership. Examples:

  • Integer factorization. Given $n$ and $k$, does $n$ have a factor $\leq k$? Believed to be in $\mathbf{NP} \cap \mathbf{co\text{-}NP}$ but not in $\mathbf{P}$.
  • Primality testing. AKS (2002) put this definitively in $\mathbf{P}$.

PSPACE contains both $\mathbf{NP}$ and $\mathbf{co\text{-}NP}$: $\mathbf{NP} \cup \mathbf{co\text{-}NP} \subseteq \mathbf{NP} \cap \mathbf{co\text{-}NP} \subseteq \mathbf{PSPACE}$. Problems complete for PSPACE include quantified Boolean formula (QBF) satisfiability.

Examples

Example 1: TSP (decision version). Given a complete weighted graph on $n$ cities and a budget $B$, does a tour visiting every city exactly once exist with total weight $\leq B$? This is in NP (certificate: the tour). It is NP-complete (proved via reduction from Hamiltonian path).

Example 2: Graph 3-coloring. Does a given graph have a proper 3-coloring? In NP (certificate: the coloring). NP-complete (proved via reduction from 3-SAT, see the next post).

Example 3: Subset sum. Given integers $a_1, \ldots, a_n$ and target $t$, does a subset sum to $t$? In NP (certificate: the subset). NP-complete.

Example 4: Why NP-hardness is useful information. If you are assigned to build a scheduler that minimizes conflicts (an NP-hard problem), knowing this tells you: do not look for a polynomial exact algorithm; instead use approximation algorithms, heuristics (simulated annealing, genetic algorithms), integer programming solvers, or exploit special structure (e.g., interval graph coloring is polynomial).


Read Next: NP-Completeness & Reductions