P, NP, and Hardest Problems
Prerequisite: Turing Machines
Computational Complexity
Undecidability established the absolute limits of computation: some problems have no algorithmic solution. Complexity theory asks a sharper question: among the problems that are decidable, which ones are efficiently solvable?
We measure efficiency by the number of steps a Turing machine takes as a function of input length. A TM $M$ runs in time $f(n)$ if on every input of length $n$, it halts within $f(n)$ steps. This gives us a complexity measure robust to the choice of reasonable computational model (multi-tape TMs, RAMs, etc.) - all are polynomially equivalent.
The Class P
$$\mathbf{P} = \bigcup_{k \geq 0} \text{TIME}(n^k)$$
$\mathbf{P}$ is the class of languages decidable in polynomial time on a deterministic TM. Polynomial time is our formal proxy for “efficient” computation for several reasons:
- Polynomial time is closed under composition: running a polynomial algorithm as a subroutine a polynomial number of times still yields polynomial time.
- Polynomial time is model-independent: the same class is captured by TMs, RAMs, and real hardware (with polynomial simulation overhead).
- In practice, polynomial algorithms are almost always fast enough for large inputs, while exponential algorithms are almost always too slow.
Examples of problems in $\mathbf{P}$: shortest paths (Dijkstra, $O(m \log n)$), sorting ($O(n \log n)$), primality testing (AKS, $O(\log^6 n)$), linear programming (ellipsoid method, polynomial).
The Class NP
$$\mathbf{NP} = \bigcup_{k \geq 0} \text{NTIME}(n^k)$$
$\mathbf{NP}$ is the class of languages decidable in polynomial time on a nondeterministic TM. Equivalently (and more operationally useful):
Verifier definition. $L \in \mathbf{NP}$ iff there exists a polynomial-time TM $V$ (a verifier) and a polynomial $p$ such that: $$L = {w : \exists c \text{ with } |c| \leq p(|w|) \text{ and } V(w, c) \text{ accepts}}$$
The string $c$ is called a certificate or witness. $\mathbf{NP}$ is the class of problems where solutions are polynomially verifiable.
Examples:
- SAT. $\phi \in$ SAT iff a satisfying assignment exists. Given the assignment, verification is linear time.
- Hamiltonian path. Given graph $G$, does it contain a Hamiltonian path? Certificate: the path itself. Verification: check each vertex appears once and consecutive vertices are adjacent.
- Graph $k$-coloring. Does $G$ have a proper $k$-coloring? Certificate: the coloring. Verification: check each edge has differently colored endpoints.
- Clique. Does $G$ contain a clique of size $k$? Certificate: the clique. Verification: check all $\binom{k}{2}$ edges exist.
P vs. NP
The most celebrated open problem in mathematics and computer science:
$$\mathbf{P} \stackrel{?}{=} \mathbf{NP}$$
We know $\mathbf{P} \subseteq \mathbf{NP}$ (any deterministic computation is a special case of nondeterministic computation). The question is whether the containment is proper.
If $\mathbf{P} = \mathbf{NP}$: every problem whose solution can be verified in polynomial time can also be solved in polynomial time. Consequences: current public-key cryptography collapses (factoring, discrete log are in $\mathbf{NP}$); mathematical proofs could be found automatically for any theorem with a polynomial verifier; protein folding and drug design become tractable; essentially all combinatorial optimization problems become easy.
If $\mathbf{P} \neq \mathbf{NP}$: there are problems that are easy to verify but genuinely hard to solve. This is the world virtually all researchers believe we live in.
Why Most Researchers Believe P $\neq$ NP
No proof exists either way, but the evidence for $\mathbf{P} \neq \mathbf{NP}$ is substantial:
Empirical. Thousands of NP-complete problems have been studied for over 50 years. Despite enormous effort, no polynomial algorithm has been found for any of them. In contrast, most problems known to be in $\mathbf{P}$ were found to have polynomial algorithms quickly once studied.
Algebraization barrier. Baker, Gill, and Solovay (1975) showed there exist oracles $A$ with $\mathbf{P}^A = \mathbf{NP}^A$ and oracles $B$ with $\mathbf{P}^B \neq \mathbf{NP}^B$. This means any proof technique that “relativizes” (works the same with oracle access) cannot resolve P vs. NP. Most classical techniques relativize, ruling them out.
Natural proofs barrier. Razborov and Rudich (1994) showed that a large class of proof techniques for circuit lower bounds (“natural proofs”) cannot prove superpolynomial lower bounds against circuits with pseudorandom properties - unless one-way functions do not exist. Ruling out a whole class of proof strategies.
The Cook-Levin Theorem
Theorem (Cook 1971, Levin 1973). SAT is NP-complete.
Proof sketch. SAT is clearly in NP. We must show every $L \in \mathbf{NP}$ reduces to SAT in polynomial time. Let $V$ be a polynomial-time verifier for $L$, running in time $p(n)$. On input $w$ of length $n$, we encode the entire computation of $V$ on $(w, c)$ as a Boolean formula $\phi_{w}$ over $O(p(n)^2)$ variables representing:
- The tableau: $\text{cell}[i][j][s]$ = “at time $i$, tape position $j$ contains symbol $s$”
- The state: $\text{state}[i][q]$ = “at time $i$, $V$ is in state $q$”
- The head: $\text{head}[i][j]$ = “at time $i$, the head is at position $j$”
Clauses enforce:
- The start configuration matches $w$.
- Each step respects $V$’s transition function.
- The final state is $q_\text{accept}$.
The formula $\phi_w$ is satisfiable iff some certificate $c$ makes $V(w,c)$ accept, iff $w \in L$. The formula has polynomial size and is computable in polynomial time. Since every NP language reduces to SAT, SAT is NP-hard.
Co-NP and the Polynomial Hierarchy
Co-NP is the class of complements of NP languages: $L \in \mathbf{co\text{-}NP}$ iff $\overline{L} \in \mathbf{NP}$.
If $\mathbf{P} = \mathbf{NP}$, then $\mathbf{NP} = \mathbf{co\text{-}NP}$ (since $\mathbf{P}$ is closed under complement). Conversely, if $\mathbf{NP} \neq \mathbf{co\text{-}NP}$, then $\mathbf{P} \neq \mathbf{NP}$.
$\mathbf{NP} \cap \mathbf{co\text{-}NP}$: Problems here have both short proofs of membership and short proofs of non-membership. Examples:
- Integer factorization. Given $n$ and $k$, does $n$ have a factor $\leq k$? Believed to be in $\mathbf{NP} \cap \mathbf{co\text{-}NP}$ but not in $\mathbf{P}$.
- Primality testing. AKS (2002) put this definitively in $\mathbf{P}$.
PSPACE contains both $\mathbf{NP}$ and $\mathbf{co\text{-}NP}$: $\mathbf{NP} \cup \mathbf{co\text{-}NP} \subseteq \mathbf{NP} \cap \mathbf{co\text{-}NP} \subseteq \mathbf{PSPACE}$. Problems complete for PSPACE include quantified Boolean formula (QBF) satisfiability.
Examples
Example 1: TSP (decision version). Given a complete weighted graph on $n$ cities and a budget $B$, does a tour visiting every city exactly once exist with total weight $\leq B$? This is in NP (certificate: the tour). It is NP-complete (proved via reduction from Hamiltonian path).
Example 2: Graph 3-coloring. Does a given graph have a proper 3-coloring? In NP (certificate: the coloring). NP-complete (proved via reduction from 3-SAT, see the next post).
Example 3: Subset sum. Given integers $a_1, \ldots, a_n$ and target $t$, does a subset sum to $t$? In NP (certificate: the subset). NP-complete.
Example 4: Why NP-hardness is useful information. If you are assigned to build a scheduler that minimizes conflicts (an NP-hard problem), knowing this tells you: do not look for a polynomial exact algorithm; instead use approximation algorithms, heuristics (simulated annealing, genetic algorithms), integer programming solvers, or exploit special structure (e.g., interval graph coloring is polynomial).
Read Next: NP-Completeness & Reductions