P, NP, and Hardest Problems // Megha Bose

Computational Complexity

Undecidability established the absolute limits of computation: some problems have no algorithmic solution. Complexity theory asks a sharper question: among the problems that are decidable, which ones are efficiently solvable?

We measure efficiency by the number of steps a Turing machine takes as a function of input length. A TM $M$ runs in time $f(n)$ if on every input of length $n$, it halts within $f(n)$ steps. This gives us a complexity measure robust to the choice of reasonable computational model (multi-tape TMs, RAMs, etc.) - all are polynomially equivalent.

The Class P

$$\mathbf{P} = \bigcup_{k \geq 0} \text{TIME}(n^k)$$

$\mathbf{P}$ is the class of languages decidable in polynomial time on a deterministic TM. Polynomial time is our formal proxy for “efficient” computation for several reasons:

Polynomial time is closed under composition: running a polynomial algorithm as a subroutine a polynomial number of times still yields polynomial time.
Polynomial time is model-independent: the same class is captured by TMs, RAMs, and real hardware (with polynomial simulation overhead).
In practice, polynomial algorithms are almost always fast enough for large inputs, while exponential algorithms are almost always too slow.

Examples of problems in $\mathbf{P}$: shortest paths (Dijkstra, $O(m \log n)$), sorting ($O(n \log n)$), primality testing (AKS, $O(\log^6 n)$), linear programming (ellipsoid method, polynomial).

The Class NP

$$\mathbf{NP} = \bigcup_{k \geq 0} \text{NTIME}(n^k)$$

$\mathbf{NP}$ is the class of languages decidable in polynomial time on a nondeterministic TM. Equivalently (and more operationally useful):

Verifier definition. $L \in \mathbf{NP}$ iff there exists a polynomial-time TM $V$ (a verifier) and a polynomial $p$ such that: $$L = {w : \exists c \text{ with } |c| \leq p(|w|) \text{ and } V(w, c) \text{ accepts}}$$

The string $c$ is called a certificate or witness. $\mathbf{NP}$ is the class of problems where solutions are polynomially verifiable.

Examples:

SAT. $\phi \in$ SAT iff a satisfying assignment exists. Given the assignment, verification is linear time.
Hamiltonian path. Given graph $G$, does it contain a Hamiltonian path? Certificate: the path itself. Verification: check each vertex appears once and consecutive vertices are adjacent.
Graph $k$-coloring. Does $G$ have a proper $k$-coloring? Certificate: the coloring. Verification: check each edge has differently colored endpoints.
Clique. Does $G$ contain a clique of size $k$? Certificate: the clique. Verification: check all $\binom{k}{2}$ edges exist.

P vs. NP

The most celebrated open problem in mathematics and computer science:

$$\mathbf{P} \stackrel{?}{=} \mathbf{NP}$$

We know $\mathbf{P} \subseteq \mathbf{NP}$ (any deterministic computation is a special case of nondeterministic computation). The question is whether the containment is proper.

If $\mathbf{P} = \mathbf{NP}$: every problem whose solution can be verified in polynomial time can also be solved in polynomial time. Consequences: current public-key cryptography collapses (factoring, discrete log are in $\mathbf{NP}$); mathematical proofs could be found automatically for any theorem with a polynomial verifier; protein folding and drug design become tractable; essentially all combinatorial optimization problems become easy.

If $\mathbf{P} \neq \mathbf{NP}$: there are problems that are easy to verify but genuinely hard to solve. This is the world virtually all researchers believe we live in.

Why Most Researchers Believe P $\neq$ NP

No proof exists either way, but the evidence for $\mathbf{P} \neq \mathbf{NP}$ is substantial:

Empirical. Thousands of NP-complete problems have been studied for over 50 years. Despite enormous effort, no polynomial algorithm has been found for any of them. In contrast, most problems known to be in $\mathbf{P}$ were found to have polynomial algorithms quickly once studied.

Algebraization barrier. Baker, Gill, and Solovay (1975) showed there exist oracles $A$ with $\mathbf{P}^A = \mathbf{NP}^A$ and oracles $B$ with $\mathbf{P}^B \neq \mathbf{NP}^B$. This means any proof technique that “relativizes” (works the same with oracle access) cannot resolve P vs. NP. Most classical techniques relativize, ruling them out.

Natural proofs barrier. Razborov and Rudich (1994) showed that a large class of proof techniques for circuit lower bounds (“natural proofs”) cannot prove superpolynomial lower bounds against circuits with pseudorandom properties - unless one-way functions do not exist. Ruling out a whole class of proof strategies.

The Cook-Levin Theorem

Theorem (Cook 1971, Levin 1973). SAT is NP-complete.

Proof sketch. SAT is clearly in NP. We must show every $L \in \mathbf{NP}$ reduces to SAT in polynomial time. Let $V$ be a polynomial-time verifier for $L$, running in time $p(n)$. On input $w$ of length $n$, we encode the entire computation of $V$ on $(w, c)$ as a Boolean formula $\phi_{w}$ over $O(p(n)^2)$ variables representing:

The tableau: $\text{cell}[i][j][s]$ = “at time $i$, tape position $j$ contains symbol $s$”
The state: $\text{state}[i][q]$ = “at time $i$, $V$ is in state $q$”
The head: $\text{head}[i][j]$ = “at time $i$, the head is at position $j$”

Clauses enforce:

The start configuration matches $w$.
Each step respects $V$’s transition function.
The final state is $q_\text{accept}$.

The formula $\phi_w$ is satisfiable iff some certificate $c$ makes $V(w,c)$ accept, iff $w \in L$. The formula has polynomial size and is computable in polynomial time. Since every NP language reduces to SAT, SAT is NP-hard.

Co-NP and the Polynomial Hierarchy

Co-NP is the class of complements of NP languages: $L \in \mathbf{co\text{-}NP}$ iff $\overline{L} \in \mathbf{NP}$.

If $\mathbf{P} = \mathbf{NP}$, then $\mathbf{NP} = \mathbf{co\text{-}NP}$ (since $\mathbf{P}$ is closed under complement). Conversely, if $\mathbf{NP} \neq \mathbf{co\text{-}NP}$, then $\mathbf{P} \neq \mathbf{NP}$.

$\mathbf{NP} \cap \mathbf{co\text{-}NP}$: Problems here have both short proofs of membership and short proofs of non-membership. Examples:

Integer factorization. Given $n$ and $k$, does $n$ have a factor $\leq k$? Believed to be in $\mathbf{NP} \cap \mathbf{co\text{-}NP}$ but not in $\mathbf{P}$.
Primality testing. AKS (2002) put this definitively in $\mathbf{P}$.

PSPACE contains both $\mathbf{NP}$ and $\mathbf{co\text{-}NP}$: $\mathbf{NP} \cup \mathbf{co\text{-}NP} \subseteq \mathbf{NP} \cap \mathbf{co\text{-}NP} \subseteq \mathbf{PSPACE}$. Problems complete for PSPACE include quantified Boolean formula (QBF) satisfiability.

Examples

Example 1: TSP (decision version). Given a complete weighted graph on $n$ cities and a budget $B$, does a tour visiting every city exactly once exist with total weight $\leq B$? This is in NP (certificate: the tour). It is NP-complete (proved via reduction from Hamiltonian path).

Example 2: Graph 3-coloring. Does a given graph have a proper 3-coloring? In NP (certificate: the coloring). NP-complete (proved via reduction from 3-SAT, see the next post).

Example 3: Subset sum. Given integers $a_1, \ldots, a_n$ and target $t$, does a subset sum to $t$? In NP (certificate: the subset). NP-complete.

Example 4: Why NP-hardness is useful information. If you are assigned to build a scheduler that minimizes conflicts (an NP-hard problem), knowing this tells you: do not look for a polynomial exact algorithm; instead use approximation algorithms, heuristics (simulated annealing, genetic algorithms), integer programming solvers, or exploit special structure (e.g., interval graph coloring is polynomial).

Read Next: NP-Completeness & Reductions