Prerequisite: P, NP, and Hardest Problems

NP-Hardness and NP-Completeness

The Cook-Levin theorem established SAT as a problem at least as hard as every problem in NP. This motivates precise definitions.

A language $L$ is NP-hard if every language in $\mathbf{NP}$ is polynomial-time many-one reducible to $L$:

$$\forall A \in \mathbf{NP},\quad A \leq_p L$$

A language $L$ is NP-complete if it is NP-hard and $L \in \mathbf{NP}$.

NP-complete problems are the hardest problems in NP: if any NP-complete problem is in $\mathbf{P}$, then $\mathbf{P} = \mathbf{NP}$.

Polynomial-Time Many-One Reductions

A polynomial-time many-one reduction from $A$ to $B$, written $A \leq_p B$, is a function $f: \Sigma^\ast \to \Sigma^\ast$ that is:

  • Computable by a polynomial-time TM
  • Satisfies: $x \in A \iff f(x) \in B$

Reductions are transitive: if $A \leq_p B$ and $B \leq_p C$, then $A \leq_p C$. This means once we know $B$ is NP-hard, we can establish $C$ is NP-hard by reducing $B$ to $C$ - we do not need to reduce every NP problem to $C$ directly.

Key facts:

  • If $A \leq_p B$ and $B \in \mathbf{P}$, then $A \in \mathbf{P}$.
  • If $A \leq_p B$ and $A$ is NP-hard, then $B$ is NP-hard.
  • To prove $L$ is NP-complete: (1) show $L \in \mathbf{NP}$, (2) show a known NP-complete problem $\leq_p L$.

The Reduction Chain

The canonical chain of NP-completeness results is:

$$\text{SAT} \leq_p \text{3-SAT} \leq_p \text{Independent Set} \leq_p \text{Vertex Cover} \leq_p \text{Clique}$$

We trace the key reductions.

SAT $\leq_p$ 3-SAT

A Boolean formula in Conjunctive Normal Form (CNF) is a conjunction of clauses, each a disjunction of literals. 3-SAT restricts each clause to exactly 3 literals.

Given an arbitrary CNF formula $\phi$, convert each clause to 3-literal clauses:

  • A clause $(l_1)$ becomes $(l_1 \vee z_1 \vee z_2) \wedge (l_1 \vee z_1 \vee \overline{z_2}) \wedge (l_1 \vee \overline{z_1} \vee z_2) \wedge (l_1 \vee \overline{z_1} \vee \overline{z_2})$ with fresh variables $z_1, z_2$. This is satisfiable iff $l_1$ is true.
  • A clause $(l_1 \vee l_2)$ becomes $(l_1 \vee l_2 \vee z) \wedge (l_1 \vee l_2 \vee \overline{z})$ with a fresh variable $z$.
  • A clause of 4+ literals $(l_1 \vee \cdots \vee l_k)$ is split as $(l_1 \vee l_2 \vee z_1) \wedge (\overline{z_1} \vee l_3 \vee z_2) \wedge \cdots \wedge (\overline{z_{k-3}} \vee l_{k-1} \vee l_k)$ using $k-3$ fresh variables.

The resulting formula is satisfiable iff $\phi$ is satisfiable.

3-SAT $\leq_p$ Independent Set

An independent set in graph $G = (V, E)$ is a set $S \subseteq V$ with no edge between any two vertices in $S$. The INDEPENDENT-SET problem: does $G$ have an independent set of size $k$?

Given a 3-CNF formula $\phi$ with $m$ clauses, construct a graph $G = (V, E)$:

  • Clause gadgets. For each clause $(l_1 \vee l_2 \vee l_3)$, add three nodes labeled $l_1, l_2, l_3$ and connect them in a triangle.
  • Conflict edges. For each pair of nodes labeled with complementary literals (e.g., $x$ and $\overline{x}$) in different clause gadgets, add an edge.

Set $k = m$ (number of clauses).

Claim. $\phi$ is satisfiable iff $G$ has an independent set of size $m$.

Proof. ($\Rightarrow$) Given a satisfying assignment, each clause has at least one true literal. Pick one true literal per clause; the corresponding $m$ nodes form an independent set. No two chosen nodes from the same clause are both chosen (we pick one per clause). No two chosen nodes are connected by a conflict edge (they have consistent truth values).

($\Leftarrow$) An independent set $S$ of size $m$ must contain exactly one node per clause gadget (since the three nodes in each triangle cannot all be in an independent set, and we need $m$ nodes from $m$ triangles). The assignment that sets each selected literal to true is consistent (no conflicts, since $S$ avoids conflict edges) and satisfies every clause.

This reduction runs in polynomial time ($O(m^2)$ for the conflict edges).

Independent Set $\leq_p$ Vertex Cover

A vertex cover of $G = (V, E)$ is a set $S \subseteq V$ such that every edge has at least one endpoint in $S$.

Complement theorem. $S$ is an independent set in $G$ iff $V \setminus S$ is a vertex cover.

Proof. $S$ is independent iff no edge has both endpoints in $S$ iff every edge has at least one endpoint in $V \setminus S$ iff $V \setminus S$ is a vertex cover.

Therefore: $G$ has an independent set of size $k$ iff $G$ has a vertex cover of size $|V| - k$. The reduction is just parameter arithmetic - trivially polynomial.

Independent Set $\leq_p$ Clique

A clique in $G = (V, E)$ is a set $S \subseteq V$ where every pair in $S$ is connected by an edge. CLIQUE: does $G$ have a clique of size $k$?

Let $\bar{G}$ be the complement graph of $G$ (same vertices, edges wherever $G$ has non-edges). Then $S$ is an independent set in $G$ iff $S$ is a clique in $\bar{G}$. So $G$ has an independent set of size $k$ iff $\bar{G}$ has a clique of size $k$.

3-SAT $\leq_p$ 3-Colorability

A proper $k$-coloring of $G$ assigns colors ${1, \ldots, k}$ to vertices so no edge connects same-colored vertices. 3-COLORABILITY: does $G$ have a proper 3-coloring?

Given a 3-CNF formula $\phi$, construct graph $G$:

Palette triangle. Add three nodes: TRUE, FALSE, BASE. Connect all three in a triangle (forcing them to receive distinct colors).

Variable gadgets. For each variable $x_i$, add nodes $x_i$ and $\overline{x_i}$, connected to each other and to BASE. This forces $x_i$ and $\overline{x_i}$ to get colors TRUE and FALSE (in some order), since they are adjacent (different colors) and both adjacent to BASE (neither is colored BASE).

Clause gadgets. For each clause $(l_1 \vee l_2 \vee l_3)$, add a small gadget (an OR-gadget) that is 3-colorable iff at least one of $l_1, l_2, l_3$ is colored TRUE. This gadget uses 5 additional nodes and 7 edges.

The full graph is 3-colorable iff $\phi$ is satisfiable.

Knapsack and Subset Sum

SUBSET-SUM. Given integers $a_1, \ldots, a_n$ and target $t$, does a subset sum to $t$?

Theorem. SUBSET-SUM is NP-complete.

Proof: reduce 3-SAT to SUBSET-SUM. Encode the formula’s variables and clauses as carefully chosen integers so that a subset summing to a specific target corresponds exactly to a satisfying assignment.

KNAPSACK (decision). Given item weights $w_i$, values $v_i$, capacity $W$, and threshold $V$: does a subset of items fit in the knapsack (total weight $\leq W$) with total value $\geq V$? NP-complete via reduction from SUBSET-SUM.

Importantly, SUBSET-SUM and KNAPSACK have pseudo-polynomial algorithms: dynamic programming in $O(nW)$ time. Since $W$ may be exponential in its input size (number of bits), this is not polynomial time - but for bounded $W$, it is practical. This makes them weakly NP-hard. TSP is strongly NP-hard: no pseudo-polynomial algorithm exists unless $\mathbf{P} = \mathbf{NP}$.

Strong vs. Weak NP-Hardness

A problem is strongly NP-hard if it remains NP-hard even when all numerical values in the input are bounded by a polynomial in the input length. A problem is weakly NP-hard if it has a pseudo-polynomial algorithm (polynomial in the input length and the magnitude of numerical values).

  • Weakly NP-hard: SUBSET-SUM, KNAPSACK, PARTITION
  • Strongly NP-hard: TSP, 3-SAT, graph coloring, clique

This distinction matters: weakly NP-hard problems are often tractable in practice (small numerical values), while strongly NP-hard problems require approximation or heuristics.

Proving New Problems NP-Complete

Strategy. To show problem $\Pi$ is NP-complete:

  1. Show $\Pi \in \mathbf{NP}$: describe a polynomial verifier.
  2. Choose a known NP-complete problem $\Lambda$ and exhibit a polynomial-time reduction $f$ from $\Lambda$ to $\Pi$.
  3. Prove correctness: $x \in \Lambda \iff f(x) \in \Pi$.

Choosing the right problem to reduce from is an art. Common sources:

  • 3-SAT: flexible, works for combinatorial problems involving choices.
  • INDEPENDENT-SET / CLIQUE: works for graph problems involving selection.
  • 3-COLORABILITY: works for assignment/scheduling problems with conflicts.
  • SUBSET-SUM: works for numerical partitioning problems.

Examples

Example 1: Hamiltonian Path is NP-complete. HAMILTONIAN-PATH $\in$ NP (certificate: the path). Reduce from 3-SAT using a complex gadget construction with variable gadgets (paths that can be traversed in two directions) and clause gadgets. This is a textbook reduction in Sipser or Garey-Johnson.

Example 2: What NP-completeness means in practice. When a combinatorial optimization problem is NP-hard, practitioners switch strategies: (a) approximation algorithms (e.g., 2-approximation for vertex cover: take both endpoints of any unmatched edge); (b) exact exponential algorithms with good constants (branch-and-bound); (c) integer programming solvers (CPLEX, Gurobi) which work well on many instances despite worst-case exponential time; (d) parameterized algorithms that are polynomial when a structural parameter (treewidth, solution size) is small.

Example 3: TSP is NP-complete. The decision version (is there a tour of cost $\leq B$?) is in NP. NP-hardness follows from Hamiltonian path: given a graph $G$, construct a complete weighted graph where edges in $G$ have weight 0 and missing edges have weight 1. A Hamiltonian path exists in $G$ iff a TSP tour of weight $\leq |V|$ exists. (Adjust for cycles.)


Read Next: Space Complexity