Prerequisite: P vs NP and the Hardest Problems

Why Approximation?

Many important optimisation problems - Travelling Salesman, Vertex Cover, Set Cover, Knapsack - are NP-hard. Assuming $P \neq NP$, no polynomial-time algorithm finds their exact optima. Approximation algorithms offer an escape: return a solution in polynomial time that is provably close to optimal. The proof of closeness is as important as the algorithm itself.

Definition. An algorithm $A$ is a $\rho$-approximation for a minimisation problem if, for every instance $I$,

$$\text{ALG}(I) \leq \rho \cdot \text{OPT}(I)$$

For maximisation, the inequality reverses: $\text{ALG}(I) \geq \frac{1}{\rho} \cdot \text{OPT}(I)$. The ratio $\rho \geq 1$ is called the approximation ratio or approximation factor. A ratio of $1$ means exact; the closer to $1$, the better.

Vertex Cover: A 2-Approximation

A vertex cover is a set $S$ of vertices such that every edge has at least one endpoint in $S$. Finding a minimum vertex cover is NP-hard.

Algorithm. Find a maximal matching $M$ (greedily add non-adjacent edges until none can be added). Return $S$ = both endpoints of every edge in $M$.

Analysis. Every edge in $M$ is covered because we include both its endpoints. Every other edge is also covered because $M$ is maximal - if an edge $(u,v)$ were uncovered, neither $u$ nor $v$ would be matched, so $(u,v)$ could be added to $M$, contradicting maximality. So $S$ is a valid vertex cover. Its size is $2|M|$.

Now, any vertex cover must include at least one endpoint of every matching edge. Since the edges of $M$ are disjoint, $\text{OPT} \geq |M|$. Therefore:

$$|S| = 2|M| \leq 2 \cdot \text{OPT}$$

This gives a $2$-approximation. Running time is $O(V + E)$.
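A minimal sketch of this algorithm in Python, with edges given as vertex pairs. Scanning the edges and taking both endpoints of any currently uncovered edge builds exactly a maximal matching:

```python
def vertex_cover_2approx(edges):
    """2-approximation for vertex cover: take both endpoints of every
    edge in a greedily built maximal matching."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge uncovered -> add it to the matching
            cover.update((u, v))
    return cover
```

On the path $0-1-2-3$ this returns all four vertices, twice the optimum $\{1, 2\}$, matching the worst-case bound.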

Set Cover: Greedy $\ln n$-Approximation

Given a universe $U$ of $n$ elements and a collection of sets $S_1, \ldots, S_m \subseteq U$, find the fewest sets that cover $U$. This generalises vertex cover.

Algorithm. Repeat: pick the set $S_i$ that covers the most currently uncovered elements. Stop when all elements are covered.

Analysis via potential function. Let $\text{OPT} = k$. At any point in the run, the $k$ optimal sets together cover all currently uncovered elements, so by pigeonhole one of them covers at least a $1/k$ fraction of what remains - and the greedy pick does at least as well. Hence after $t$ greedy picks, at most $n(1-1/k)^t \leq n e^{-t/k}$ elements remain. When $t = k \ln n$, fewer than $n \cdot e^{-\ln n} = 1$ element remains, i.e. none, so $k \ln n$ sets suffice:

$$\text{ALG} \leq k \ln n = \text{OPT} \cdot \ln n$$

This $\ln n$-approximation is essentially tight: unless $P = NP$, no polynomial-time algorithm achieves $(1-\epsilon)\ln n$ for any $\epsilon > 0$.
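The greedy rule is a few lines of Python, with each set given as a Python `set` (assuming the sets do cover the universe, otherwise the loop would not terminate):

```python
def greedy_set_cover(universe, sets):
    """Greedy ln(n)-approximation: repeatedly take the set that covers
    the most currently uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # index of the set covering the most uncovered elements
        i = max(range(len(sets)), key=lambda j: len(sets[j] & uncovered))
        chosen.append(i)
        uncovered -= sets[i]
    return chosen
```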

Metric TSP: Christofides' 1.5-Approximation

In metric TSP, cities satisfy the triangle inequality $d(u,w) \leq d(u,v) + d(v,w)$. Find the shortest Hamiltonian cycle.

Christofides' algorithm (1976):

  1. Compute a minimum spanning tree $T$ of the complete graph.
  2. Let $O$ be the set of odd-degree vertices in $T$ (by the handshake lemma, there is an even number of them).
  3. Find a minimum-weight perfect matching $M$ on $O$.
  4. Combine $T \cup M$ to form an Eulerian multigraph (all vertices now have even degree).
  5. Find an Euler tour. Shortcut repeated vertices (triangle inequality ensures shortcuts do not increase cost).

Analysis. The MST costs at most $\text{OPT}$ (removing any edge from the optimal tour yields a spanning tree). The matching $M$ costs at most $\text{OPT}/2$: shortcut the optimal tour so it visits only the vertices of $O$ - by the triangle inequality this costs at most $\text{OPT}$ - and its edges alternate into two perfect matchings on $O$, the cheaper of which costs at most $\text{OPT}/2$. Total: $\text{ALG} \leq \text{OPT} + \text{OPT}/2 = 1.5 \cdot \text{OPT}$.
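The five steps can be sketched in Python. This is an illustrative toy rather than a production implementation: step 3 brute-forces the perfect matching (a real implementation would use a polynomial-time blossom algorithm), so it only handles small instances.

```python
from collections import defaultdict

def christofides(dist):
    """Christofides sketch on a symmetric metric dist[i][j].
    Step 3 is brute-forced, so only suitable for tiny inputs."""
    n = len(dist)
    # Step 1: minimum spanning tree via Prim's algorithm.
    in_tree, tree = {0}, defaultdict(list)
    while len(in_tree) < n:
        u, v = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist[e[0]][e[1]])
        tree[u].append(v)
        tree[v].append(u)
        in_tree.add(v)
    # Step 2: odd-degree vertices of the tree.
    odd = [v for v in range(n) if len(tree[v]) % 2 == 1]
    # Step 3: minimum-weight perfect matching on odd, by exhaustive recursion.
    def best_matching(rest):
        if not rest:
            return 0.0, []
        u, rest = rest[0], rest[1:]
        best = (float('inf'), [])
        for i, v in enumerate(rest):
            cost, pairs = best_matching(rest[:i] + rest[i + 1:])
            if dist[u][v] + cost < best[0]:
                best = (dist[u][v] + cost, pairs + [(u, v)])
        return best
    _, matching = best_matching(odd)
    # Step 4: add matching edges; every degree is now even.
    for u, v in matching:
        tree[u].append(v)
        tree[v].append(u)
    # Step 5: Euler tour (Hierholzer's algorithm), then shortcut repeats.
    graph = {v: list(ns) for v, ns in tree.items()}
    stack, euler = [0], []
    while stack:
        v = stack[-1]
        if graph[v]:
            u = graph[v].pop()
            graph[u].remove(v)
            stack.append(u)
        else:
            euler.append(stack.pop())
    seen, tour = set(), []
    for v in euler:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    return tour + [0]
```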

In 2020, Karlin, Klein, and Oveis Gharan broke the 1.5 barrier with a randomised algorithm achieving $(1.5 - 10^{-36})$, but Christofides remains the practical standard.

FPTAS for Knapsack

The 0/1 Knapsack problem is NP-hard but admits a Fully Polynomial-Time Approximation Scheme (FPTAS): for any $\epsilon > 0$, return a $(1+\epsilon)$-approximation in time polynomial in $n$ and $1/\epsilon$.

Idea. The exact DP runs in $O(nV)$ where $V = \sum v_i$, which is only pseudo-polynomial when values are large - so rescale them. Let $K = \epsilon \cdot v_{\max} / n$ and round each value: $v_i' = \lfloor v_i / K \rfloor$. Run the exact DP on the rounded values: now $\sum v_i' \leq n^2/\epsilon$, so the DP costs $O(n^3/\epsilon)$. The rounding loss per item is less than $K$, so over $n$ items the total loss is less than $nK = \epsilon \cdot v_{\max} \leq \epsilon \cdot \text{OPT}$ (assuming every item fits on its own, so $v_{\max} \leq \text{OPT}$).
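A sketch of the rounding scheme, assuming every item individually fits in the knapsack. The DP computes the minimum weight needed to reach each rounded value, keeping one table per item so the chosen set can be traced back:

```python
def knapsack_fptas(values, weights, capacity, eps):
    """FPTAS sketch: returns a set of item indices of value >= (1-eps)*OPT,
    assuming each item fits on its own (weights[i] <= capacity)."""
    n = len(values)
    K = eps * max(values) / n                 # rounding granularity
    scaled = [int(v // K) for v in values]
    max_s = sum(scaled)
    INF = float('inf')
    # dps[i][s] = minimum weight achieving rounded value s with the first i items
    dps = [[0] + [INF] * max_s]
    for v, w in zip(scaled, weights):
        dp = dps[-1][:]
        for s in range(max_s, v - 1, -1):
            dp[s] = min(dp[s], dps[-1][s - v] + w)
        dps.append(dp)
    best_s = max(s for s in range(max_s + 1) if dps[n][s] <= capacity)
    chosen, s = [], best_s
    for i in range(n, 0, -1):                 # trace back which items were taken
        if dps[i][s] != dps[i - 1][s]:
            chosen.append(i - 1)
            s -= scaled[i - 1]
    return chosen
```

Storing all $n+1$ tables keeps the traceback simple at the cost of extra memory; a leaner version would store parent pointers instead.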

Inapproximability

Not every NP-hard problem has good approximations.

The PCP Theorem (Arora–Safra and Arora–Lund–Motwani–Sudan–Szegedy, 1992) opened the door to hardness-of-approximation results; building on it, Håstad showed that MAX-3SAT cannot be approximated beyond a ratio of $7/8 + \epsilon$ for any $\epsilon > 0$ unless $P = NP$. Since a uniformly random assignment already satisfies $7/8$ of clauses in expectation - each clause on three distinct variables fails with probability $1/8$ - this is tight.
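The $7/8$ expectation for a single clause can be checked by enumerating all assignments to its three variables (the literal encoding here is illustrative):

```python
from itertools import product

# A clause on 3 distinct variables is falsified only when all three literals
# are false: probability (1/2)^3 = 1/8 under a uniform random assignment.
clause = [(0, True), (1, False), (2, True)]   # (variable index, is_positive)
satisfied = sum(any(assign[v] == pos for v, pos in clause)
                for assign in product([False, True], repeat=3))
# satisfied == 7 of the 8 assignments
```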

Hardness of approximation is distinct from hardness of exact computation. Some problems are hard to approximate at all: Maximum Clique admits no $n^{1-\epsilon}$-approximation for any $\epsilon > 0$ unless $P = NP$. Others, like bin packing, admit an asymptotic PTAS but no PTAS for the absolute ratio.

Examples

$k$-Center. Given $n$ points in a metric and integer $k$, choose $k$ centres to minimise the maximum distance from any point to its nearest centre. The farthest-first heuristic - repeatedly pick the point farthest from all current centres - gives a $2$-approximation. No $(2-\epsilon)$-approximation exists unless $P = NP$.
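The farthest-first heuristic is short enough to write out; the sketch below picks centres from the input points themselves, using Euclidean distance for concreteness:

```python
import math

def farthest_first(points, k):
    """Farthest-first traversal: a 2-approximation for k-center.
    d[j] tracks each point's distance to its nearest centre so far."""
    centers = [points[0]]                     # start from an arbitrary point
    d = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=lambda j: d[j])   # farthest point
        centers.append(points[i])
        d = [min(d[j], math.dist(points[j], points[i])) for j in range(len(points))]
    return centers
```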

Load Balancing. Given $m$ machines and $n$ jobs with processing times $p_j$, minimise the makespan (maximum machine load). The List Scheduling algorithm assigns each job to the currently least-loaded machine, yielding a $(2 - 1/m)$-approximation. Sorting jobs in decreasing order first - Longest Processing Time (LPT) - improves this to $4/3 - 1/(3m)$.
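List Scheduling runs in $O(n \log m)$ with a heap of machine loads. A minimal sketch:

```python
import heapq

def list_scheduling(jobs, m):
    """Graham's list scheduling: give each job to the least-loaded machine.
    Returns (makespan, list of job indices per machine)."""
    loads = [(0, i) for i in range(m)]        # (current load, machine index)
    heapq.heapify(loads)
    assignment = [[] for _ in range(m)]
    for j, p in enumerate(jobs):
        load, i = heapq.heappop(loads)        # least-loaded machine
        assignment[i].append(j)
        heapq.heappush(loads, (load + p, i))
    return max(load for load, _ in loads), assignment
```

Sorting `jobs` in decreasing order before the loop turns this into LPT.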

Facility Location. Open a subset of facilities (each with opening cost $f_i$) to serve clients (each assigned to an open facility at connection cost $d(j,i)$). The LP-rounding approach of Shmoys–Tardos–Aardal gave the first constant factor, roughly $3.16$; the Jain–Vazirani primal–dual algorithm achieves a $3$-approximation.


Read Next: Randomized Algorithms