Approximation Algorithms
Prerequisite: P vs NP and the Hardest Problems
Why Approximation?
Many important optimisation problems - Travelling Salesman, Vertex Cover, Set Cover, Knapsack - are NP-hard. Assuming $P \neq NP$, no polynomial-time algorithm finds their exact optima. Approximation algorithms offer an escape: return a solution in polynomial time that is provably close to optimal. The proof of closeness is as important as the algorithm itself.
Definition. An algorithm $A$ is a $\rho$-approximation for a minimisation problem if, for every instance $I$,
$$\text{ALG}(I) \leq \rho \cdot \text{OPT}(I)$$
For maximisation, the inequality reverses: $\text{ALG}(I) \geq \frac{1}{\rho} \cdot \text{OPT}(I)$. The ratio $\rho \geq 1$ is called the approximation ratio or approximation factor. A ratio of $1$ means exact; the closer to $1$, the better.
Vertex Cover: A 2-Approximation
A vertex cover is a set $S$ of vertices such that every edge has at least one endpoint in $S$. Finding a minimum vertex cover is NP-hard.
Algorithm. Find a maximal matching $M$ (greedily add non-adjacent edges until none can be added). Return $S$ = both endpoints of every edge in $M$.
Analysis. Every edge in $M$ is covered because we include both its endpoints. Every other edge is also covered because $M$ is maximal - if an edge $(u,v)$ were uncovered, neither $u$ nor $v$ would be matched, so $M \cup \{(u,v)\}$ would be a larger matching, contradicting maximality. So $S$ is a valid vertex cover. Its size is $2|M|$.
Now, any vertex cover must include at least one endpoint of every matching edge. Since the edges of $M$ are disjoint, $\text{OPT} \geq |M|$. Therefore:
$$|S| = 2|M| \leq 2 \cdot \text{OPT}$$
This gives a $2$-approximation. Running time is $O(V + E)$.
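The algorithm is short enough to state in full. A minimal sketch in Python, taking the edge list as input (building the maximal matching implicitly by scanning edges in any order):

```python
def vertex_cover_2approx(edges):
    """2-approximate vertex cover: greedily build a maximal matching and
    return both endpoints of every matched edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge (u, v) is unmatched
            cover.add(u)                       # take both endpoints
            cover.add(v)
    return cover
```

On the path $0{-}1{-}2{-}3$ this returns all four vertices while the optimum is $\{1, 2\}$, showing the factor of $2$ can be tight.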
Set Cover: Greedy $\ln n$-Approximation
Given a universe $U$ of $n$ elements and a collection of sets $S_1, \ldots, S_m \subseteq U$, find the fewest sets that cover $U$. This generalises vertex cover.
Algorithm. Repeat: pick the set $S_i$ that covers the most currently uncovered elements. Stop when all elements are covered.
Analysis via potential function. Let $\text{OPT} = k$. At any point, the $k$ optimal sets together cover all currently uncovered elements, so one of them covers at least a $1/k$ fraction of them (pigeonhole) - and the greedy pick covers at least as many. So after the first greedy pick at most $n(1 - 1/k)$ elements remain, and after $t$ picks at most $n(1-1/k)^t$. Since $(1-1/k)^t < e^{-t/k}$, at $t = k \ln n$ strictly fewer than $n \cdot e^{-\ln n} = 1$ elements remain - that is, none - so $t = k \ln n$ sets suffice:
$$\text{ALG} \leq k \ln n = \text{OPT} \cdot \ln n$$
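The greedy rule is a few lines of Python. A sketch, representing the input as a universe plus a list of Python sets:

```python
def greedy_set_cover(universe, sets):
    """Repeatedly pick the set covering the most currently uncovered
    elements; returns the indices of the chosen sets."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen
```

Each iteration costs $O(\sum_i |S_i|)$ as written; a more careful implementation with bucketed set sizes runs in time linear in the total input size.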
This $\ln n$-approximation is essentially tight: unless $P = NP$, no polynomial-time algorithm achieves $(1-\epsilon)\ln n$ for any $\epsilon > 0$.
Metric TSP: Christofides' 1.5-Approximation
In metric TSP, cities satisfy the triangle inequality $d(u,w) \leq d(u,v) + d(v,w)$. Find the shortest Hamiltonian cycle.
Christofides' algorithm (1976):
- Compute a minimum spanning tree $T$ of the complete graph.
- Let $O$ be the set of odd-degree vertices in $T$ (there are an even number of them).
- Find a minimum-weight perfect matching $M$ on $O$.
- Combine $T \cup M$ to form an Eulerian multigraph (all vertices now have even degree).
- Find an Euler tour. Shortcut repeated vertices (triangle inequality ensures shortcuts do not increase cost).
Analysis. The MST costs at most $\text{OPT}$ (removing any edge from the optimal tour yields a spanning tree). The matching $M$ costs at most $\text{OPT}/2$: shortcutting the optimal tour down to the vertices of $O$ gives a cycle on $O$ of cost at most $\text{OPT}$ (triangle inequality), and since $|O|$ is even, its edges alternate between two perfect matchings on $O$; the cheaper one costs at most $\text{OPT}/2$. Total: $\text{ALG} \leq \text{OPT} + \text{OPT}/2 = 1.5 \cdot \text{OPT}$.
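The five steps can be sketched end to end. In this illustrative version the perfect matching is found by brute-force recursion, which is only viable when the odd-degree set is small; a real implementation would use a Blossom-based minimum-weight matching algorithm:

```python
import math

def christofides(dist):
    """Sketch of Christofides' algorithm on a symmetric metric given as an
    n x n distance matrix. Returns a Hamiltonian cycle as a vertex order."""
    n = len(dist)

    # 1. Minimum spanning tree via Prim's algorithm.
    in_tree, mst = {0}, []
    while len(in_tree) < n:
        u, v = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist[e[0]][e[1]])
        mst.append((u, v))
        in_tree.add(v)

    # 2. Vertices of odd degree in the MST (always an even number of them).
    deg = [0] * n
    for u, v in mst:
        deg[u] += 1
        deg[v] += 1
    odd = [v for v in range(n) if deg[v] % 2 == 1]

    # 3. Minimum-weight perfect matching on the odd vertices (brute force).
    def min_matching(rest):
        if not rest:
            return 0.0, []
        u, tail = rest[0], rest[1:]
        best_cost, best_m = math.inf, []
        for i, v in enumerate(tail):
            cost, m = min_matching(tail[:i] + tail[i + 1:])
            if dist[u][v] + cost < best_cost:
                best_cost, best_m = dist[u][v] + cost, m + [(u, v)]
        return best_cost, best_m
    _, matching = min_matching(odd)

    # 4. Euler tour of the multigraph MST + matching (Hierholzer's algorithm).
    adj = {v: [] for v in range(n)}
    for u, v in mst + matching:
        adj[u].append(v)
        adj[v].append(u)
    stack, tour = [0], []
    while stack:
        u = stack[-1]
        if adj[u]:
            v = adj[u].pop()
            adj[v].remove(u)
            stack.append(v)
        else:
            tour.append(stack.pop())

    # 5. Shortcut repeated vertices to get a Hamiltonian cycle.
    seen, cycle = set(), []
    for v in tour:
        if v not in seen:
            seen.add(v)
            cycle.append(v)
    return cycle
```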
In 2020, Karlin, Klein, and Oveis Gharan broke the 1.5 barrier, achieving $(1.5 - 10^{-36})$, but Christofides remains the practical standard.
FPTAS for Knapsack
The 0/1 Knapsack problem is NP-hard but admits a Fully Polynomial-Time Approximation Scheme (FPTAS): for any $\epsilon > 0$, return a $(1+\epsilon)$-approximation in time polynomial in $n$ and $1/\epsilon$.
Idea. The exact DP runs in $O(nV)$ where $V = \sum v_i$. If values are large, rescale them. Assume each item fits in the knapsack on its own, so $v_{\max} \leq \text{OPT}$. Let $K = \epsilon \cdot v_{\max} / n$ and round each value: $v_i' = \lfloor v_i / K \rfloor$. Run the exact DP on the rounded values; each $v_i'$ is at most $n/\epsilon$, so their sum is at most $n^2/\epsilon$ and the DP costs $O(n^3/\epsilon)$. The rounding loss per item is at most $K$, so over $n$ items the total loss is at most $nK = \epsilon \cdot v_{\max} \leq \epsilon \cdot \text{OPT}$.
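The rounding idea translates directly into code. A sketch, using the value-indexed DP (minimum weight achieving each rounded value) and carrying the true value of each DP witness alongside it:

```python
def knapsack_fptas(values, weights, capacity, eps):
    """(1 - eps)-approximate 0/1 knapsack by value rounding.
    Assumes every item fits on its own, so v_max <= OPT."""
    n = len(values)
    K = eps * max(values) / n
    scaled = [int(v / K) for v in values]          # rounded-down values
    V = sum(scaled)
    INF = float('inf')
    # best[s] = (min weight achieving scaled value s, true value of that set)
    best = [(0.0, 0.0)] + [(INF, 0.0)] * V
    for i in range(n):
        for s in range(V, scaled[i] - 1, -1):      # reverse scan: 0/1 semantics
            w, tv = best[s - scaled[i]]
            if w + weights[i] <= capacity and w + weights[i] < best[s][0]:
                best[s] = (w + weights[i], tv + values[i])
    s_best = max(s for s in range(V + 1) if best[s][0] <= capacity)
    return best[s_best][1]
```

The returned true value is at least $K \cdot s_{\text{best}} \geq \text{OPT} - nK \geq (1-\epsilon)\,\text{OPT}$, exactly the guarantee derived above.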
Inapproximability
Not every NP-hard problem has good approximations.
The PCP Theorem (Arora–Safra and Arora–Lund–Motwani–Sudan–Szegedy, 1992) opened the door to hardness-of-approximation results. Building on it, Håstad (1997) showed that MAX-3SAT cannot be approximated beyond a ratio of $7/8 + \epsilon$ for any $\epsilon > 0$ unless $P = NP$. Since a random assignment already satisfies $7/8$ of clauses in expectation, this is tight.
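The $7/8$ baseline is even achievable deterministically, by the method of conditional expectations: fix variables one at a time, always choosing the value that keeps the conditional expected count from dropping. A sketch (exact arithmetic via `Fraction`; clauses are lists of `(variable, is_positive)` pairs):

```python
from fractions import Fraction

def expected_sat(clauses, assignment):
    """Expected number of satisfied clauses when every unassigned
    variable is set uniformly at random."""
    total = Fraction(0)
    for clause in clauses:
        p_unsat = Fraction(1)
        for var, positive in clause:
            if var in assignment:
                if assignment[var] == positive:   # literal already true
                    p_unsat = Fraction(0)
                    break
                # literal already false: contributes factor 1
            else:
                p_unsat *= Fraction(1, 2)         # unset literal false w.p. 1/2
        total += 1 - p_unsat
    return total

def conditional_expectations(clauses, nvars):
    """Derandomised 7/8-approximation for 3-CNF formulas whose clauses
    each contain three distinct variables."""
    assignment = {}
    for v in range(nvars):
        if expected_sat(clauses, {**assignment, v: True}) >= \
           expected_sat(clauses, {**assignment, v: False}):
            assignment[v] = True
        else:
            assignment[v] = False
    return assignment
```

At every step one of the two choices has conditional expectation at least the current one, so the final (fully determined) count is at least the initial expectation of $\tfrac{7}{8}m$.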
Hardness of approximation is distinct from hardness of exact computation. Some problems are hard to approximate at all: Maximum Clique cannot be approximated within a factor of $n^{1-\epsilon}$ for any $\epsilon > 0$ unless $P = NP$. Others, like bin packing, admit an asymptotic PTAS but no PTAS for the absolute ratio.
Examples
$k$-Center. Given $n$ points in a metric and integer $k$, choose $k$ centres to minimise the maximum distance from any point to its nearest centre. The farthest-first heuristic - repeatedly pick the point farthest from all current centres - gives a $2$-approximation. No $(2-\epsilon)$-approximation exists unless $P = NP$.
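The farthest-first heuristic fits in a few lines. A sketch, parameterised by an arbitrary metric `dist`:

```python
def k_center_farthest_first(points, k, dist):
    """Gonzalez's farthest-first traversal: a 2-approximation for k-center.
    Returns the chosen centre indices and the resulting radius."""
    centers = [0]                                 # arbitrary first centre
    d = [dist(points[0], p) for p in points]      # distance to nearest centre
    while len(centers) < k:
        far = max(range(len(points)), key=d.__getitem__)
        centers.append(far)
        d = [min(d[i], dist(points[far], points[i])) for i in range(len(points))]
    return centers, max(d)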
Load Balancing. Given $m$ machines and $n$ jobs with processing times $p_j$, minimise the makespan (maximum machine load). Graham's List Scheduling algorithm assigns each job to the currently least-loaded machine, yielding a $(2 - 1/m)$-approximation. Sorting jobs in decreasing order first - Longest Processing Time (LPT) - improves the ratio to $4/3 - 1/(3m)$.
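Both scheduling rules are one heap away from code. A sketch:

```python
import heapq

def list_scheduling(jobs, m):
    """Graham's list scheduling: each job goes to the currently
    least-loaded machine, tracked with a min-heap of (load, machine)."""
    heap = [(0, i) for i in range(m)]
    for p in jobs:
        load, i = heapq.heappop(heap)
        heapq.heappush(heap, (load + p, i))
    return max(load for load, _ in heap)          # makespan

def lpt(jobs, m):
    """Longest Processing Time first: sort jobs descending, then
    list-schedule."""
    return list_scheduling(sorted(jobs, reverse=True), m)
```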
Facility Location. Open a subset of facilities (each with opening cost $f_i$) to serve clients (each assigned to an open facility at connection cost $d(j,i)$). The LP-rounding approach of Shmoys–Tardos–Aardal gave the first constant factor, about $3.16$; the Jain–Vazirani primal–dual algorithm achieves $3$.
Read Next: Randomized Algorithms