Local Search & Hill Climbing
Prerequisite: Greedy Algorithms
The Local Search Paradigm
Exact methods for NP-hard problems take exponential time in the worst case. Local search offers a different trade-off: start with any feasible solution, then repeatedly move to a neighbouring solution that is better. Iteration continues until no improving neighbour exists - a local optimum. There is no guarantee of reaching the global optimum, but local search is fast, simple, and often produces solutions of excellent practical quality.
The algorithm template is:
- Generate an initial solution $s_0$.
- Define a neighbourhood $N(s)$: the set of solutions reachable from $s$ by a single move.
- Repeat: if some $s' \in N(s)$ has a better objective value, move to $s'$. Stop when $s$ is locally optimal.
Two search strategies within this template:
- Steepest ascent (best improvement): evaluate all neighbours, move to the best one. Fewer iterations, but each iteration is expensive.
- First ascent (first improvement): move to the first improving neighbour found. Cheaper iterations, more of them, similar solution quality in practice.
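The template and both strategies can be sketched in a few lines of Python; a minimal sketch on a toy bit-flip problem (the function names and the one-max objective are illustrative, not from the source):

```python
import random

def hill_climb(s, objective, neighbours, steepest=True):
    """Generic hill climbing. steepest=True evaluates all neighbours
    (best improvement); steepest=False takes the first improvement."""
    while True:
        best, best_val = None, objective(s)
        for n in neighbours(s):
            v = objective(n)
            if v > best_val:
                best, best_val = n, v
                if not steepest:          # first ascent: stop scanning early
                    break
        if best is None:                  # no improving neighbour: local optimum
            return s
        s = best

# Toy problem: maximise the number of 1-bits; neighbours = single bit flips.
def bit_flips(s):
    for i in range(len(s)):
        yield s[:i] + (1 - s[i],) + s[i + 1:]

random.seed(0)
start = tuple(random.randint(0, 1) for _ in range(12))
opt = hill_climb(start, sum, bit_flips, steepest=True)
```

On this landscape every local optimum is global (flipping any 0-bit always helps), so both strategies reach the all-ones string; the two strategies differ only in how many neighbours they evaluate per move.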
Landscape Geometry: Plateaus and Ridges
The search space can be visualised as a fitness landscape over solutions. Local optima are peaks; the global optimum is the highest peak. Problems arise when the landscape has:
- Plateaus: large flat regions where no neighbour is strictly better, yet the region is not globally optimal.
- Ridges: narrow uphill paths where every single neighbourhood move steps off into a valley, even though an ascending direction exists - the neighbourhood structure must be rich enough to traverse them.
Random restarts address the problem of being trapped in a poor local optimum: run hill climbing from many random starting points and keep the best result found. If a single run reaches the global optimum with probability $p$, then after $t$ independent restarts the probability of missing it is $(1-p)^t$, which decays exponentially in $t$.
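A restart wrapper is a thin loop around hill climbing. The sketch below uses a hypothetical rugged landscape over integer indices, with several local peaks and one global peak, to show restarts escaping poor optima:

```python
import random

# Toy rugged landscape over indices 0..19: local peaks at 2, 6, 16,
# global peak at index 12 (values are illustrative).
landscape = [1, 3, 5, 4, 2, 6, 8, 7, 3, 1, 4, 9, 12, 9, 4, 2, 5, 3, 2, 1]

def hill_climb(x):
    """Climb on indices: move to the better neighbour until stuck."""
    while True:
        nbrs = [y for y in (x - 1, x + 1)
                if 0 <= y < len(landscape) and landscape[y] > landscape[x]]
        if not nbrs:
            return x
        x = max(nbrs, key=lambda y: landscape[y])

def restart_search(restarts, rng):
    """Random restarts: run hill climbing from many starts, keep the best."""
    runs = [hill_climb(rng.randrange(len(landscape))) for _ in range(restarts)]
    return max(runs, key=lambda x: landscape[x])

best = restart_search(100, random.Random(1))
```

The basin of the global peak covers 6 of the 20 starting indices here, so a single run misses it with probability 0.7, while 100 restarts miss it with probability $0.7^{100} \approx 3 \times 10^{-16}$.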
2-Opt for the Travelling Salesman Problem
2-opt is the canonical local search algorithm for TSP. A tour visits $n$ cities; a 2-opt move removes two edges $(a,b)$ and $(c,d)$ and reconnects the tour with edges $(a,c)$ and $(b,d)$, reversing the segment between $b$ and $c$.
There are $\binom{n}{2} = O(n^2)$ candidate 2-opt swaps per step. The gain from removing edges $(a,b), (c,d)$ and inserting $(a,c), (b,d)$ is:
$$\Delta = d(a,b) + d(c,d) - d(a,c) - d(b,d)$$
Accept the move if $\Delta > 0$. Repeat until no improving swap exists. The result is a 2-opt local optimum: no single edge swap can improve the tour.
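A compact implementation of this loop for Euclidean instances (the helper names are ours; a small floating-point threshold stands in for $\Delta > 0$ to guarantee termination):

```python
import math, random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    """Apply improving 2-opt moves until the tour is 2-opt locally optimal.
    Reversing tour[i+1 .. j] replaces edges (a,b),(c,d) with (a,c),(b,d)."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            # skip j = n-1 when i = 0: those two edges share a vertex
            for j in range(i + 2, n - (1 if i == 0 else 0)):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                delta = (math.dist(pts[a], pts[b]) + math.dist(pts[c], pts[d])
                         - math.dist(pts[a], pts[c]) - math.dist(pts[b], pts[d]))
                if delta > 1e-12:          # strictly improving move: accept it
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

random.seed(3)
pts = [(random.random(), random.random()) for _ in range(12)]
tour = two_opt(list(range(12)), pts)
```

Each accepted move strictly shortens the tour, so the loop terminates; the result visits every city exactly once and is never longer than the starting tour.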
In practice, 2-opt finds tours within 2–5% of optimal on Euclidean instances. 3-opt considers removing three edges and reconnecting in any valid way ($O(n^3)$ neighbours), finding better local optima at higher cost. $k$-opt and the variable-depth Lin-Kernighan heuristic generalise this further. Lin-Kernighan held the practical record on TSP benchmarks for decades.
Local Search for MAX-SAT
MAX-SAT asks for an assignment satisfying the maximum number of clauses. A simple local search: start with a random assignment; repeatedly flip a single Boolean variable if doing so satisfies more clauses. Terminate at a local optimum.
WALKSAT adds randomisation to escape local optima:
- Pick an unsatisfied clause uniformly at random.
- With probability $p$, flip a random variable in the clause. With probability $1-p$, flip the variable whose flip satisfies the most clauses (locally greedy).
The random walk term ($p$ component) allows crossing plateaus while the greedy term makes progress. WALKSAT is one of the most effective heuristics for satisfiability in practice.
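A sketch of this loop (a simplified variant: the greedy step here scores flips by total clauses satisfied, whereas production WALKSAT implementations typically minimise the "break count"; clauses use DIMACS-style signed integer literals):

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=100_000, rng=None):
    """WALKSAT sketch. A clause is a list of non-zero ints: literal v
    means variable v is true, -v means variable v is false."""
    rng = rng or random.Random()
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)

    def num_satisfied():
        return sum(any(sat(l) for l in c) for c in clauses)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                   # all clauses satisfied
        clause = rng.choice(unsat)          # a random unsatisfied clause
        if rng.random() < p:                # random-walk step
            var = abs(rng.choice(clause))
        else:                               # greedy step: best flip in clause
            def score(v):
                assign[v] = not assign[v]
                s = num_satisfied()
                assign[v] = not assign[v]
                return s
            var = max((abs(l) for l in clause), key=score)
        assign[var] = not assign[var]
    return assign                           # best effort after max_flips

# A small satisfiable instance (illustrative):
clauses = [[1, 2], [-1, 3], [-2, -3], [1, 3]]
assign = walksat(clauses, 3, rng=random.Random(0))
```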
The PLS Complexity Class
Polynomial Local Search (PLS) formalises the complexity of finding local optima. A PLS problem specifies:
- A set of feasible solutions with polynomial-size encoding.
- A polynomial-time algorithm that evaluates $f(s)$ and either finds an improving neighbour in $N(s)$ or certifies that $s$ is locally optimal.
- A polynomial-time algorithm to find an initial solution.
A problem is PLS-complete if every PLS problem reduces to it. Circuit Flip (maximising a Boolean circuit's output over input strings, under single-bit flips) is PLS-complete. Many combinatorial local search problems - including finding pure Nash equilibria in congestion games and 2-opt TSP local optima - are PLS-complete, meaning finding local optima is itself computationally hard in general, even without requiring the global optimum.
Quality of Local Optima
For many problems, any local optimum provably approximates the global optimum. Specifically:
- A 2-opt optimal TSP tour on a Euclidean instance satisfies $\text{ALG} \leq O(\log n) \cdot \text{OPT}$ and is empirically within a few percent.
- A local optimum for MAX-CUT under single-flip neighbourhoods satisfies $\text{ALG} \geq \text{OPT}/2$: if any vertex has more neighbours on the same side, flip it; when no such vertex exists, at least half of all edges cross the cut by a counting argument.
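The MAX-CUT argument translates directly into code. A minimal sketch of single-flip local search (function names are ours); at termination every vertex has at least half its edges crossing, so the cut contains at least half of all edges:

```python
def maxcut_local_search(n, edges):
    """Single-flip local search for MAX-CUT on an unweighted graph.
    side[v] in {0, 1}; flip any vertex with more same-side than
    cross-side neighbours until no such vertex remains."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    side = [0] * n
    improved = True
    while improved:
        improved = False
        for v in range(n):
            same = sum(side[u] == side[v] for u in adj[v])
            if same * 2 > len(adj[v]):     # flipping v gains same - cross > 0
                side[v] = 1 - side[v]
                improved = True
    return side

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # K4
side = maxcut_local_search(4, edges)
c = sum(side[a] != side[b] for a, b in edges)               # crossing edges
```

On $K_4$ the search reaches a balanced 2-2 split cutting 4 of the 6 edges, comfortably above the $\text{OPT}/2$ guarantee.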
These guarantees make local search attractive even when no global optimality is required.
Examples
Scheduling. Given $m$ machines and $n$ jobs, minimise makespan. A neighbourhood move: reassign one job from the most-loaded machine to the least-loaded. Local search converges quickly and produces solutions within a few percent of optimal on random instances.
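A sketch of that neighbourhood move (the LPT-style initial assignment and function name are our additions): start greedily, then move jobs off the most-loaded machine while doing so shrinks the makespan.

```python
def balance_makespan(jobs, m):
    """Greedy start, then local search: move one job from the most-loaded
    machine to the least-loaded whenever that lowers the makespan."""
    machines = [[] for _ in range(m)]
    load = [0] * m
    for j in sorted(jobs, reverse=True):        # longest-processing-time start
        i = load.index(min(load))
        machines[i].append(j)
        load[i] += j
    improved = True
    while improved:
        improved = False
        hi, lo = load.index(max(load)), load.index(min(load))
        for j in machines[hi]:
            # moving j helps iff the target stays below the current makespan
            if load[lo] + j < load[hi]:
                machines[hi].remove(j)
                machines[lo].append(j)
                load[hi] -= j
                load[lo] += j
                improved = True
                break
    return machines, max(load)

machines, makespan = balance_makespan([3, 2, 2, 1, 1, 1], m=2)
```

On this instance the total work is 10 over 2 machines, so the makespan of 5 is optimal.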
Graph Partitioning (Kernighan-Lin). Partition vertices of a graph into two equal halves minimising the number of crossing edges. The Kernighan-Lin algorithm swaps pairs of vertices between partitions, locking each swapped vertex to prevent cycling. It selects the prefix of swaps yielding maximum gain. This structured neighbourhood search dramatically outperforms pure greedy partitioning.
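A single Kernighan-Lin pass can be sketched compactly. This is a simplified version for illustration: gains are recomputed naively (real implementations maintain incremental gain values), but the signature KL ideas are here, namely locked vertices, tentative swaps beyond zero gain, and rollback to the best prefix:

```python
def cut_size(edges, side):
    return sum(side[a] != side[b] for a, b in edges)

def kl_pass(n, edges, side):
    """One simplified Kernighan-Lin pass: repeatedly swap the best
    unlocked pair across the cut, lock both vertices, then keep only
    the prefix of swaps with maximum cumulative gain."""
    side = side[:]
    locked = [False] * n
    history = []                                # (gain, a, b) per swap

    def gain(a, b):                             # naive gain of swapping a, b
        before = cut_size(edges, side)
        side[a], side[b] = side[b], side[a]
        after = cut_size(edges, side)
        side[a], side[b] = side[b], side[a]
        return before - after

    while True:
        pairs = [(gain(a, b), a, b)
                 for a in range(n) if not locked[a] and side[a] == 0
                 for b in range(n) if not locked[b] and side[b] == 1]
        if not pairs:
            break
        g, a, b = max(pairs)                    # best swap, even if g <= 0
        side[a], side[b] = side[b], side[a]
        locked[a] = locked[b] = True
        history.append((g, a, b))

    gains = [g for g, _, _ in history]
    prefix = [sum(gains[:k]) for k in range(len(gains) + 1)]
    best_k = prefix.index(max(prefix))
    for g, a, b in reversed(history[best_k:]):  # undo swaps past the best prefix
        side[a], side[b] = side[b], side[a]
    return side

edges = [(0, 2), (0, 3), (1, 2), (1, 3)]        # K_{2,2} across the initial cut
part = kl_pass(4, edges, [0, 0, 1, 1])
```

Allowing swaps with non-positive immediate gain and then rolling back is what lets KL cross small valleys that pure greedy swapping cannot; here one pass reduces the cut from 4 to 2 while keeping the halves balanced.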
Neural Network Training. Gradient descent on a loss surface is hill climbing (descent) in a continuous, high-dimensional space. Mini-batch SGD introduces stochasticity analogous to random perturbations; momentum and adaptive learning rates (Adam, RMSProp) are practical tools for navigating complex loss landscapes. The loss surface of modern deep networks has many saddle points but surprisingly few poor local minima.
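The continuous analogue of the discrete loop above is just a few lines; a minimal gradient descent sketch on a convex toy function (the function and step size are illustrative, not a neural-network loss):

```python
def grad_descent(grad, x, lr=0.1, steps=200):
    """Plain gradient descent: hill climbing (descent) in continuous space,
    stepping along the negative gradient instead of scanning neighbours."""
    for _ in range(steps):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x

# Minimise f(x, y) = (x - 3)^2 + (y + 1)^2, whose gradient is
# (2(x - 3), 2(y + 1)); the unique minimum is at (3, -1).
grad = lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)]
x = grad_descent(grad, [0.0, 0.0])
```

Each step contracts the distance to the minimum by a factor of $1 - 2\,\text{lr} = 0.8$ on this quadratic, so 200 steps converge to machine precision; SGD, momentum, and adaptive learning rates modify exactly this update rule.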
Read Next: Simulated Annealing