Local Search - Improving One Step at a Time Until You Can't // Megha Bose

Helpful context:

Approximation Algorithms - Close Enough, Fast Enough

Evolution doesn’t plan ahead. Each mutation is accepted if it improves fitness; rejected if it doesn’t. A million iterations of “take a small step in a good direction” produced the eye, the heart, the brain. Local search is this idea applied to optimization: start somewhere, repeatedly move to a better neighbor, stop when no better neighbor exists.

It sounds almost too simple. And yet, tuned carefully, it solves some of the hardest combinatorial problems we know of - often to within a percent or two of optimal.

The Local Search Template

Every local search algorithm has the same skeleton:

Start with any feasible solution $s$.
Define a neighborhood $N(s)$: the set of solutions reachable from $s$ by a single move.
Iterate: if some $s' \in N(s)$ is better than $s$, move to $s'$.
Stop when $s$ is locally optimal - no neighbor is better.

The three choices that determine everything:

How you define the neighborhood (what counts as a “move”)
Whether you pick the best improving neighbor (steepest ascent) or the first improving neighbor you find (first ascent)
How you initialize

Steepest ascent does fewer iterations but each iteration is expensive - you scan all of $N(s)$. First ascent is cheaper per step. In practice, first ascent often finds solutions of comparable quality in less total time.

2-Opt for the Traveling Salesman Problem

The textbook example is the 2-opt algorithm for TSP.

You have a tour visiting $n$ cities. A 2-opt move removes two edges $(a, b)$ and $(c, d)$ from the tour and reconnects the two resulting paths in the only other valid way, inserting edges $(a, c)$ and $(b, d)$. This reverses the segment of the tour between $b$ and $c$.

The gain from a 2-opt move is:

$$\Delta = d(a, b) + d(c, d) - d(a, c) - d(b, d).$$

Accept the move if $\Delta > 0$ (the tour gets shorter). There are $O(n^2)$ candidate 2-opt swaps at each step. Repeat until no improving swap exists - you’ve reached a 2-opt local optimum.

In practice on Euclidean instances, 2-opt finds tours within 2 - 5% of optimal. Not bad for an algorithm that doesn’t know anything about the global structure.

3-opt considers removing 3 edges and reconnecting in any of the valid ways - $O(n^3)$ neighbors per step, better quality, more expensive. The Lin-Kernighan algorithm takes this further with variable-depth moves and clever candidate lists. For decades, Lin-Kernighan held the practical record on TSP benchmarks.

k-Opt Improvements

The pattern generalizes. Larger $k$ means:

A bigger neighborhood (more moves considered)
Better local optima (fewer local traps, since you can escape by making several simultaneous changes)
More expensive per step ($O(n^k)$ moves to evaluate)

There’s a fundamental tradeoff: the richer the neighborhood, the better the local optima you find, but the more expensive each iteration becomes. For TSP, 2-opt is the sweet spot for speed, 3-opt often improves quality meaningfully, and beyond that the diminishing returns usually don’t justify the cost.

The Fundamental Problem: Local Optima and Plateaus

Hill climbing always terminates at a local optimum. Whether that local optimum is good depends on the fitness landscape - the geometry of the objective function over the space of solutions.

A good fitness landscape for local search has:

Many local optima, but most are near-optimal
Large basins of attraction for good optima (most starting points lead somewhere good)

A bad fitness landscape has:

A few good optima surrounded by vast flat regions (plateaus) where no neighbor is strictly better
Ridges: sequences of equal-quality solutions where you can only improve by going “sideways” through the neighborhood - but the neighborhood doesn’t support sideways moves

On a plateau, hill climbing stalls: no neighbor is better, but you’re not at a global optimum either. The algorithm has no way to distinguish “this is optimal” from “this is a flat region I’m stuck in.”

Random Restarts

The simplest fix for local optima: run the algorithm many times from different random starting points, and keep the best result.

If a single run finds the global optimum with probability $p$, then after $t$ independent runs you miss it with probability $(1 - p)^t$, which decays exponentially. With $t = 1/p$ runs, you find the global optimum with probability at least $1 - 1/e \approx 63%$.

Random restarts work well when:

Individual runs are cheap
Good solutions are reached from many starting points (large basin of attraction)
The search space has many equally good global optima

They struggle when:

Good starting points are rare
Runs are expensive
The global optimum has a tiny basin of attraction

Perturbation: Iterated Local Search

A more sophisticated approach is iterated local search (ILS):

Run local search from an initial solution to a local optimum $s^*$.
Perturb $s^*$ - apply a random change that’s larger than a single local move, disrupting the local optimum enough to escape.
Run local search again from the perturbed solution.
Accept or reject the new local optimum based on some criterion (the new local optimum is better, or the best seen so far, etc.).
Repeat.

The key is calibrating the perturbation. Too small, and local search returns to the same local optimum. Too large, and you’re essentially starting from scratch - equivalent to a random restart.

ILS for TSP: apply a random “double-bridge” move (a specific 4-opt move that can’t be undone by any sequence of 2-opt moves), then re-run 2-opt. This produces dramatically better results than either random restarts or 2-opt alone.

Discomfort check. With random restarts, aren’t we just doing random search plus post-processing?

No. Local search quickly exploits nearby structure - it’s like gradient descent without explicit gradients. Random restarts add global exploration. The combination is qualitatively different from either: local search in a good region converges quickly to a high-quality local optimum, while random restarts ensure you’ve sampled from different regions of the space. The information in each local search run (which local optimum it reached, what structure that optimum has) can guide where to restart. Pure random search has no such structure.

Quality of Local Optima

For some problems, any local optimum approximates the global optimum:

MAX-CUT. Assign each vertex to one side of a cut. Neighborhood move: flip one vertex to the other side. A local optimum for this neighborhood satisfies: at least half of all edges cross the cut. Proof: if any vertex had more neighbors on the same side as itself, flipping it would improve the cut - so at a local optimum, every vertex has at least as many neighbors on the opposite side, meaning at least half of all edges cross. This is a 2-approximation to the MAX-CUT optimum.

2-opt TSP. Any 2-opt local optimum on a Euclidean instance satisfies $\text{ALG} \leq O(\log n) \cdot \text{OPT}$ in the worst case, and is typically within a few percent of optimal empirically. The $O(\log n)$ bound is somewhat pessimistic.

Applications

VLSI placement. Assign circuit components to chip locations to minimize wire length. Neighborhood: swap two component positions. Local search converges quickly and produces competitive placements.

Graph partitioning: Kernighan-Lin. Partition vertices into two equal halves minimizing crossing edges. KL swaps pairs of vertices between partitions, “locking” each swapped vertex, and picks the swap sequence giving maximum total gain. This structured neighborhood dramatically outperforms greedy partitioning.

Scheduling. Reassign one job from the most-loaded machine to the least-loaded. A few iterations of this produces solutions close to optimal for random instances.

Protein folding. Local search on fragment insertion moves (replacing local backbone segments with library fragments) is a core component of the Rosetta protein structure prediction suite.

Summary

Neighborhood	Size	Cost per step	Solution quality	Use when
1-opt (swap one element)	$O(n)$	Cheap	Low	Quick exploration
2-opt (swap two edges/elements)	$O(n^2)$	Moderate	Good	TSP, scheduling
3-opt	$O(n^3)$	Expensive	Better	When quality matters
Lin-Kernighan	Variable	Variable	Excellent	TSP benchmarks
Random restarts	-	Multiplied runs	Better global	Cheap runs, large basins
ILS (perturbation)	-	Moderate overhead	Best practice	Expensive runs, good basins

Local search is the algorithmic version of trial and error, made fast by focusing effort on promising regions. The neighborhood structure encodes your beliefs about what changes to a solution are likely to help. The larger the neighborhood, the more ground you cover per step - but the more expensive each step becomes. The art is choosing the right neighborhood for your problem and combining local exploitation with enough global randomness to escape local traps.

Read next:

Simulated Annealing - Escaping Local Optima by Accepting Bad Moves