Genetic Algorithms & Evolutionary Computation
Prerequisite: Local Search & Hill Climbing
Evolutionary Computation
Where hill climbing moves a single solution through the search space, evolutionary algorithms maintain a population of candidate solutions and improve it over generations through mechanisms inspired by natural selection. The key idea: good solutions share structural features that contribute to their fitness; combining those features via crossover can produce offspring that are even better.
Genetic algorithms (GAs), introduced by John Holland in the 1970s, are the most widely studied instance. The population evolves via selection, crossover (recombination), and mutation. Each component has a precise role: selection provides pressure toward high-fitness regions; crossover combines promising building blocks; mutation maintains diversity and prevents premature convergence.
Representation
The choice of representation shapes every other component of the algorithm.
Binary strings. Each solution is a bitstring $x \in \{0,1\}^n$. Classic for theoretical analysis and combinatorial problems where features can be encoded as binary decisions.
Real-valued vectors. Each solution is $x \in \mathbb{R}^n$. Natural for continuous optimisation (e.g., neural network weights, engineering parameters). Used in Evolution Strategies (ES).
Permutations. Each solution is a permutation of $\{1, \ldots, n\}$. Required for routing, scheduling, and TSP, where standard crossover must be adapted to preserve validity.
Fitness and Selection
The fitness function $f(x) \geq 0$ assigns a score to each solution. Selection chooses parents for reproduction, biasing the choice toward higher-fitness individuals; the strength of that bias is the selection pressure.
Roulette wheel selection (fitness-proportionate): Individual $i$ is selected with probability
$$p_i = \frac{f(x_i)}{\sum_{j=1}^{\mu} f(x_j)}$$
This amplifies high-fitness individuals but is sensitive to scaling: if one individual has fitness $1000\times$ others, it dominates and diversity collapses.
Tournament selection: Choose $k$ individuals uniformly at random; the winner (highest fitness) is selected as a parent. Repeat for the second parent. Tournament size $k$ controls selection pressure: $k=2$ is mild, $k=7$ is strong. Tournament selection is robust to scaling and is the standard in practice.
Rank selection: Sort by fitness and assign selection probability based on rank rather than raw fitness. This avoids scaling problems and reduces selective pressure compared to roulette wheel.
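As a concrete sketch, the two most common schemes can be implemented in a few lines of Python (function names are illustrative; roulette wheel assumes non-negative fitness values, per the definition above):

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate selection: pick one parent with
    probability f(x_i) / sum_j f(x_j). Assumes f(x) >= 0."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return individual
    return population[-1]  # guard against floating-point rounding

def tournament_select(population, fitnesses, k=3):
    """Tournament selection: sample k individuals uniformly at
    random, return the fittest. Larger k = stronger pressure."""
    contenders = random.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]
```

Note that tournament selection never looks at raw fitness values, only at comparisons between them, which is why it is insensitive to fitness scaling.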
Crossover
Crossover recombines two parent solutions to produce offspring.
Single-point crossover. Choose a random cut point $c \in \{1, \ldots, n-1\}$. Offspring 1 takes genes $1, \ldots, c$ from parent 1 and $c+1, \ldots, n$ from parent 2; offspring 2 takes the complement.
Uniform crossover. Each gene is independently taken from parent 1 with probability $0.5$ and from parent 2 otherwise. Allows mixing from across the entire chromosome simultaneously.
Order crossover (OX) for permutations. Choose a segment from parent 1 and copy it to the offspring. Fill remaining positions with genes from parent 2 in the order they appear, skipping genes already placed. This preserves the relative order of elements not in the copied segment.
The crossover probability $p_c$ (fraction of offspring produced by crossover vs. direct copy) is typically $0.6$–$0.9$.
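The three operators can be sketched in Python as follows (names are illustrative; the order-crossover segment bounds are chosen uniformly at random):

```python
import random

def single_point(p1, p2):
    """Cut at c in {1,...,n-1}; children swap tails at the cut."""
    c = random.randint(1, len(p1) - 1)
    return p1[:c] + p2[c:], p2[:c] + p1[c:]

def uniform_crossover(p1, p2):
    """Each gene comes from p1 or p2 independently with prob. 0.5."""
    child1, child2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < 0.5:
            child1.append(g1); child2.append(g2)
        else:
            child1.append(g2); child2.append(g1)
    return child1, child2

def order_crossover(p1, p2):
    """OX for permutations: copy a random segment from p1, then fill
    the remaining slots with p2's genes in their original order."""
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    placed = set(child[i:j + 1])
    fill = [g for g in p2 if g not in placed]
    for pos in range(n):
        if child[pos] is None:
            child[pos] = fill.pop(0)
    return child
```

The first two conserve the multiset of genes across the two offspring; OX guarantees each child is itself a valid permutation.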
Mutation
Mutation introduces random perturbations to maintain diversity.
- Bit flip (binary): flip each bit independently with probability $p_m \approx 1/n$.
- Gaussian perturbation (real-valued): add $\mathcal{N}(0, \sigma^2)$ noise to each dimension; $\sigma$ is the mutation step size.
- Swap mutation (permutation): choose two positions at random and swap the genes.
- Inversion mutation (permutation): choose a segment and reverse it.
Mutation rates must be balanced: too low and the search relies entirely on crossover (poor diversity); too high and the algorithm becomes random search.
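The four operators above can be sketched as follows (illustrative helper names; each returns a new individual rather than mutating its argument in place):

```python
import random

def bit_flip(bits, p_m=None):
    """Flip each bit independently with probability p_m (default 1/n)."""
    if p_m is None:
        p_m = 1.0 / len(bits)
    return [1 - b if random.random() < p_m else b for b in bits]

def gaussian_mutate(x, sigma=0.1):
    """Add N(0, sigma^2) noise to each real-valued dimension."""
    return [xi + random.gauss(0.0, sigma) for xi in x]

def swap_mutate(perm):
    """Swap two randomly chosen positions of a permutation."""
    p = list(perm)
    i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def inversion_mutate(perm):
    """Reverse a randomly chosen segment of a permutation."""
    p = list(perm)
    i, j = sorted(random.sample(range(len(p)), 2))
    p[i:j + 1] = reversed(p[i:j + 1])
    return p
```

Like order crossover, the swap and inversion operators are closed over permutations: the output is always a valid rearrangement of $\{1, \ldots, n\}$.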
Elitism
Elitism copies the best $e$ solutions from the current generation directly to the next, without modification. This guarantees that the best solution found is never lost. Even with $e = 1$, elitism markedly stabilises convergence.
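Putting the pieces together, here is a minimal generational GA on the OneMax problem (maximise the number of 1-bits in a bitstring), a standard toy benchmark. The hyperparameter values are illustrative defaults, not tuned:

```python
import random

def run_ga(n=30, pop_size=50, generations=100, p_c=0.8, elite=1, seed=42):
    """Generational GA on OneMax: tournament selection (k=3),
    single-point crossover, bit-flip mutation (rate 1/n), elitism."""
    rng = random.Random(seed)
    fitness = lambda x: sum(x)                       # OneMax: count 1-bits

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = [list(x) for x in ranked[:elite]]      # elitism: keep the best
        while len(nxt) < pop_size:
            # tournament selection with k = 3
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            if rng.random() < p_c:                   # single-point crossover
                c = rng.randint(1, n - 1)
                child = p1[:c] + p2[c:]
            else:
                child = list(p1)                     # direct copy
            # bit-flip mutation at rate 1/n
            child = [1 - b if rng.random() < 1.0 / n else b for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

On a 30-bit OneMax, this configuration reliably reaches or closely approaches the all-ones optimum; removing the elitism line makes the best-so-far fitness noticeably noisier across generations.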
The Schema Theorem
Holland’s Schema Theorem provides a theoretical account of why GAs work. A schema $H$ is a template over $\{0, 1, \ast\}^n$ where $\ast$ matches either bit. The order $o(H)$ is the number of fixed bits; the defining length $\delta(H)$ is the distance between the first and last fixed positions.
Let $m(H, t)$ be the expected number of individuals matching $H$ at generation $t$. With single-point crossover and fitness-proportionate selection:
$$m(H, t+1) \geq m(H, t) \cdot \frac{f(H)}{\bar{f}} \cdot \left(1 - p_c \frac{\delta(H)}{n-1} - o(H) p_m\right)$$
where $f(H)$ is the average fitness of the individuals matching $H$ and $\bar{f}$ is the population average. Schemata that are short ($\delta(H)$ small), low-order ($o(H)$ small), and of above-average fitness grow exponentially in the population. The Building Block Hypothesis proposes that GAs work by combining such short, fit schemata into increasingly good solutions.
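To see why defining length matters, plug illustrative numbers into the bound: take $n = 10$, $p_c = 0.8$, $p_m = 0.01$, and $f(H)/\bar{f} = 1.2$, and compare two schemata of equal order and fitness:

$$H_1 = 1{\ast}{\ast}{\ast}1{\ast}{\ast}{\ast}{\ast}{\ast}: \quad o = 2,\ \delta = 4, \qquad 1.2\left(1 - 0.8 \cdot \tfrac{4}{9} - 2 \cdot 0.01\right) \approx 0.75$$

$$H_2 = 11{\ast}{\ast}{\ast}{\ast}{\ast}{\ast}{\ast}{\ast}: \quad o = 2,\ \delta = 1, \qquad 1.2\left(1 - 0.8 \cdot \tfrac{1}{9} - 2 \cdot 0.01\right) \approx 1.07$$

$H_1$'s fixed bits lie far apart, so single-point crossover usually cuts between them; despite its fitness advantage, the bound is below $1$ and the schema can shrink. The compact $H_2$ grows.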
No Free Lunch
The No Free Lunch (NFL) theorem (Wolpert and Macready, 1997) states that averaged over all possible optimisation problems (under a uniform prior), every optimisation algorithm performs equally well. No algorithm beats any other on average.
NFL does not undermine GAs in practice: real problems have structure, and GAs exploit that structure. The theorem is a reminder that all performance claims are relative to a problem class, not universal.
Genetic Programming
Genetic programming (Koza, 1992) evolves programs rather than fixed-length strings. Solutions are represented as trees: internal nodes are functions (arithmetic operations, conditionals, loops); leaves are terminals (constants, input variables). Crossover swaps subtrees; mutation replaces a subtree with a randomly generated one.
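A minimal sketch of the tree representation in Python (nested tuples stand in for tree nodes; the names and the restricted function set are illustrative):

```python
import random
import operator

# Function set for internal nodes; terminals are 'x' or integer constants.
FUNCS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def evaluate(tree, x):
    """Recursively evaluate an expression tree at input value x."""
    if isinstance(tree, tuple):            # internal node: (op, left, right)
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == 'x' else tree      # terminal: variable or constant

def random_tree(rng, depth=2):
    """Grow a random tree, stopping early at a terminal with some probability."""
    if depth == 0 or rng.random() < 0.3:
        return 'x' if rng.random() < 0.5 else rng.randint(-2, 2)
    op = rng.choice(list(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def subtree_mutate(tree, rng):
    """Mutation: replace a randomly chosen subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, subtree_mutate(left, rng), right)
    return (op, left, subtree_mutate(right, rng))
```

For example, `evaluate(('+', ('*', 'x', 'x'), 1), 3)` computes $x^2 + 1$ at $x = 3$, giving $10$. Subtree crossover works analogously to the mutation above, grafting a subtree taken from a second parent rather than a random one.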
GP has been used to automatically discover physical laws from data, design electronic circuits, and evolve game-playing strategies.
Examples
NEAT (NeuroEvolution of Augmenting Topologies). Evolves both the weights and the topology of neural networks. Historical markings track gene origins, enabling meaningful crossover between networks of different structures. NEAT evolved controllers for pole balancing and game playing that matched or exceeded hand-designed networks.
Antenna Design. NASA used a GA to design antennas for the ST5 spacecraft. The evolved antenna had an irregular, counter-intuitive shape that outperformed human-designed alternatives. This was one of the first AI-designed hardware systems launched into space.
Parameter Optimisation. GAs are used in engineering design (turbine blade shapes, car crash safety), scheduling (job shops, shift planning), and machine learning hyperparameter search, where the fitness function is validation performance and the search space is combinatorial or mixed-integer.
Read Next: Beam Search