Error-Correcting Codes - Redundancy That Corrects Its Own Mistakes // Megha Bose

Helpful context:

Abstract Algebra: Rings & Fields - When Numbers Break Their Own Rules

Voyager 1 is roughly 24 billion kilometers from Earth. It transmits data back at 160 bits per second through the electromagnetic void, and by the time those signals arrive, cosmic noise has corrupted around 1% of the bits. Yet engineers at the Jet Propulsion Laboratory receive crisp images of interstellar space, not garbage. They achieve this not by wishing for a cleaner signal - the physics of radio propagation at that distance makes a clean signal impossible - but by designing the error correction into the transmission before it leaves the spacecraft. Claude Shannon proved in 1948 that you can transmit information over any noisy channel at any rate below its capacity with arbitrarily small error probability. His proof was nonconstructive: it showed that good codes exist without building them. The next seventy years of coding theory were spent actually constructing codes that realize Shannon’s promise.

The central tension in every error-correcting code is between protection and efficiency. To correct errors you must add redundancy: every $k$ bits of data becomes a codeword of $n > k$ bits. The rate $R = k/n$ measures how much of the transmitted signal carries actual information. The minimum distance $d$ of the code - the minimum number of positions where any two codewords differ - measures how much protection the code provides: a code with minimum distance $d$ can correct $\lfloor (d-1)/2 \rfloor$ errors and detect $d-1$ errors. The fundamental question is: for a given rate $R$, what is the maximum achievable minimum distance? Can you correct many errors while still transmitting mostly useful data? The answer depends on upper bounds (Hamming, Singleton, Plotkin) that limit how good any code can be, and lower bounds (Gilbert-Varshamov) that guarantee good codes exist. Closing the gap between these bounds remains one of the central open problems in coding theory.

These abstractions are not academic. Reed-Solomon codes protect every CD, DVD, and QR code ever made. LDPC codes are built into every 5G handset and Wi-Fi router manufactured in the last decade. BCH codes guard the data in NAND flash chips inside your phone. Convolutional codes carried images back from Voyager, the Mars rovers, and the Hubble Space Telescope. MDS erasure codes allow Amazon, Google, and Facebook to store petabytes of data with four-fault tolerance using only 40% storage overhead, compared to 200% for three-way replication. Error-correcting codes are the silent infrastructure on which reliable digital communication and storage depend.

Block Code Basics

A binary block code $C$ is a subset of $\{0,1\}^n$. The elements of $C$ are called codewords. If $|C| = 2^k$ then $C$ encodes $k$ bits of information using codewords of length $n$. Such a code is written $[n, k]$ or $[n, k, d]$ when we also specify the minimum distance $d$.

Throughout this post, $GF(q)$ denotes the Galois field (or finite field) with $q$ elements, where $q$ must be a prime power. The simplest case is $GF(2) = \{0, 1\}$ with XOR as addition and AND as multiplication - this is the binary field underlying all binary codes discussed below. Larger fields like $GF(2^8)$ treat each byte as a single field element; arithmetic is still closed (outputs stay within a byte) but uses polynomial arithmetic modulo an irreducible degree-8 polynomial over $GF(2)$. Reed-Solomon codes and most storage-oriented codes work over $GF(2^8)$ because it aligns naturally with byte-addressed hardware.

The parameters of a code are:

Length $n$: bits in each codeword
Dimension $k$: bits of information encoded (so $|C| = q^k$ over alphabet $GF(q)$)
Rate $R = k/n$: fraction of bits carrying information
Hamming distance $d_H(u, v)$: the number of bit positions where $u$ and $v$ differ. For example, $d_H(01101,\ 01001) = 1$ (they differ only in position 3). It satisfies the triangle inequality, making it a genuine metric on $\{0,1\}^n$.
Minimum distance $d = \min_{x \neq y \in C} d_H(x, y)$: the smallest Hamming distance between any two distinct codewords in $C$. It captures how “spread apart” the codewords are. If two codewords differ in $d$ positions, a channel would have to corrupt at least $d$ bits to turn one into the other - so the larger $d$ is, the harder it is to confuse codewords, and the more errors the code can absorb.

Error detection. A code with minimum distance $d$ detects any pattern of up to $t$ errors if and only if $d \geq t + 1$. If a codeword $c$ is corrupted in $t < d$ positions, the result cannot be another codeword, so any deviation from $C$ is detectable.

Error correction. A code with minimum distance $d$ corrects any pattern of up to $t$ errors if and only if $d \geq 2t + 1$. The Hamming balls of radius $t$ centered at distinct codewords are disjoint; a received word within distance $t$ of a codeword has a unique nearest neighbor.

Erasure correction. An erasure is a bit that was lost (the receiver knows which positions were erased, but not their values). A code with minimum distance $d$ corrects any $e$ erasures if and only if $d \geq e + 1$.

Discomfort check. Error correction and error detection are not symmetric. A [7,4,3] Hamming code corrects 1 error or detects 2 errors - but not both simultaneously. If you use it in correction mode, a double error will be “corrected” to the wrong codeword without any alert. In practice, systems either use separate mechanisms or deliberately choose $d$ large enough that both guarantees hold simultaneously.

Linear Codes

A linear code over $GF(q)$ is a $k$-dimensional subspace of $GF(q)^n$. This is the most important special case because the algebraic structure enables efficient encoding, decoding, and analysis. Every linear code has a generator matrix $G$ of size $k \times n$ whose rows form a basis for the code. Encoding a message $m \in GF(q)^k$ produces codeword $c = mG$.

The parity-check matrix $H$ of size $(n-k) \times n$ satisfies $Hx = 0$ if and only if $x \in C$. The rows of $H$ span the dual code $C^\perp$. Given $G$ in systematic form $[I_k \mid P]$, we have $H = [-P^T \mid I_{n-k}]$ (over $GF(2)$, negation is identity, so $H = [P^T \mid I_{n-k}]$).

A critical property: the minimum distance of a linear code equals the minimum Hamming weight of any nonzero codeword. This is because for any $x, y \in C$, we have $x - y \in C$ (since $C$ is a subspace), so $d_H(x,y) = wt(x - y)$.

Syndrome decoding. When a codeword $c$ is transmitted and received as $r = c + e$ (where $e$ is the error vector), the syndrome is: $$s = Hr = H(c + e) = Hc + He = He$$ since $Hc = 0$. The syndrome depends only on the error pattern, not on the codeword. If $s = 0$, no detectable error occurred. Syndrome decoding finds the minimum-weight $e$ consistent with the syndrome $s$.

Systematic form. In systematic form, $G = [I_k \mid P]$ and the first $k$ bits of any codeword are exactly the original message. This makes encoding and information extraction straightforward.

Wider view. The same mathematics surfaces in hash functions. The syndrome $s = Hx$ is literally a hash function: it maps every received word to a short fingerprint of its error class, with all codewords colliding to zero - syndrome decoding is then nearest-neighbour lookup in a table keyed by that hash. CRC checksums are cyclic codes used as integrity hashes: the checksum is the remainder of polynomial division over $GF(2)$, and the code’s minimum distance determines exactly which error patterns are detectable. Carter-Wegman universal hashing evaluates a random polynomial over a finite field at a fixed point, the identical construction to Reed-Solomon encoding (coming up next); the collision probability bound $\leq 1/q$ falls out of the same fact that a degree-$(k-1)$ polynomial has at most $k-1$ roots. Locality-Sensitive Hashing inverts the geometry entirely: where coding theory pushes codewords far apart so noise cannot confuse them, LSH deliberately maps nearby inputs to the same bucket - the same Hamming distance metric, the opposite design goal.

Hamming codes. The binary Hamming code $[2^r - 1, 2^r - 1 - r, 3]$ is constructed by taking a parity-check matrix $H$ whose columns are all $2^r - 1$ nonzero binary strings of length $r$. For example, the $[7, 4, 3]$ Hamming code has $H$ as a $3 \times 7$ matrix whose columns are $001, 010, 011, 100, 101, 110, 111$ in some order. The minimum distance is 3, so it corrects 1 error. When a single-bit error occurs in position $i$, the syndrome $s = He$ equals the $i$-th column of $H$, which is the binary representation of $i$. The syndrome directly identifies the error position.

Discomfort check. The columns of $H$ can be arranged in any order, which changes which specific positions the syndrome points to. The Hamming code is unique up to column permutation of $H$. When you see “[7,4,3] Hamming code” in the wild, the columns may be arranged differently from a textbook - the decoding table needs to match the $H$ used.

Bounds on Codes

Four classical bounds constrain the achievable $[n, k, d]$ triples. They divide into upper bounds (limiting how good a code can be) and lower bounds (guaranteeing good codes exist).

Hamming (sphere-packing) bound. A code correcting $t$ errors requires disjoint Hamming balls of radius $t$ around each codeword. The volume of a Hamming ball of radius $t$ in $\{0,1\}^n$ is: $$V(n, t) = \sum_{i=0}^{t} \binom{n}{i}$$ Since all these balls must fit inside $\{0,1\}^n$: $$|C| \cdot V(n, t) \leq 2^n \implies |C| \leq \frac{2^n}{V(n, t)}$$ In terms of dimension: $k \leq n - \log_2 V(n,t)$. A code achieving equality is called a perfect code: the balls exactly tile the space with no waste.

The Hamming codes $[2^r - 1, 2^r - 1 - r, 3]$ are perfect for $t = 1$. The binary Golay code $[23, 12, 7]$ is perfect for $t = 3$. These are, in a precise sense, the only nontrivial perfect codes over $GF(2)$ (a theorem due to van Lint and Tietavainen).

Singleton bound. For any $[n, k, d]$ code over $GF(q)$: $$k \leq n - d + 1$$

Proof. Puncture the code by deleting $d-1$ coordinates from every codeword. The resulting code lives in $GF(q)^{n-d+1}$ and still has distinct codewords (since any two codewords differed in at least $d$ positions, removing $d-1$ cannot make them identical). So $q^k \leq q^{n-d+1}$, giving $k \leq n - d + 1$.

A code achieving $k = n - d + 1$ is called Maximum Distance Separable (MDS). MDS codes achieve the best possible distance for their rate. Reed-Solomon codes are MDS.

Gilbert-Varshamov bound. There exist binary $[n, k, d]$ codes with: $$k \geq n - \log_2 V(n, d-1)$$

Proof sketch. Build the code greedily: start empty, keep adding vectors from $\{0,1\}^n$ that differ from all current codewords in at least $d$ positions. Each rejected candidate falls within Hamming distance $d-1$ of some existing codeword, contributing at most $V(n, d-1)$ exclusions. We can add codewords until $|C| \cdot V(n, d-1) \geq 2^n$, so we can achieve $|C| \geq 2^n / V(n, d-1)$, giving $k \geq n - \log_2 V(n, d-1)$.

The GV bound guarantees good codes exist but is nonconstructive. Whether explicit codes matching the GV bound exist for all parameters is open (the GV conjecture).

Plotkin bound. Over $GF(2)$, if $d > n/2$: $$|C| \leq \frac{2d}{2d - n}$$

The Plotkin bound is useful in the high-distance regime and shows that codes with $d > n/2$ cannot have many codewords.

Discomfort check. The Hamming bound and GV bound do not generally match. The GV bound guarantees codes of rate roughly $1 - H_2(d/n)$ (where $H_2$ is the binary entropy function), while the Hamming bound requires rate at most $1 - H_2(d/2n)$. There is a gap between what is provably achievable and what is provably impossible, and whether that gap can be closed is a major open problem.

Cyclic Codes

A linear code $C$ is cyclic if for every codeword $(c_0, c_1, \ldots, c_{n-1}) \in C$, the cyclic shift $(c_{n-1}, c_0, \ldots, c_{n-2})$ is also in $C$. Cyclic codes are natural to implement in hardware: a shift register performs the encoding operation directly.

The algebraic structure of cyclic codes becomes clear by representing codewords as polynomials. Identify a vector $(c_0, \ldots, c_{n-1})$ with the polynomial $c(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}$ in the quotient ring $R_n = GF(q)[x]/(x^n - 1)$. A cyclic shift of $c(x)$ corresponds to multiplication by $x$ in $R_n$ (with the term $c_{n-1} x^n$ wrapping around to $c_{n-1}$ since $x^n = 1$).

A cyclic code is an ideal in $R_n$. Every ideal in $R_n$ is principal - generated by a single polynomial $g(x)$ dividing $x^n - 1$. The polynomial $g(x)$ is the generator polynomial. Codewords are exactly the multiples $m(x) g(x)$ for messages $m(x)$ of degree less than $k = n - \deg g$.

The check polynomial is $h(x) = (x^n - 1)/g(x)$. It satisfies $g(x) h(x) = 0$ in $R_n$, and $c(x)$ is a codeword if and only if $c(x) h(x) \equiv 0 \pmod{x^n - 1}$.

Encoding in systematic form: to encode message $m(x)$, compute $m(x) x^{n-k} = q(x) g(x) + r(x)$, then send $m(x) x^{n-k} - r(x)$. This places the message in the high-degree coefficients and the redundancy in the low-degree ones.

The roots of $g(x)$ in the splitting field of $x^n - 1$ determine the minimum distance: if $g(\alpha^b) = g(\alpha^{b+1}) = \cdots = g(\alpha^{b+d-2}) = 0$ for a primitive $n$-th root of unity $\alpha$, then the minimum distance is at least $d$ (BCH bound).

Reed-Solomon Codes

Reed-Solomon codes are the most practically important family of algebraic codes. Fix a finite field $GF(q)$ and choose $n$ distinct nonzero elements $\alpha_1, \ldots, \alpha_n \in GF(q)^*$ (with $n \leq q - 1$). An $[n, k]$ Reed-Solomon code over $GF(q)$ is defined as:

$$C = \{(f(\alpha_1), f(\alpha_2), \ldots, f(\alpha_n)) : f \in GF(q)[x],\ \deg f < k\}$$

This code has parameters $[n, k, n - k + 1]$ and is MDS. The proof that $d = n - k + 1$ follows from a basic fact of algebra: two distinct polynomials of degree less than $k$ agree on at most $k - 1$ points. So two distinct codewords $f$ and $g$ satisfy $f(\alpha_i) = g(\alpha_i)$ for at most $k-1$ values of $i$, meaning they differ in at least $n - (k-1) = n - k + 1$ positions.

Encoding is polynomial evaluation: compute $f(\alpha_1), \ldots, f(\alpha_n)$ for the degree-$(k-1)$ polynomial $f$ determined by the $k$ message symbols. This takes $O(n^2)$ operations naively, or $O(n \log n)$ using the fast Fourier transform over finite fields.

Decoding is polynomial interpolation from noisy evaluations. If at most $t = \lfloor (n-k)/2 \rfloor$ of the $n$ received symbols are corrupted, we need to recover $f$ from $n$ evaluations where $t$ are wrong. The Berlekamp-Massey algorithm solves this in $O(n^2)$ time by finding the error locator polynomial $\Lambda(x) = \prod_{i \in E}(1 - \alpha_i x)$ whose roots identify the error positions $E$. Once the error positions are known, the error values are determined by solving a linear system. For soft-decision or list decoding, the Sudan-Guruswami algorithm decodes up to $n - \sqrt{nk}$ errors (beyond the unique decoding radius).

When $n = q - 1$ (the full-length case), the evaluation points are all nonzero elements of $GF(q)$ and the RS code is cyclic. In general RS codes are not cyclic, but they are MDS.

Applications. CDs use an interleaved RS[28, 24, 5] code that can correct bursts of up to 4000 consecutive bit errors (which occur when the disc surface is scratched). DVDs use a similar scheme. QR codes use RS over $GF(256)$ at four error correction levels (L/M/Q/H) correcting 7%, 15%, 25%, or 30% of codeword bytes. Voyager used a concatenated code with RS as the outer code.

Discomfort check. The MDS property - meeting the Singleton bound - sounds like a free lunch. It is not. RS codes require large alphabets: an $[n, k, d]$ RS code requires $q \geq n$, so long codes need large fields. Over $GF(2)$, no nontrivial MDS code longer than a few bits exists. The MDS conjecture asserts that over $GF(q)$ with $q$ prime, any MDS code has $n \leq q + 1$ (with a few exceptions). This remains open in general.

BCH Codes

BCH codes (Bose-Chaudhuri-Hocquenghem, independently 1959-1960) are a family of cyclic codes with a specified designed minimum distance, constructed by algebraic control of the roots of the generator polynomial.

Let $n = q^m - 1$ and let $\alpha$ be a primitive $n$-th root of unity in $GF(q^m)$. For a designed distance $d$ and starting exponent $b$, define the generator polynomial:

$$g(x) = \text{lcm}(m_b(x), m_{b+1}(x), \ldots, m_{b+d-2}(x))$$

where $m_i(x)$ is the minimal polynomial of $\alpha^i$ over $GF(q)$. The resulting cyclic code is a BCH code with designed distance $d$.

The BCH bound guarantees that the actual minimum distance satisfies $d_{min} \geq d$: if $g(\alpha^b) = \cdots = g(\alpha^{b+d-2}) = 0$, then by the BCH bound any codeword polynomial with $d-1$ consecutive roots of this form has weight at least $d$.

The dimension satisfies $k \geq n - m(d-1)$, since $g(x)$ has at most $m(d-1)$ distinct roots (each minimal polynomial has degree at most $m$).

BCH codes are related to Reed-Solomon codes: a primitive BCH code over $GF(q)$ of length $n = q^m - 1$ is a subfield subcode of an RS code over $GF(q^m)$ of the same length. RS codes arise as the special case $m = 1$.

Decoding uses the Berlekamp-Massey algorithm (or Euclidean algorithm for polynomials), identical in structure to RS decoding: find the error locator polynomial, then find its roots via Chien search, then determine error magnitudes.

Application: NAND flash ECC. Flash memory cells degrade with write cycles. A fresh flash block might need BCH with $t = 4$ error correction; an aged block near end-of-life might need $t = 24$. BCH codes are ideal because the designed distance parameter $d$ can be adjusted to match the error rate of the specific hardware generation, while the cyclic structure enables efficient hardware implementation.

Convolutional Codes

Block codes treat each $k$-bit message independently. Convolutional codes have memory: the $n$ output bits at each time step depend on the current $k$ input bits and the previous $K-1$ input blocks. The integer $K$ is the constraint length.

A rate-$1/n$ binary convolutional encoder has one shift register of length $K$. At each step, one input bit enters the register and $n$ output bits are computed as linear combinations (XOR sums) of register contents, specified by $n$ generator polynomials $g^{(1)}(x), \ldots, g^{(n)}(x)$ in $GF(2)[x]$ of degree at most $K-1$.

The encoder is a finite-state machine with $2^{K-1}$ states. Its operation is visualized by the trellis diagram: a layered graph where each layer has $2^{K-1}$ nodes (states), and each node has two outgoing edges (for input 0 or 1) producing the two possible output sequences.

Viterbi decoding finds the maximum-likelihood transmitted sequence by dynamic programming on the trellis. It maintains for each state at each time step the best (minimum distance) path to that state. The complexity is $O(2^{K-1} \cdot L)$ for a sequence of length $L$, linear in $L$ but exponential in $K$. In practice, $K$ is limited to 7 to 9 to keep decoding feasible.

The free distance $d_{free}$ is the minimum Hamming weight of any nonzero output sequence. It determines the error-correcting capability: $t = \lfloor (d_{free} - 1)/2 \rfloor$ errors can be corrected.

Puncturing removes some output bits to increase the rate. A rate-1/2 code can be punctured to rate-2/3 or rate-3/4 by deleting a fraction of outputs according to a puncture pattern, with a moderate loss in distance.

Applications. The $[2, 1, 7]$ convolutional code (rate 1/2, constraint length 7) was used in GSM (2G) and IS-95 CDMA. Voyager used a $[2, 1, 7]$ code. NASA deep-space missions used concatenated codes: a convolutional inner code with Viterbi decoding, concatenated with a Reed-Solomon outer code. This concatenated scheme dominated deep-space communications from the 1970s until LDPC codes superseded it.

Discomfort check. The Viterbi algorithm gives exact maximum-likelihood decoding for convolutional codes. It does not give approximate or heuristic results - it is genuinely optimal. The catch is that “maximum likelihood” means most likely single sequence, not most likely individual bits. For bit-by-bit decisions, BCJR (Bahl-Cocke-Jelinek-Raviv) computes exact bit-wise posterior probabilities and is used in turbo decoding.

LDPC Codes

Low-Density Parity-Check (LDPC) codes were invented by Gallager in 1962 and forgotten for three decades before being rediscovered in the late 1990s when they were found to approach Shannon capacity.

An LDPC code is a linear code defined by a sparse parity-check matrix $H$. “Sparse” means the number of 1s in $H$ is $O(n)$ rather than $O(n^2)$. The sparsity enables efficient iterative decoding.

The structure of $H$ is represented by a Tanner graph: a bipartite graph with $n$ variable nodes (columns of $H$) and $n-k$ check nodes (rows of $H$). An edge connects check node $c$ to variable node $v$ if $H_{cv} = 1$. A codeword assigns a bit to each variable node such that every check node’s neighborhood XORs to 0.

Regular LDPC codes have all variable nodes of degree $d_v$ and all check nodes of degree $d_c$, with $d_c (n-k) = d_v n$ (edge count). Irregular LDPC codes allow varying degrees. Let $\lambda_i$ (respectively $\rho_i$) denote the fraction of edges adjacent to a variable node (respectively check node) of degree $i$. The degree distributions $\lambda(x) = \sum_i \lambda_i x^{i-1}$ and $\rho(x) = \sum_i \rho_i x^{i-1}$ control the code’s performance.

Belief propagation (sum-product) decoding operates on the Tanner graph. Each variable node and check node exchange real-valued messages representing log-likelihood ratios. At each iteration:

Variable-to-check: each variable node combines its channel LLR with incoming messages from all other check nodes
Check-to-update: each check node computes an outgoing message based on all incoming variable messages using the check equation

After $O(\log n)$ to $O(n)$ iterations, variable nodes make hard decisions. Each iteration takes $O(n)$ time (proportional to the number of edges, which is $O(n)$ by sparsity).

Density evolution analyzes the asymptotic performance as $n \to \infty$. It tracks the distribution of messages across iterations, assuming the Tanner graph is cycle-free (which it approximately is for sparse random graphs). This gives the threshold $\sigma^*$ (noise level above which BP fails) as a function of the degree distribution $(\lambda, \rho)$. Optimizing $(\lambda, \rho)$ to maximize the threshold is a convex optimization problem, and the resulting capacity-approaching codes operate within 0.0045 dB of the Shannon limit for the AWGN channel.

Applications. LDPC codes replaced convolutional codes in most modern wireless standards: IEEE 802.11n/ac/ax (Wi-Fi), DVB-S2 (satellite television), and 5G NR (both data and control channels at high code rates). They are used in 10GBASE-T Ethernet and in hard disk drive controllers. The reason for displacement of convolutional codes is simple: LDPC codes at practical block lengths ($n \sim 10^3$ to $10^5$) perform 1-3 dB better than convolutional codes, directly translating to either lower power consumption or higher data rates.

Discomfort check. Belief propagation is exact only on trees. On a Tanner graph with cycles, it is an approximation whose accuracy degrades with short cycle length. The girth of the Tanner graph (length of the shortest cycle) must be large for BP to work well. Regular LDPC codes with large girth can be constructed by algebraic or combinatorial methods. Random sparse graphs have girth $O(\log n)$ with high probability, which is typically sufficient at practical block lengths.

Erasure Codes and Distributed Storage

The erasure channel is a channel where bits are not corrupted but lost: the receiver knows which positions were erased (value unknown) but not what the original values were. This models packet loss in networks and disk failures in storage systems.

For erasure correction, the relevant parameter is the number of erasures recovered, not errors. A code with minimum distance $d$ corrects any $e$ erasures if and only if $d \geq e + 1$. Equivalently, any $k$ of the $n$ codeword symbols must suffice to recover the full codeword.

MDS codes are optimal for storage. An $[n, k, n-k+1]$ MDS code stores data in $n$ chunks such that any $k$ of the $n$ chunks suffice to recover all data. The storage overhead is $n/k$. Compare:

Three-way replication: $n/k = 3$, tolerates $n - k = 2$ failures (any 1 of 3 copies suffices)
RS[14, 10]: $n/k = 1.4$, tolerates $n - k = 4$ failures (any 10 of 14 chunks suffice)

At petabyte scale, the difference between 1.4x and 3x storage overhead is enormous. Facebook reported saving hundreds of petabytes by switching from replication to erasure coding in their HDFS storage.

Reed-Solomon for distributed storage. Over $GF(2^8)$, an RS[14, 10] code operates on byte-aligned chunks. Each data block is divided into 10 data chunks; 4 parity chunks are computed by polynomial evaluation over $GF(2^8)$. Any 10 of the 14 chunks can be used to reconstruct the original 10 data chunks by polynomial interpolation.

Regenerating codes address a practical limitation of MDS erasure codes: when a failed node is repaired, the naive approach downloads all $k$ surviving chunks and re-encodes. If each chunk has $B$ bits, this means $kB$ bits are read for a repair. Regenerating codes allow a failed node to download a small amount $\beta \ll B$ from each of $d$ surviving nodes, for a total repair bandwidth of $d\beta \ll kB$. The minimum-storage regenerating (MSR) and minimum-bandwidth regenerating (MBR) points on the storage-bandwidth tradeoff are achievable by explicit constructions.

Local Reconstruction Codes (LRC) trade some MDS distance for fast local repair. In an LRC code, each symbol participates in a small local parity group of size $r + 1$. A single failure can be repaired by reading only $r$ other symbols (from the local group), rather than $k$ symbols. The Singleton-like bound for LRC codes is $d \leq n - k - \lceil k/r \rceil + 2$. Microsoft Azure’s storage uses LRC codes with $r = 6$, $k = 18$, $n = 22$, and Windows Azure Storage uses similar parameters. Facebook’s f4 and similar systems use LRC variants that balance repair locality against fault tolerance.

Discomfort check. MDS codes over large alphabets (like RS over $GF(2^8)$) require arithmetic in the finite field, which is more expensive than XOR. RAID-5 uses a $[n, n-1, 2]$ parity code (just XOR of all blocks), and RAID-6 uses a $[n, n-2, 3]$ code - both can be computed with XOR alone (using Galois field arithmetic optimized for $GF(2^8)$). For larger numbers of parity chunks, the field operations become more complex and Intel provides hardware acceleration (CLMUL instruction) for $GF(2^8)$ arithmetic.

Summary

Code Family	Parameters	Correction Capability	Key Property	Application
Hamming	$[2^r - 1, 2^r - 1 - r, 3]$	1 error	Perfect code	Memory ECC
Reed-Solomon	$[n, k, n-k+1]$ over $GF(q)$	$\lfloor (n-k)/2 \rfloor$ errors	MDS, meets Singleton	CDs, DVDs, QR codes, storage
BCH	$[n, \geq n - m(d-1), \geq d]$	$t$ errors (designed)	Cyclic, algebraic structure	Flash memory ECC
Convolutional	rate $k/n$, constraint length $K$	by Viterbi on trellis	Encoder memory, trellis decoding	Cellular, deep-space
LDPC	sparse $H$, rate $k/n$	near-capacity	Belief propagation	5G, Wi-Fi, satellite
MDS erasure	$[n, k, n-k+1]$	$n - k$ erasures	Optimal storage efficiency	Cloud storage, distributed RAID