Gaussian Elimination - Solving Linear Systems, Row by Row
Helpful context:
The previous two posts built up to a clean formulation of the central problem of linear algebra: solve $A\mathbf{x} = \mathbf{b}$. We know what the equation means - find the combination of columns of $A$ that produces $\mathbf{b}$. We know what the solution set looks like - a particular solution plus the null space. We even know, from the vector spaces post, what “column space” and “null space” mean structurally.
What we have not answered is the practical question: given a concrete $A$ and $\mathbf{b}$, how do you actually find $\mathbf{x}$?
For a $2 \times 2$ system with small numbers, you can subtract one equation from another and get the answer in two steps. But what about a $100 \times 100$ system? What about $1000 \times 1000$? The method needs to be systematic - a procedure that works every time, never gets confused, and along the way tells you whether there is one solution, no solution, or infinitely many.
That procedure is Gaussian elimination. It is the foundation of computational linear algebra. Every serious linear system solver - whether in Python’s NumPy, MATLAB, or the software guiding a spacecraft - is, at its core, a carefully engineered version of the same idea you are about to see.
The Intuition: Simplify Until Obvious
Before any formalism, here is the core idea in one sentence: repeatedly use one equation to kill a variable in all the others, until what remains is trivially solvable.
Say you have three equations in three unknowns and the first equation involves $x_1$. Use it to eliminate $x_1$ from equations 2 and 3 - subtract appropriate multiples. Now equations 2 and 3 involve only $x_2$ and $x_3$. Take equation 2 and use it to eliminate $x_2$ from equation 3. Now equation 3 involves only $x_3$: solve it. Plug $x_3$ into equation 2: solve for $x_2$. Plug both into equation 1: solve for $x_1$.
That is it. The rest of this post is just making that idea precise, tracking what can go wrong, and understanding what it reveals about the structure of the system.
A Brief History
The method is named after Carl Friedrich Gauss, but the idea is considerably older. Chinese mathematicians used essentially the same procedure in The Nine Chapters on the Mathematical Art (九章算術), compiled around 200 BCE. The text describes arranging counting rods in a table and performing column operations to solve systems - what we would recognize today as row reduction on an augmented matrix. The method was rediscovered in Europe over the following millennia, and Gauss himself described a refined version in his 1809 work on least squares. The name “Gaussian elimination” became standard only in the 20th century, after numerical analysts recognized it as the right foundation for computational linear algebra.
The word echelon comes from the French échelon, meaning a rung of a ladder. A row echelon form looks like a staircase: each row’s leading nonzero entry (the pivot) sits strictly to the right of the pivot in the row above, creating a stepped pattern down the matrix. The military used “echelon formation” to describe troops arranged in a diagonal staggered pattern - the same staircase shape. The pivot itself takes its name from French as well: pivot meaning the pin on which a lever turns, the fixed point everything else rotates around. In a row operation, the pivot is exactly that: the fixed entry you use to rotate away (zero out) all the entries below it.
Where did pivots come from conceptually? The key insight, present in the Chinese rod-arithmetic method and made explicit by Gauss, is that you only need one nonzero entry per column to drive the elimination of that variable from all other rows. That entry - whatever it happens to be - does all the work. Calling it the pivot captures this: everything in its column below it gets zeroed by subtracting the right multiple of its row.
What Operations Are Allowed?
The question is: what can you do to a system of equations without changing its solutions?
Write the system explicitly. Say we have three equations in three unknowns:
$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 &= b_1 \\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 &= b_2 \\ a_{31} x_1 + a_{32} x_2 + a_{33} x_3 &= b_3 \end{aligned}$$
Any $\mathbf{x}$ that satisfies all three equations simultaneously is a solution. The question is: if you modify the system, which modifications preserve this solution set exactly?
Three operations do:
Operation 1: Swap two equations. If $\mathbf{x}$ satisfies equation 1 and equation 2, it satisfies them in either order. The solution set does not change.
Operation 2: Multiply an equation by a nonzero constant. If $\mathbf{x}$ satisfies $2x_1 + x_2 = 5$, it also satisfies $4x_1 + 2x_2 = 10$, and vice versa (divide both sides by 2). The nonzero requirement is essential: multiplying by zero destroys information.
Operation 3: Add a multiple of one equation to another. This is the key operation. Suppose $\mathbf{x}$ satisfies both equation 1 (call it $E_1$) and equation 2 (call it $E_2$). Form a new equation: $E_1 + c \cdot E_2$. Any $\mathbf{x}$ satisfying $E_1$ and $E_2$ also satisfies $E_1 + c \cdot E_2$ (just add $c$ times the second equality to the first). And conversely, if $\mathbf{x}$ satisfies $E_1 + c \cdot E_2$ and $E_2$, then it satisfies $E_1 = (E_1 + c \cdot E_2) - c \cdot E_2$. So replacing $E_1$ with $E_1 + c \cdot E_2$ gives an equivalent system.
These three operations are called elementary row operations. Applying any sequence of them transforms the system into an equivalent one - same solution set, different appearance.
The strategy of Gaussian elimination: use operation 3 relentlessly to create zeros below the diagonal, transforming the system into a triangular shape that is trivially solvable.
Why Triangular Systems Are Easy
Before eliminating, see why triangular is the target. A $3 \times 3$ upper triangular system looks like:
$$\begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$$
The bottom equation involves only $x_3$: solve it directly. The middle equation involves $x_2$ and $x_3$: substitute $x_3$, solve for $x_2$. The top equation involves all three: substitute the known $x_2$ and $x_3$, solve for $x_1$.
Each equation has exactly one more unknown than the one below it, but by the time you reach it, all later unknowns are already known. This bottom-up solving process is called back substitution, and it takes $O(n^2)$ operations for an $n \times n$ triangular system.
The entire effort of Gaussian elimination is to reach this triangular form.
The Augmented Matrix
Instead of writing out full equations at each step, encode everything in a matrix. The augmented matrix $[A \mid \mathbf{b}]$ places $\mathbf{b}$ as a final column:
$$\left(\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right)$$
Each row represents one equation. Elementary row operations on equations become row operations on this matrix. Every operation applied to the coefficient columns must also be applied to the right-hand side column - which is why $\mathbf{b}$ is included.
Forward Elimination: Building the Triangle
Here is a concrete $3 \times 3$ system:
$$A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 5 & 2 \\ 1 & 5 & 9 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 5 \\ 14 \\ 20 \end{pmatrix}$$
Write the augmented matrix:
$$\left(\begin{array}{ccc|c} 1 & 2 & 0 & 5 \\ 2 & 5 & 2 & 14 \\ 1 & 5 & 9 & 20 \end{array}\right)$$
Step 1: Eliminate $x_1$ from rows 2 and 3.
The entry in position $(1,1)$ is 1. This is the first pivot - the entry we use to create zeros below it in column 1.
The entry in position $(2,1)$ is 2. To zero it out, subtract $2 \times$ row 1 from row 2. The number 2 is the multiplier $m_{21} = 2$:
$$R_2 \leftarrow R_2 - 2 R_1: \quad (2-2, 5-4, 2-0, 14-10) = (0, 1, 2, 4)$$
The entry in position $(3,1)$ is 1. Subtract $1 \times$ row 1 from row 3. The multiplier is $m_{31} = 1$:
$$R_3 \leftarrow R_3 - 1 R_1: \quad (1-1, 5-2, 9-0, 20-5) = (0, 3, 9, 15)$$
After step 1:
$$\left(\begin{array}{ccc|c} 1 & 2 & 0 & 5 \\ 0 & 1 & 2 & 4 \\ 0 & 3 & 9 & 15 \end{array}\right)$$
Column 1 below the pivot is now all zeros.
Step 2: Eliminate $x_2$ from row 3.
The new entry in position $(2,2)$ is 1. This is the second pivot.
The entry in position $(3,2)$ is 3. Subtract $3 \times$ row 2 from row 3. Multiplier $m_{32} = 3$:
$$R_3 \leftarrow R_3 - 3 R_2: \quad (0, 3-3, 9-6, 15-12) = (0, 0, 3, 3)$$
After step 2:
$$\left(\begin{array}{ccc|c} 1 & 2 & 0 & 5 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 3 & 3 \end{array}\right)$$
Upper triangular. The three pivots are 1, 1, 3 (the diagonal entries of the resulting matrix $U$).
Back Substitution
Read off the solution from the triangular system, bottom to top.
Row 3: $3 x_3 = 3$, so $x_3 = 1$.
Row 2: $x_2 + 2 x_3 = 4$. Substitute $x_3 = 1$: $x_2 = 2$.
Row 1: $x_1 + 2 x_2 + 0 \cdot x_3 = 5$. Substitute $x_2 = 2$: $x_1 = 1$.
Solution: $(x_1, x_2, x_3) = (1, 2, 1)$.
Verify directly: $A\mathbf{x}$:
- Row 1: $1(1) + 2(2) + 0(1) = 5$ ✓
- Row 2: $2(1) + 5(2) + 2(1) = 14$ ✓
- Row 3: $1(1) + 5(2) + 9(1) = 20$ ✓
When the Pivot Is Zero
What if the pivot position is zero? You cannot divide by zero, so you cannot use it as a pivot.
The fix: row swap. Look below the zero for any row with a nonzero entry in the same column. Swap that row up. If a nonzero entry exists somewhere below, the swap gives you a valid pivot and elimination continues.
Example: suppose after some steps you have:
$$\left(\begin{array}{ccc|c} 2 & 4 & 1 & 7 \\ 0 & 0 & 3 & 6 \\ 0 & 2 & 5 & 8 \end{array}\right)$$
The pivot candidate for column 2 is the $(2,2)$ entry, which is 0. Row 3 has a nonzero entry in column 2. Swap rows 2 and 3:
$$\left(\begin{array}{ccc|c} 2 & 4 & 1 & 7 \\ 0 & 2 & 5 & 8 \\ 0 & 0 & 3 & 6 \end{array}\right)$$
Now column 2 has a valid pivot (the 2), and column 3 is already zero below its pivot (the 3). Triangular form achieved without further work.
In practice, solvers often swap to the row with the largest entry in the column, not just any nonzero one. This is called partial pivoting and it controls numerical errors when computing with floating-point numbers.
When No Pivot Exists: Two Outcomes
Suppose after eliminating column $k$, every entry below the pivot position is zero - including the pivot position itself. No swap can help: the entire column below row $k$ is zero. The variable $x_k$ has no pivot.
This situation signals either that the system has no solution or infinitely many solutions. The right-hand side entry in that row determines which.
Case 1: Inconsistent system (no solution).
The row reduces to $(0, 0, \ldots, 0 \mid c)$ with $c \neq 0$. The equation it represents is $0 = c$, which is false for any $\mathbf{x}$. No solution exists.
Example:
$$\left(\begin{array}{cc|c} 1 & 1 & 3 \\ 2 & 2 & 7 \end{array}\right)$$
$R_2 \leftarrow R_2 - 2R_1$:
$$\left(\begin{array}{cc|c} 1 & 1 & 3 \\ 0 & 0 & 1 \end{array}\right)$$
Row 2 says $0 = 1$. No solution. Geometrically: two parallel lines that never meet. The destination $\mathbf{b} = (3, 7)^T$ is not in the column space of $A$.
Case 2: Free variable (infinitely many solutions).
The row reduces to $(0, 0, \ldots, 0 \mid 0)$: the equation $0 = 0$, which is true for every $\mathbf{x}$. The variable in that column has no constraint from this row. It can take any value - it is a free variable. For each value of the free variable, you get a different solution.
Example:
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 0 & 1 & 2 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$$
Note: row 2 of $A$ is exactly 2 times row 1. The augmented matrix:
$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 2 & 4 & 6 & 2 \\ 0 & 1 & 2 & 1 \end{array}\right)$$
$R_2 \leftarrow R_2 - 2R_1$: row 2 becomes $(0, 0, 0, 0)$. Swap rows 2 and 3 to bring the pivot up:
$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right)$$
Only two pivots (columns 1 and 2). Column 3 has no pivot: $x_3$ is free.
Set $x_3 = t$ for any $t \in \mathbb{R}$:
- Row 2: $x_2 + 2t = 1 \Rightarrow x_2 = 1 - 2t$
- Row 1: $x_1 + 2(1-2t) + 3t = 1 \Rightarrow x_1 = -1 + t$
The complete solution is:
$$\mathbf{x} = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}$$
The first vector is a particular solution (set $t = 0$). The second vector $(1, -2, 1)^T$ is a null space vector - check: $A(1, -2, 1)^T = (1-4+3, 2-8+6, 0-2+2)^T = (0,0,0)^T$ ✓.
This is the structure from the vector spaces post: the solution set is a particular solution plus the null space. Gaussian elimination finds both automatically.
What if the free variable is not the last one?
The example above had $x_3$ as the free variable because the last row vanished. But a free variable can appear in the middle of the variable list too, when an entire column has no nonzero entry at or below the current pivot position - not because a row vanished, but because the column simply has no candidates.
Example. Consider:
$$\left(\begin{array}{ccc|c} 1 & 1 & 1 & b_1 \\ 0 & 0 & 1 & b_2 \\ 0 & 0 & 2 & b_3 \end{array}\right)$$
Column 1 already has zeros below the pivot. Move to column 2. The pivot candidate is position $(2,2)$, which is 0. Look below: row 3 also has 0 in column 2. No swap can help - the entire column 2 below and at the pivot row is zero. $x_2$ is immediately a free variable. Do not advance the row counter. Move the pivot search to column 3, still on row 2.
Column 3, row 2: entry is 1. This is the second pivot. Eliminate below it: $R_3 \leftarrow R_3 - 2R_2$:
$$\left(\begin{array}{ccc|c} 1 & 1 & 1 & b_1 \\ 0 & 0 & 1 & b_2 \\ 0 & 0 & 0 & b_3 - 2b_2 \end{array}\right)$$
Now the last row gives a consistency check: we need $b_3 - 2b_2 = 0$, i.e. $b_3 = 2b_2$. If not, no solution. If yes, back-substitute: $x_3 = b_2$, $x_2 = t$ (free), $x_1 = b_1 - t - b_2$. The full solution is:
$$\mathbf{x} = \begin{pmatrix} b_1 - b_2 \\ 0 \\ b_2 \end{pmatrix} + t \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \qquad t \in \mathbb{R}$$
The key procedure point: when looking for a pivot in column $k$ and every entry from the current row downward is zero, declare column $k$’s variable free and advance the column counter without advancing the row counter. The pivot “skips” the dead column and latches onto the next column that has something to work with.
Three Possible Outcomes, Summarized
| What you see after elimination | What it means |
|---|---|
| $n$ pivots, no zero rows on right | Unique solution |
| A row $(0, \ldots, 0 \mid c)$ with $c \neq 0$ | No solution; $\mathbf{b} \notin \text{col}(A)$ |
| A row $(0, \ldots, 0 \mid 0)$, $k < n$ pivots | Infinitely many solutions; $n - k$ free variables |
The number of pivots is the rank of $A$. The number of free variables is $n - \text{rank}(A)$ = the nullity of $A$. Rank plus nullity equals $n$: rank-nullity, confirmed by elimination.
Going Further: Reduced Row Echelon Form
Forward elimination creates zeros below each pivot. You can go further: also create zeros above each pivot, and scale each pivot to 1. The result is called reduced row echelon form (RREF).
Starting from the triangular form of our first example:
$$\left(\begin{array}{ccc|c} 1 & 2 & 0 & 5 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 3 & 3 \end{array}\right)$$
Scale row 3 to make the pivot 1. Divide by 3:
$$\left(\begin{array}{ccc|c} 1 & 2 & 0 & 5 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 1 & 1 \end{array}\right)$$
Eliminate above pivot 3. The entry $(2,3)$ is 2. Subtract $2 \times$ row 3 from row 2:
$$\left(\begin{array}{ccc|c} 1 & 2 & 0 & 5 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1 \end{array}\right)$$
Eliminate above pivot 2. The entry $(1,2)$ is 2. Subtract $2 \times$ row 2 from row 1:
$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1 \end{array}\right)$$
The right-hand column reads off the solution directly: $x_1 = 1$, $x_2 = 2$, $x_3 = 1$. No back substitution needed.
For systems with free variables, RREF makes the null space explicit. The columns without pivots are free variables; each pivot variable is expressed directly as a linear combination of free variables plus a constant.
Computing the Matrix Inverse
Gaussian elimination does not just solve $A\mathbf{x} = \mathbf{b}$ for one $\mathbf{b}$. With a small extension, it computes $A^{-1}$ directly.
The idea, from first principles. Notice what RREF does to $[A \mid \mathbf{b}]$: it transforms the left block into $I$ and the right block into the solution $\mathbf{x}$. The left side went from $A$ to $I$; the right side went from $\mathbf{b}$ to the answer. This is not a coincidence - whatever row operations turned the left block into identity had to be applied equally to the right block, and the right block absorbed exactly the effect of $A^{-1}$ being applied to $\mathbf{b}$.
Now run the same process but put $I$ on the right instead of a single $\mathbf{b}$:
$$[A \mid I] \xrightarrow{\text{RREF}} [I \mid ,?]$$
The left block becomes $I$ by the same elimination. The right block, starting as $I$, absorbs the same operations - and ends up as $A^{-1}$, for exactly the same reason: whatever turned $A$ into $I$ is, by definition, $A^{-1}$, and applying it to $I$ gives $A^{-1} \cdot I = A^{-1}$.
Reading it column by column makes this concrete. The $j$-th column of $I$ is $\mathbf{e}_j$. So the $j$-th column of $[A \mid I]$ on the right is just $\mathbf{e}_j$ - a single right-hand side. After RREF, that column becomes the solution $\mathbf{p}_j$ to $A\mathbf{p}_j = \mathbf{e}_j$. And $A^{-1}$ is precisely the matrix whose $j$-th column is the solution to $A\mathbf{p}_j = \mathbf{e}_j$. So the right block after reduction is $A^{-1}$, column by column.
The matrix $A^{-1}$ satisfies $A \cdot A^{-1} = I$. We can solve all $n$ columns simultaneously by augmenting $A$ with the full identity matrix and reducing to RREF:
$$[A \mid I] \xrightarrow{\text{RREF}} [I \mid A^{-1}].$$
Why this works (algebraically). Each row operation is left-multiplication by an invertible elementary matrix $E_k$. If the sequence of operations reduces $A$ to $I$: $E_r \cdots E_1 A = I$, then $E_r \cdots E_1 = A^{-1}$. Applying those same operations to the right block: $E_r \cdots E_1 \cdot I = A^{-1}$. The right block becomes $A^{-1}$.
Worked Example
Let $A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 1 \end{pmatrix}$. Form the augmented matrix:
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right)$$
$R_3 \leftarrow R_3 - R_1$ (zero out position $(3,1)$):
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 & 0 & 1 \end{array}\right)$$
$R_2 \leftrightarrow R_3$ (bring the simpler pivot into position):
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 & 1 \\ 0 & 2 & 1 & 0 & 1 & 0 \end{array}\right)$$
$R_3 \leftarrow R_3 - 2R_2$:
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 & 1 \\ 0 & 0 & 1 & 2 & 1 & -2 \end{array}\right)$$
$R_1 \leftarrow R_1 - R_3$ (eliminate above the third pivot):
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1 & -1 & 2 \\ 0 & 1 & 0 & -1 & 0 & 1 \\ 0 & 0 & 1 & 2 & 1 & -2 \end{array}\right)$$
The left block is $I$, so the right block is:
$$A^{-1} = \begin{pmatrix} -1 & -1 & 2 \\ -1 & 0 & 1 \\ 2 & 1 & -2 \end{pmatrix}.$$
Verify: $A \cdot (-1, -1, 2)^T = ((-1+0+2), (0-2+2), (-1-1+2))^T = (1, 0, 0)^T = \mathbf{e}_1$. ✓
When No Inverse Exists
If $A$ is singular, forward elimination will produce a zero row in the left block while the corresponding row in the right block is nonzero. That row represents the equation $0 = c$ for some $c \neq 0$: the sub-system $A\mathbf{x} = \mathbf{e}_j$ has no solution, and no $j$-th column of $A^{-1}$ can be constructed.
Example: $A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$ (each row is an arithmetic progression with common difference 3; rank 2).
$$\left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 4 & 5 & 6 & 0 & 1 & 0 \\ 7 & 8 & 9 & 0 & 0 & 1 \end{array}\right)$$
$R_2 \leftarrow R_2 - 4R_1$, $R_3 \leftarrow R_3 - 7R_1$:
$$\left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -3 & -6 & -4 & 1 & 0 \\ 0 & -6 & -12 & -7 & 0 & 1 \end{array}\right)$$
$R_3 \leftarrow R_3 - 2R_2$:
$$\left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -3 & -6 & -4 & 1 & 0 \\ 0 & 0 & 0 & 1 & -2 & 1 \end{array}\right)$$
Row 3’s left block is all zeros - no third pivot. But row 3’s right block is $(1, -2, 1) \neq \mathbf{0}$. For each of the three sub-systems this row says $0 = 1$, $0 = -2$, $0 = 1$ respectively - each a contradiction. None of the three columns of $A^{-1}$ can be solved for; $A^{-1}$ does not exist.
The rule is clean: if elimination on $[A \mid I]$ ever reaches a column where the entire sub-column at and below the pivot position is zero - so no row swap can rescue it - then $A$ is singular and has no inverse. A zero at the pivot position alone is not enough; that might just require a row swap. Singularity is confirmed only when there is no nonzero entry anywhere below (or at) that pivot position, leaving that column without a pivot no matter what.
What Elimination Actually Does: $A = LU$
The Intuition
Every time Gaussian elimination subtracts a multiple of one row from another, it is doing something reversible: it is recording exactly how to undo that step. The central insight of $LU$ factorization is that these “undo instructions” - the multipliers - assemble into a lower triangular matrix $L$, and the result of elimination is the upper triangular matrix $U$. Together: $A = LU$.
Why is this useful? Because solving $A\mathbf{x} = \mathbf{b}$ requires only forward elimination and back substitution - $O(n^3)$ work. But once you have $L$ and $U$, you can solve for any new $\mathbf{b}$ in $O(n^2)$: no elimination needed again. For a problem with hundreds of right-hand sides and the same $A$, this is a massive saving. It is also the reason $LU$ factorization is what numerical libraries (NumPy, LAPACK, MATLAB) actually store when you “solve a linear system.”
Where $L$ Comes From
Each elimination step subtracts $m_{ij}$ times row $i$ from row $j$ to zero out position $(j, i)$. The multiplier $m_{ji}$ is just the ratio:
$$m_{ji} = \frac{a_{ji}}{a_{ii}}$$
where $a_{ii}$ is the current pivot. After using row $i$ to clear column $i$ in all rows below, that multiplier is set aside. At the end of all elimination, build a matrix $L$ by placing each multiplier $m_{ji}$ at position $(j, i)$ below the diagonal, and putting 1s on the diagonal:
$$L_{ji} = m_{ji} \quad (j > i), \qquad L_{ii} = 1, \qquad L_{ji} = 0 \quad (j < i).$$
The claim is that this $L$, multiplied by the triangular result $U$, exactly reconstructs $A$. The rest of this section builds this up step by step.
Step-by-Step: Tracking One Elimination as a Matrix Operation
From the earlier section on elementary matrices: subtracting $m$ times row $i$ from row $j$ is left-multiplication by:
$$E_{ji}(m) = I \text{ with } -m \text{ at position } (j,i).$$
The inverse of this operation (add $m$ times row $i$ back to row $j$) is:
$$E_{ji}(m)^{-1} = I \text{ with } +m \text{ at position } (j,i).$$
Applying a sequence of such operations reduces $A$ to $U$:
$$E_k \cdots E_2 E_1 , A = U.$$
Multiply both sides on the left by $E_1^{-1}, E_2^{-1}, \ldots, E_k^{-1}$ in order:
$$A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} , U.$$
Define $L = E_1^{-1} E_2^{-1} \cdots E_k^{-1}$. The key fact: when there are no row swaps and the operations are applied left to right column by column, the inverses multiply together cleanly - each $+m_{ji}$ just drops into position $(j,i)$ of $L$ without disturbing any other entry. The product of all the inverses is exactly the lower triangular matrix with the multipliers below the diagonal and 1s on the diagonal.
Worked Example in Full Detail
We eliminate $A$ from earlier:
$$A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 5 & 2 \\ 1 & 5 & 9 \end{pmatrix}$$
Step 1: Zero out column 1, rows 2 and 3.
Multiplier for row 2: $m_{21} = 2/1 = 2$. Operation: $R_2 \leftarrow R_2 - 2 R_1$.
Multiplier for row 3: $m_{31} = 1/1 = 1$. Operation: $R_3 \leftarrow R_3 - 1 R_1$.
As elementary matrices:
$$E_{21} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad E_{31} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}$$
After both operations, the matrix becomes:
$$\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 2 \\ 0 & 3 & 9 \end{pmatrix}$$
Record: $m_{21} = 2$ goes to position $(2,1)$ of $L$. $m_{31} = 1$ goes to position $(3,1)$ of $L$.
Step 2: Zero out column 2, row 3.
The current pivot for column 2 is the $(2,2)$ entry, which is 1.
Multiplier for row 3: $m_{32} = 3/1 = 3$. Operation: $R_3 \leftarrow R_3 - 3 R_2$.
As an elementary matrix:
$$E_{32} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}$$
After this operation:
$$U = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 3 \end{pmatrix}$$
Record: $m_{32} = 3$ goes to position $(3,2)$ of $L$.
Assembling $L$.
Place all three multipliers below the diagonal, 1s on the diagonal, 0s above:
$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 3 & 1 \end{pmatrix}$$
The 2 is at $(2,1)$ because it was the multiplier used to clear row 2 using row 1. The 1 is at $(3,1)$ for the same reason on row 3. The 3 is at $(3,2)$ because it cleared row 3 using row 2.
Verifying $A = LU$.
$$LU = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 3 \end{pmatrix}$$
Row 1 of $LU$: $(1)(1,2,0) + (0)(\cdots) + (0)(\cdots) = (1, 2, 0)$. Matches row 1 of $A$.
Row 2 of $LU$: $(2)(1,2,0) + (1)(0,1,2) + (0)(\cdots) = (2,4,0) + (0,1,2) = (2, 5, 2)$. Matches row 2 of $A$.
Row 3 of $LU$: $(1)(1,2,0) + (3)(0,1,2) + (1)(0,0,3) = (1,2,0) + (0,3,6) + (0,0,3) = (1, 5, 9)$. Matches row 3 of $A$.
$$LU = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 5 & 2 \\ 1 & 5 & 9 \end{pmatrix} = A \checkmark$$
Why does this work out so cleanly? The row 2 of $L$ says: row 2 of $A$ is $2 \times$ (row 1 of $U$) $+ 1 \times$ (row 2 of $U$). That is exactly what we did during elimination - row 2 of $A$ was row 1 scaled by 2 (the part we subtracted) plus the result row 2 of $U$ (what remained). $L$ encodes the “recipe” for reconstructing each row of $A$ from the rows of $U$, using the multipliers as the coefficients.
Using $LU$ to Solve Multiple Right-Hand Sides
Given $A = LU$, the system $A\mathbf{x} = \mathbf{b}$ becomes $LU\mathbf{x} = \mathbf{b}$. Let $\mathbf{y} = U\mathbf{x}$. Then:
- Forward substitution: Solve $L\mathbf{y} = \mathbf{b}$. Since $L$ is lower triangular, this goes top to bottom: $y_1$ is immediate from row 1, then $y_2$ from row 2 using $y_1$, and so on.
- Back substitution: Solve $U\mathbf{x} = \mathbf{y}$. Since $U$ is upper triangular, go bottom to top.
Each stage costs $O(n^2)$. Factoring $A$ into $LU$ costs $O(n^3)$ once. Every subsequent solve for a new $\mathbf{b}$ costs only $O(n^2)$.
Worked forward substitution for $\mathbf{b} = (5, 14, 20)^T$:
$$L\mathbf{y} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 3 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 5 \\ 14 \\ 20 \end{pmatrix}$$
- Row 1: $y_1 = 5$.
- Row 2: $2(5) + y_2 = 14 \Rightarrow y_2 = 4$.
- Row 3: $1(5) + 3(4) + y_3 = 20 \Rightarrow y_3 = 3$.
So $\mathbf{y} = (5, 4, 3)^T$. Notice: these are exactly the right-hand side values that appeared during forward elimination earlier. That is not a coincidence - forward substitution through $L$ is just replaying the elimination steps on $\mathbf{b}$.
Worked back substitution for $U\mathbf{x} = (5, 4, 3)^T$:
$$U\mathbf{x} = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \\ 3 \end{pmatrix}$$
- Row 3: $3 x_3 = 3 \Rightarrow x_3 = 1$.
- Row 2: $x_2 + 2(1) = 4 \Rightarrow x_2 = 2$.
- Row 1: $x_1 + 2(2) + 0(1) = 5 \Rightarrow x_1 = 1$.
Solution: $(x_1, x_2, x_3) = (1, 2, 1)$, matching what direct elimination gave. The $LU$ factorization stores the complete elimination process so it never needs to be repeated.
The Connection Back to Vector Spaces
Gaussian elimination is not just a solving algorithm. It reveals the complete structure of a linear map.
Rank from pivots. The number of pivots equals the rank of $A$: the dimension of the column space. Pivot columns of $A$ (the original columns, not the eliminated version) form a basis for the column space.
Nullity from free variables. The number of free variables equals the nullity: the dimension of the null space. RREF makes null space vectors explicit - set one free variable to 1 and all others to 0, solve for pivot variables.
Why does this equal the dimension? The intuition: free variables are genuinely independent degrees of freedom. Each one can be set to any value without affecting the others, and once the free variables are chosen, the pivot variables are forced (RREF expresses each pivot variable as a fixed linear combination of the free ones). So the null space has exactly as many independent directions as there are free variables.
The proof constructs a basis for the null space explicitly. Suppose there are $k$ free variables and $r = n - k$ pivot variables. Label the free variables $f_1, \ldots, f_k$. For each $j = 1, \ldots, k$, define a vector $\mathbf{v}_j$ by setting free variable $f_j$ to 1, all other free variables to 0, and solving for the pivot variables from the RREF with zero right-hand side. This gives a unique solution because RREF expresses each pivot variable as a linear combination of the free variables.
Each $\mathbf{v}_j$ is in the null space. It was constructed by solving $A\mathbf{x} = \mathbf{0}$, so $A\mathbf{v}_j = \mathbf{0}$ by construction.
The $\mathbf{v}_j$ are linearly independent. Suppose $c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k = \mathbf{0}$. Look at the component for free variable $f_j$: in $\mathbf{v}_j$ it equals 1, in every other $\mathbf{v}_i$ with $i \neq j$ it equals 0 (we set exactly one free variable to 1 per basis vector). So the $f_j$ component of the linear combination is $c_j \cdot 1 = c_j$. For the sum to be $\mathbf{0}$, we need $c_j = 0$ for every $j$. The vectors are independent.
The $\mathbf{v}_j$ span the null space. Let $\mathbf{x}$ be any null space vector, and let $\alpha_j$ denote its value at free variable $f_j$. Form $\mathbf{y} = \mathbf{x} - \alpha_1 \mathbf{v}_1 - \cdots - \alpha_k \mathbf{v}_k$. This is in the null space. The value of $\mathbf{y}$ at free variable $f_j$ is $\alpha_j - \alpha_j \cdot 1 = 0$, so all free variables of $\mathbf{y}$ are zero. But RREF pins each pivot variable as a linear combination of the free variables: if all free variables are zero, all pivot variables are zero too. So $\mathbf{y} = \mathbf{0}$, meaning $\mathbf{x} = \alpha_1 \mathbf{v}_1 + \cdots + \alpha_k \mathbf{v}_k$.
Since ${\mathbf{v}_1, \ldots, \mathbf{v}_k}$ is linearly independent and spans the null space, it is a basis. The dimension of the null space is exactly $k$, the number of free variables.
Rank-nullity. Every column is either a pivot column or a free-variable column. So rank plus nullity equals the total number of columns: $n$. Gaussian elimination proves rank-nullity constructively - it literally partitions the columns into two groups.
Invertibility. $A$ is invertible if and only if every column has a pivot - i.e., rank equals $n$. In that case there are no free variables, the null space is $\{\mathbf{0}\}$, and every $\mathbf{b}$ has a unique solution. The $LU$ factorization has no zero pivots.
The Rigorous Underpinning
Elementary Matrices: What They Are
An elementary matrix is the identity matrix $I$ with exactly one modification - one column changed to encode a single row operation. There are three types, one per row operation.
The Column Interpretation: Why Left-Multiplication Performs a Row Operation
Before defining the three types, we need the right lens to see why left-multiplication works. The key is the column interpretation of matrix multiplication.
When you multiply a matrix $E$ by a vector $\mathbf{b} = (b_1, b_2, \ldots, b_n)^T$, the result is a linear combination of the columns of $E$, with the entries of $\mathbf{b}$ as the weights:
$$E\mathbf{b} = b_1 \mathbf{c}_1 + b_2 \mathbf{c}_2 + \cdots + b_n \mathbf{c}_n$$
where $\mathbf{c}_k$ denotes column $k$ of $E$.
Now, the product $EA$ is just $E$ applied to each column of $A$ separately. Column $k$ of $EA$ = $E \times$ (column $k$ of $A$). So the formula above governs every column of the result simultaneously.
Why the identity matrix does nothing. The identity matrix has $\mathbf{c}_k = \mathbf{e}_k$ (the $k$-th standard basis vector: 1 at position $k$, zeros elsewhere). So $I\mathbf{b} = b_1 \mathbf{e}_1 + \cdots + b_n \mathbf{e}_n$. Each term $b_k \mathbf{e}_k$ puts $b_k$ at position $k$ and zeros everywhere else. Summing: position $k$ gets exactly $b_k$. The vector is unchanged.
What changes when we modify one column. An elementary matrix is $I$ with exactly one column altered. All terms in the sum $b_1 \mathbf{c}_1 + \cdots + b_n \mathbf{c}_n$ behave exactly like the identity - except the one term involving the modified column. That single term carries all the action.
Type 1: Row addition (subtract $m$ times row $j$ from row $i$, i.e., $R_i \leftarrow R_i - m R_j$).
Start with $I$ and replace column $j$ with $\mathbf{e}_j - m\mathbf{e}_i$: keep the 1 at position $j$, but add $-m$ at position $i$. For a $3 \times 3$ matrix, subtracting $m$ times row 2 from row 1 ($R_1 \leftarrow R_1 - m R_2$) gives:
$$E = \begin{pmatrix} 1 & -m & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
Column 1 = $\mathbf{e}_1$, column 2 = $\mathbf{e}_2 - m\mathbf{e}_1$, column 3 = $\mathbf{e}_3$.
Why does left-multiplication by this perform the row operation? Take any column of $A$ and call its entries $b_1, b_2, b_3$. Compute $E\mathbf{b}$ using the column interpretation:
$$E\mathbf{b} = b_1 \mathbf{e}_1 + b_2 (\mathbf{e}_2 - m\mathbf{e}_1) + b_3 \mathbf{e}_3$$
Expand and collect by position:
- The $b_1 \mathbf{e}_1$ term puts $b_1$ at position 1 and zeros elsewhere.
- The $b_2(\mathbf{e}_2 - m\mathbf{e}_1)$ term puts $b_2$ at position 2, but also $-m b_2$ at position 1.
- The $b_3 \mathbf{e}_3$ term puts $b_3$ at position 3 and zeros elsewhere.
Sum everything up position by position:
- Position 1: $b_1 + (-m b_2) = b_1 - m b_2$. This is the modified row.
- Position 2: $b_2$. Unchanged.
- Position 3: $b_3$. Unchanged.
Position 1 became $b_1 - m b_2$ - which is exactly entry 1 of (row 1) $- m \times$ (row 2). The $-m b_2$ came entirely from the modified column: the only term that places anything at position 1 beyond $b_1$ is $b_2 \cdot (\mathbf{e}_2 - m\mathbf{e}_1)$, because that column has a $-m$ at position 1.
Since this holds for every column of $A$ (with $b_1, b_2, b_3$ being the entries of whichever column we pick), every column of $EA$ has its first entry equal to (original first entry) $- m \times$ (original second entry). That is precisely the row operation $R_1 \leftarrow R_1 - m R_2$ applied to all columns at once - which is the same as applying it to the matrix $A$.
Inverse: replace $-m$ with $+m$ in column $j$. The modified column becomes $\mathbf{e}_j + m\mathbf{e}_i$, which undoes the subtraction by adding back $m$ times row $j$:
$$E^{-1} = \begin{pmatrix} 1 & m & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
The inverse is obtained by flipping the sign of the single off-diagonal entry. No matrix inversion needed.
Type 2: Row swap (exchange rows $i$ and $j$).
Start with $I$ and swap columns $i$ and $j$. For swapping rows 2 and 3:
$$P_{23} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$
Column 1 = $\mathbf{e}_1$, column 2 = $\mathbf{e}_3$, column 3 = $\mathbf{e}_2$.
For any column of $A$ with entries $(b_1, b_2, b_3)$:
$$P_{23}\mathbf{b} = b_1\mathbf{e}_1 + b_2\mathbf{e}_3 + b_3\mathbf{e}_2$$
- Position 1 gets $b_1$. Unchanged.
- Position 2 gets $b_3$ (came from $b_3 \mathbf{e}_2$, whose component at position 2 is 1). This is the old row 3.
- Position 3 gets $b_2$ (came from $b_2 \mathbf{e}_3$, whose component at position 3 is 1). This is the old row 2.
Rows 2 and 3 swapped. Every column of $A$ gets the same treatment, so the entire matrix has rows 2 and 3 exchanged.
Inverse: swap again. $P_{23}^{-1} = P_{23}$ - swapping twice returns to the original. Every row-swap matrix is its own inverse.
Type 3: Row scaling (multiply row $i$ by a nonzero scalar $c$).
Start with $I$ and replace column $i$’s entry at position $i$ (the diagonal) with $c$. Scaling row 2 by $c$:
$$D_2(c) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
Column 2 = $c\mathbf{e}_2$ (scaled standard basis vector).
For any column of $A$ with entries $(b_1, b_2, b_3)$:
$$D_2(c)\mathbf{b} = b_1\mathbf{e}_1 + b_2 (c\mathbf{e}_2) + b_3\mathbf{e}_3$$
Position 2 gets $c b_2$; everything else is unchanged. Row 2 of every column is scaled by $c$, so row 2 of $A$ is scaled by $c$.
Inverse: scale by $1/c$. The condition $c \neq 0$ (required for this operation to be legal) is exactly the condition needed for the inverse to exist.
Stringing Operations Together: $A = LU$
Gaussian elimination applies a sequence of Type-1 operations $E_1, E_2, \ldots, E_k$ (plus possible swaps handled separately) to reduce $A$ to upper triangular $U$:
$$E_k \cdots E_2 E_1 A = U.$$
Each $E_i$ is invertible, so:
$$A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U.$$
Define $L = E_1^{-1} E_2^{-1} \cdots E_k^{-1}$. Each $E_i^{-1}$ is a Type-1 elementary matrix with the sign of the off-diagonal entry flipped. Multiplying lower-triangular matrices (each with 1s on the diagonal and a single off-diagonal entry) produces a lower-triangular matrix. The multiplier $m_{ij}$ used to zero out position $(i,j)$ ends up sitting at position $(i,j)$ of $L$, with its sign restored. This is why $L$’s below-diagonal entries are exactly the multipliers recorded during elimination - no additional computation needed.
Why operations preserve solutions (formal). $A\mathbf{x} = \mathbf{b}$ and $EA\mathbf{x} = E\mathbf{b}$ have identical solution sets. If $\mathbf{x}$ satisfies $A\mathbf{x} = \mathbf{b}$, multiply both sides by $E$: $EA\mathbf{x} = E\mathbf{b}$. Conversely, if $\mathbf{x}$ satisfies $EA\mathbf{x} = E\mathbf{b}$, multiply both sides by $E^{-1}$: $A\mathbf{x} = E^{-1}E\mathbf{b} = \mathbf{b}$. Since $E$ is invertible, both directions hold.
Pivoting and $PA = LU$. When row swaps are needed, their swap matrices (Type-2 operations) accumulate into a single permutation matrix $P$. The factorization becomes $PA = LU$: first permute the rows of $A$ into the right order, then the remaining elimination runs without any swaps. Every invertible matrix has a $PA = LU$ factorization.
Uniqueness of RREF. Every matrix has exactly one RREF. No matter what sequence of row operations you apply, if you reduce to RREF you always get the same result. This makes RREF a canonical form: two matrices have the same RREF if and only if they have the same row space.
Summary
| Step | What happens |
|---|---|
| Write augmented $[A \mid \mathbf{b}]$ | Encode both $A$ and $\mathbf{b}$ in one matrix |
| Forward elimination | Create zeros below each pivot; record multipliers |
| Row swap if pivot is zero | Find the largest entry in the column below; swap |
| No possible pivot | Free variable (if RHS is 0) or no solution (if RHS is nonzero) |
| Back substitution | Solve triangular system from bottom to top |
| RREF (optional) | Eliminate above pivots too; read solution directly |
| $A = LU$ | Multipliers form $L$; triangular result is $U$ |
| Count pivots | Rank = dim(col space); $n$ - rank = nullity = dim(null space) |
The algorithm is mechanical: for each column left to right, find a pivot, eliminate below it, record the multiplier. Repeat. The final triangular system tells you everything: whether a solution exists, how many there are, and what they look like.
Read next: