

Here is a question that should bother you once you have spent some time with Gaussian elimination.

You have a matrix $A$. You want to know whether the system $Ax = b$ has a unique solution. The answer - we established this carefully - depends on whether the columns of $A$ are linearly independent. If every column has a pivot after row reduction, you are fine. If any column is free, you have a problem.

But row reduction is work. You have to go through the whole elimination process to find out. Is there some quicker check? Is there a single number you could compute from $A$ - in advance, without full elimination - that captures whether the matrix is invertible?

Better: could that single number tell you not just whether $A$ is invertible, but how invertible it is? How much does the transformation $T_A$ stretch or compress space? Is the transformation orientation-preserving, or does it produce a mirror image?

That number is the determinant. But to understand why it is the right number - why it is not arbitrary, why it encodes exactly these things - you need to start with geometry, not algebra.


Section 1: The 2x2 Case and What It Really Means

Take a 2x2 matrix:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

The two columns are the vectors $\mathbf{v_1} = (a, c)$ and $\mathbf{v_2} = (b, d)$. Draw them as arrows starting from the origin. They carve out a parallelogram - a four-sided figure with $\mathbf{v_1}$ and $\mathbf{v_2}$ as adjacent sides.

The determinant of $A$ is the signed area of that parallelogram:

$$\det(A) = ad - bc$$

Before you accept this as a formula to memorize, let us verify it makes sense.

The identity matrix. The columns are $e_1 = (1, 0)$ and $e_2 = (0, 1)$. They form the unit square. Area = 1. And $\det(I) = (1)(1) - (0)(0) = 1$. Correct.

One column is zero. If $\mathbf{v_1} = (0, 0)$, the parallelogram collapses to a line segment (or a point). Area = 0. And $\det\begin{pmatrix} 0 & b \\ 0 & d \end{pmatrix} = 0 \cdot d - b \cdot 0 = 0$. Correct. This matrix is singular: both rows have zero in the first entry, so the first column contains no independent information.

Proportional columns. If $\mathbf{v_2} = k\mathbf{v_1}$ for some scalar $k$, the two arrows lie along the same line (pointing the same way if $k > 0$, opposite ways if $k < 0$). The parallelogram is completely flat: zero area. Check: $A = \begin{pmatrix} a & ka \\ c & kc \end{pmatrix}$ gives $\det(A) = a(kc) - (ka)(c) = kac - kac = 0$. Correct. Proportional columns means linear dependence, which means singular, which means zero determinant.

A concrete example. Take $A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$.

The columns are $\mathbf{v_1} = (2, 0)$ and $\mathbf{v_2} = (1, 3)$. The column $(2, 0)$ lies along the positive $x$-axis with length 2. The column $(1, 3)$ tips to the right and up.

To find the area of the parallelogram geometrically: the base is $|\mathbf{v_1}| = 2$. The height is the perpendicular distance from $\mathbf{v_2}$ to the line through $\mathbf{v_1}$ - since $\mathbf{v_1}$ lies on the $x$-axis, the height is just the $y$-component of $\mathbf{v_2}$, which is 3. Area = base $\times$ height $= 2 \times 3 = 6$.

Formula: $\det(A) = (2)(3) - (1)(0) = 6$. They match.
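The sanity checks above can be replayed in a few lines of Python. This is only a minimal sketch: the helper name `det2` and the specific numbers used for the zero-column and proportional-column cases are illustrative choices, not part of the text.

```python
def det2(a, b, c, d):
    """Signed area of the parallelogram spanned by columns (a, c) and (b, d)."""
    return a * d - b * c

# Identity matrix: the unit square, area 1.
print(det2(1, 0, 0, 1))     # 1

# Zero first column: the parallelogram collapses.
print(det2(0, 5, 0, 7))     # 0

# Proportional columns: v1 = (2, 4), v2 = 3 * v1 = (6, 12).
print(det2(2, 6, 4, 12))    # 0

# The concrete example [[2, 1], [0, 3]]: base 2 times height 3.
print(det2(2, 1, 0, 3))     # 6
```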

What about the sign?

The formula gives a signed area, not just a magnitude. When is it positive, and when negative?

Think of it this way: if you walk from $\mathbf{v_1}$ to $\mathbf{v_2}$ and the turn is counterclockwise, the determinant is positive. If the turn is clockwise, it is negative.

More precisely: $\det > 0$ means the pair $(\mathbf{v_1}, \mathbf{v_2})$ has the same orientation as the standard basis $(e_1, e_2)$ - which we call right-handed. $\det < 0$ means the pair has the opposite orientation: the transformation flips the plane, like a mirror reflection.

Example: $\det\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = (0)(0) - (1)(1) = -1$. This matrix swaps the $x$ and $y$ axes - a reflection across the line $y = x$. The area is still 1, but the orientation is reversed.

Discomfort check. Why is area “signed”? In ordinary geometry, area is always positive. The sign here is extra information: it tells you whether the transformation preserves the handedness of the coordinate system or mirrors it. Think of it as the difference between a right-hand glove and a left-hand glove - both have the same area, but they are fundamentally different because one cannot be continuously deformed into the other without passing through zero.


Section 2: What Determinants Measure

The signed-area picture immediately tells us what the determinant is measuring in terms of the linear transformation $T_A: \mathbb{R}^n \to \mathbb{R}^n$.

The fundamental theorem. If $A$ is an $n \times n$ matrix, then the transformation $T_A$ multiplies the $n$-dimensional volume of every region by $|\det(A)|$.

Apply $T_A$ to the unit square (or unit cube, or unit hypercube). It maps to a parallelogram (or parallelepiped, or hyperparallelepiped). The volume of that image shape is exactly $|\det(A)|$.

This tells you everything you need to know about the “size” of the transformation:

$\det(A) = 0$: The transformation collapses space. A region that had positive volume gets mapped to something with zero volume - a plane, a line, a point. Some direction in the domain gets sent to the zero vector. Information is irreversibly lost. The matrix is singular and non-invertible.

$|\det(A)| = 1$: The transformation preserves volume. The shape might change - it can rotate, shear, reflect - but areas or volumes stay the same. Rotation matrices and shear matrices have $|\det| = 1$.

$|\det(A)| > 1$: The transformation expands space. Small regions grow larger.

$|\det(A)| < 1$: The transformation contracts space. Small regions shrink.

Sign of $\det(A)$: positive means orientation is preserved (right-hand stays right-hand); negative means orientation is reversed (right-hand becomes left-hand).

Let us verify these claims for rotations.

A rotation by angle $\theta$ has matrix:

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

$$\det(R_\theta) = (\cos\theta)(\cos\theta) - (-\sin\theta)(\sin\theta) = \cos^2\theta + \sin^2\theta = 1$$

Exactly 1, for every angle. Rotations preserve all areas and all orientations. No expansion, no contraction, no flipping.

What about a reflection across the $x$-axis? That has matrix $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, determinant $= (1)(-1) - 0 = -1$. Area is preserved ($|{-1}| = 1$) but orientation is flipped. A right-handed frame becomes left-handed.

A scaling by 2 in both directions: $\det\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 4$. Areas are multiplied by 4, which makes sense: doubling each linear dimension quadruples areas.
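One way to test the volume-scaling claim numerically is to push the corners of the unit square through a matrix and measure the area of the image directly. The sketch below does that with the shoelace formula; the helper names (`apply`, `shoelace_area`, `rotation`) and the extra shear example are assumptions made for illustration.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def apply(m, p):
    """Apply the 2x2 matrix m to the point p = (x, y)."""
    (a, b), (c, d) = m
    x, y = p
    return (a * x + b * y, c * x + d * y)

def shoelace_area(points):
    """Unsigned area of a polygon from its vertices listed in order."""
    total = 0.0
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

def rotation(theta):
    """Matrix of the rotation by angle theta (radians)."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

examples = {
    "rotation by 0.7 rad": rotation(0.7),
    "reflection across x-axis": [[1, 0], [0, -1]],
    "scaling by 2": [[2, 0], [0, 2]],
    "shear": [[1, 3], [0, 1]],
}

for name, m in examples.items():
    image = [apply(m, p) for p in unit_square]
    print(name, "| det =", round(det2(m), 12),
          "| image area =", round(shoelace_area(image), 12))
```

In every case the measured area of the image should agree with $|\det|$, while the sign of the determinant separates the reflection from the other three maps.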

Discomfort check. It might feel strange that a single number encodes something as rich as “how does this transformation scale volumes?” But think about it: the transformation is linear, so how it behaves on any parallelepiped is determined by how it behaves on the unit cube (just apply linearity). The volume of the image of the unit cube is $|\det(A)|$. That is the whole story.


Section 3: The 3x3 Case and Cofactor Expansion

In three dimensions, the determinant of a 3x3 matrix $A$ is the signed volume of the parallelepiped formed by the three column vectors. The sign still encodes orientation: positive if the columns form a right-handed frame, negative if left-handed.

Computing it requires a new technique: cofactor expansion.

Minors and Cofactors

For an $n \times n$ matrix $A$, the $(i,j)$ minor $M_{ij}$ is the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting row $i$ and column $j$.

The $(i,j)$ cofactor is:

$$C_{ij} = (-1)^{i+j} M_{ij}$$

The factor $(-1)^{i+j}$ gives a checkerboard pattern of signs:

$$\begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}$$

Cofactor Expansion Along a Row

The determinant of $A$ can be computed by expanding along any row. Along row $i$:

$$\det(A) = \sum_{j=1}^n a_{ij} C_{ij} = \sum_{j=1}^n (-1)^{i+j} a_{ij} M_{ij}$$

Along column $j$:

$$\det(A) = \sum_{i=1}^n a_{ij} C_{ij}$$

The result is the same regardless of which row or column you choose - a non-obvious fact that follows from the Leibniz formula (more on that in Section 8).

Tip: Expand along the row or column with the most zeros. Each zero eliminates a sub-determinant from the computation.

Worked Example

Let us compute $\det\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{pmatrix}$ by expanding along row 1.

$$\det(A) = 1 \cdot C_{11} + 2 \cdot C_{12} + 3 \cdot C_{13}$$

$$C_{11} = (+1)\det\begin{pmatrix} 5 & 6 \\ 8 & 10 \end{pmatrix} = (5)(10) - (6)(8) = 50 - 48 = 2$$

$$C_{12} = (-1)\det\begin{pmatrix} 4 & 6 \\ 7 & 10 \end{pmatrix} = -(40 - 42) = -(-2) = 2$$

$$C_{13} = (+1)\det\begin{pmatrix} 4 & 5 \\ 7 & 8 \end{pmatrix} = (32 - 35) = -3$$

$$\det(A) = 1(2) + 2(2) + 3(-3) = 2 + 4 - 9 = -3$$

The determinant is $-3$: the transformation expands volumes by a factor of 3 and reverses orientation.
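The expansion above translates almost line for line into a recursive function. The following is a sketch under the assumption that matrices are plain lists of rows; the names `minor` and `det_cofactor` are hypothetical, and indices are 0-based, so the sign $(-1)^{i+j}$ for the first row reduces to $(-1)^j$.

```python
def minor(matrix, i, j):
    """Copy of matrix with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(matrix) if k != i]

def det_cofactor(matrix):
    """Determinant by cofactor expansion along the first row."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for j in range(n):
        sign = -1 if j % 2 else 1   # checkerboard sign for row 0
        total += sign * matrix[0][j] * det_cofactor(minor(matrix, 0, j))
    return total

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]

# The three first-row cofactors C_11, C_12, C_13 from the worked example.
cofactors = [(-1) ** j * det_cofactor(minor(A, 0, j)) for j in range(3)]
print(cofactors)          # [2, 2, -3]
print(det_cofactor(A))    # -3
```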

Discomfort check. Why does cofactor expansion work? The deep reason is that the determinant is alternating and multilinear (properties we will formalize in Section 8). Expanding along a row is a consequence of multilinearity: write the chosen row as a sum of its entries $a_{ij}$ times standard basis rows, split the determinant into one term per entry, and the alternating property handles the signs. The fact that any row or column gives the same answer follows from $\det(A^T) = \det(A)$ (proved below).

Sarrus' Rule for 3x3 (Mnemonic Only)

A fast mnemonic for 3x3: write out the matrix, append the first two columns again to the right, then sum the three “falling” diagonals and subtract the three “rising” diagonals.

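$$\begin{array}{ccc|cc} a & b & c & a & b \\ d & e & f & d & e \\ g & h & i & g & h \end{array}$$

$$\det = \underbrace{aei + bfg + cdh}_{\text{falling diagonals}} - \underbrace{(ceg + afh + bdi)}_{\text{rising diagonals}}$$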

This is only valid for 3x3 matrices. Never apply it to 4x4 or larger.
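As a quick check of the mnemonic, here is a sketch that applies it to the matrix from the worked example. The function name `sarrus` is an illustrative choice.

```python
def sarrus(m):
    """3x3 determinant via Sarrus' rule; valid for 3x3 matrices only."""
    (a, b, c), (d, e, f), (g, h, i) = m
    falling = a * e * i + b * f * g + c * d * h
    rising = c * e * g + a * f * h + b * d * i
    return falling - rising

print(sarrus([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3
```

The tuple unpacking fails on anything that is not 3x3, which is a convenient guard against misapplying the rule.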


Section 4: Key Properties

These six properties are the working toolkit of determinant computation. Learn them well - together they let you compute determinants efficiently and prove things cleanly.

Property 1: Multiplicativity

$$\det(AB) = \det(A)\det(B)$$

Geometric intuition. If $T_A$ scales volumes by $|\det(A)|$ and $T_B$ scales volumes by $|\det(B)|$, then the composition $T_A \circ T_B$ scales volumes by both factors in sequence, giving $|\det(A)| \cdot |\det(B)| = |\det(AB)|$.

Proof sketch. Fix $A$ and define $f(B) = \det(AB)$. Check that $f$ is alternating and multilinear in the columns of $B$ and satisfies $f(I) = \det(A)$. Then $f(B) = \det(A) \cdot \det(B)$ by the uniqueness characterization of the determinant (Section 8).

Property 2: Transpose

$$\det(A^T) = \det(A)$$

This is genuinely surprising: you might expect rows and columns to behave differently, since the geometric picture (parallelogram from columns) does not obviously treat them symmetrically. But the determinant does not care.

Proof. In the Leibniz formula $\det(A) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_i a_{i,\sigma(i)}$, substitute $\tau = \sigma^{-1}$. Since $\text{sgn}(\sigma^{-1}) = \text{sgn}(\sigma)$ and the product $\prod_i a_{\tau(i), i}$ equals $\prod_i (A^T)_{i, \tau(i)}$, we get $\det(A^T) = \det(A)$.

Practical consequence. Every property stated for columns holds equally for rows. Expanding along a row is the same as expanding along a column of the transpose.

Property 3: Inverse

If $A$ is invertible:

$$\det(A^{-1}) = \frac{1}{\det(A)}$$

Proof. $A A^{-1} = I$, so $\det(A)\det(A^{-1}) = \det(I) = 1$.

This makes sense geometrically: if $T_A$ expands volume by factor $k$, then $T_{A^{-1}}$ must contract by factor $1/k$ to undo it.
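Properties 1 through 3 are easy to spot-check on small matrices. The sketch below uses exact rational arithmetic so that the inverse introduces no rounding; the matrix $B$ and all helper names are assumptions chosen for the example.

```python
from fractions import Fraction

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def inverse2(m):
    """Inverse of an invertible 2x2 matrix, with exact fractions."""
    (a, b), (c, d) = m
    det = Fraction(det2(m))
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [0, 3]]   # det = 6
B = [[1, 4], [2, 5]]   # det = -3

print(det2(matmul2(A, B)), det2(A) * det2(B))    # -18 -18
print(det2(transpose2(A)), det2(A))              # 6 6
print(det2(inverse2(A)), Fraction(1, det2(A)))   # 1/6 1/6
```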

Property 4: Row Operations

This is the most practically useful property for computation.

(a) Swap two rows: $\det$ changes sign.

(b) Scale a row by $c$: $\det$ is multiplied by $c$.

(c) Add $c$ times one row to another: $\det$ is unchanged.

Why (c) is true. This follows from multilinearity: adding a multiple of row $i$ to row $j$ changes row $j$ from $r_j$ to $r_j + c \cdot r_i$. By linearity in that row, $\det(\ldots, r_j + c r_i, \ldots) = \det(\ldots, r_j, \ldots) + c \cdot \det(\ldots, r_i, \ldots)$. The second term has row $i$ appearing twice (once in its own position, once in position $j$), so the alternating property forces it to zero.
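The three row-operation rules can be observed directly on a 3x3 example. This is a sketch only; `det3` is a hypothetical helper that expands along the first row, and the scaling factor 5 and the multiple 7 are arbitrary.

```python
def det3(m):
    """3x3 determinant by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]

# (a) Swap rows 1 and 2.
swapped = [A[1], A[0], A[2]]
# (b) Scale row 2 by 5.
scaled = [A[0], [5 * x for x in A[1]], A[2]]
# (c) Replace row 3 by row 3 minus 7 times row 1.
replaced = [A[0], A[1], [z - 7 * x for x, z in zip(A[0], A[2])]]

print(det3(A))          # -3
print(det3(swapped))    # 3   (sign flipped)
print(det3(scaled))     # -15 (multiplied by 5)
print(det3(replaced))   # -3  (unchanged)
```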

Property 5: Identity

$$\det(I) = 1$$

The identity transformation preserves every volume exactly.

Property 6: Triangular Matrices

If $A$ is upper triangular (all entries below the diagonal are zero):

$$\det(A) = a_{11} a_{22} \cdots a_{nn}$$

The determinant is the product of the diagonal entries.

Proof. Apply cofactor expansion along column 1. Every entry of column 1 below row 1 lies in the lower triangle, so it is zero for an upper triangular matrix, and only the $a_{11}$ term survives. The expansion reduces to $a_{11}$ times the determinant of the $(n-1) \times (n-1)$ upper triangular submatrix. Repeat inductively.

This is why Gaussian elimination is so useful: row reduction produces a triangular matrix, and reading off the diagonal gives the determinant.


Section 5: Computing Determinants via Row Reduction

Cofactor expansion is elegant and recursive but has a fatal flaw: it is $O(n!)$. For $n = 20$, the Leibniz formula has $20! \approx 2.4 \times 10^{18}$ terms. This is completely impractical.

Row reduction is $O(n^3)$ - the same cost as solving $Ax = b$. Here is the procedure:

  1. Apply Gaussian elimination to reach upper triangular form $U$.
  2. Track all row swaps (each one multiplies the determinant by $-1$).
  3. Track all row scalings (each one multiplies the determinant by the scaling factor; but usually we avoid scaling rows and just use row replacement).
  4. The determinant of $U$ is the product of its diagonal entries (the pivots).
  5. Correct for sign: $\det(A) = (-1)^s \cdot \prod_i u_{ii}$, where $s$ is the number of row swaps.

Since row replacement operations (Property 4c) do not change the determinant, they are “free” - we can do as many as we like.

Worked Example (Same Matrix)

Compute $\det\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{pmatrix}$ by row reduction.

Step 1: Eliminate the 4 in position (2,1). Subtract 4 times row 1 from row 2:

$$R_2 \leftarrow R_2 - 4 R_1: \quad \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 7 & 8 & 10 \end{pmatrix}$$

Determinant unchanged.

Step 2: Eliminate the 7 in position (3,1). Subtract 7 times row 1 from row 3:

$$R_3 \leftarrow R_3 - 7 R_1: \quad \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -11 \end{pmatrix}$$

Determinant unchanged.

Step 3: Eliminate the $-6$ in position (3,2). Subtract 2 times row 2 from row 3:

$$R_3 \leftarrow R_3 - 2 R_2: \quad \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 1 \end{pmatrix}$$

Determinant unchanged.

Step 4: We have upper triangular form. No row swaps were performed ($s = 0$).

$$\det(A) = (-1)^0 \cdot (1)(-3)(1) = -3$$

This matches the cofactor expansion result. For a 3x3 matrix the two methods cost about the same (three row operations versus three 2x2 determinants), but the gap widens rapidly as $n$ grows.
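The five-step procedure can be sketched as a short function. This version assumes exact rational arithmetic (`fractions.Fraction`) to sidestep rounding questions, uses only row swaps and row replacements, and picks the first nonzero entry as the pivot; a floating-point implementation would normally pick the largest entry instead. The function name is hypothetical.

```python
from fractions import Fraction

def det_by_elimination(matrix):
    """Determinant via Gaussian elimination, tracking row swaps."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    swaps = 0
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot_row is None:
            return Fraction(0)      # no pivot available: the matrix is singular
        if pivot_row != col:
            a[col], a[pivot_row] = a[pivot_row], a[col]
            swaps += 1              # each swap flips the sign of the determinant
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # Row replacement: leaves the determinant unchanged.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    product = Fraction(1)
    for i in range(n):
        product *= a[i][i]          # product of the pivots
    return (-1) ** swaps * product

print(det_by_elimination([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3
print(det_by_elimination([[0, 1], [1, 0]]))                     # -1 (one swap)
```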

Discomfort check. You might worry: what if a row operation inadvertently scaled a row? In the standard Gaussian elimination algorithm (the kind you use to solve $Ax = b$), you only use row replacement (add a multiple of one row to another), never scale or swap unless forced. Swaps happen only to move a nonzero pivot into position, and you count them. The only thing that changes the determinant in this process is those swaps.


Section 6: The Characteristic Polynomial and Eigenvalues

Here is a connection that will become central in the next post.

Suppose you want to find scalars $\lambda$ and nonzero vectors $v$ satisfying $Av = \lambda v$. Rearranging: $(A - \lambda I)v = 0$. This has a nonzero solution $v$ if and only if $A - \lambda I$ is singular - that is, if and only if:

$$\det(A - \lambda I) = 0$$

The expression $p(\lambda) = \det(A - \lambda I)$ is a polynomial of degree $n$ in $\lambda$, called the characteristic polynomial of $A$. Its roots are exactly the eigenvalues.

For example, with $A = \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}$:

$$\det(A - \lambda I) = \det\begin{pmatrix} 3 - \lambda & 1 \\ 0 & 2 - \lambda \end{pmatrix} = (3-\lambda)(2-\lambda) - 0 = \lambda^2 - 5\lambda + 6$$

Setting this to zero: $(\lambda - 2)(\lambda - 3) = 0$, giving eigenvalues $\lambda = 2$ and $\lambda = 3$.

Notice that $\det(A) = 6 = 2 \times 3$ is the product of the eigenvalues, and $\text{tr}(A) = 5 = 2 + 3$ is their sum. These are not coincidences - expanding the characteristic polynomial reveals them as general facts (for details, see the Eigenvalues post).
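For a 2x2 matrix, the trace and determinant are enough to recover both eigenvalues, because the characteristic polynomial takes the form $\lambda^2 - \text{tr}(A)\lambda + \det(A)$. The sketch below solves that quadratic; it uses `cmath.sqrt` so that complex eigenvalues are handled too, and the function name is an assumption.

```python
import cmath

def eigenvalues_2x2(m):
    """Roots of lambda^2 - tr(A) lambda + det(A) for a 2x2 matrix A."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace - disc) / 2, (trace + disc) / 2

lam1, lam2 = eigenvalues_2x2([[3, 1], [0, 2]])
print(lam1, lam2)                # (2+0j) (3+0j)
print(lam1 * lam2, lam1 + lam2)  # product = det, sum = trace
```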

The determinant does not just help you find eigenvalues; it provides the conceptual link between the matrix and its spectral properties. A matrix is singular if and only if $\lambda = 0$ is an eigenvalue - meaning the transformation crushes some direction to zero.


Section 7: Applications

Cramer’s Rule

For an invertible system $Ax = b$, the $j$-th component of the solution is:

$$x_j = \frac{\det(A_j)}{\det(A)}$$

where $A_j$ is the matrix $A$ with column $j$ replaced by $b$.

Why it is true. This follows from the cofactor matrix formula for the inverse: $A^{-1} = \frac{1}{\det(A)} \text{adj}(A)$, where $\text{adj}(A)$ is the matrix of cofactors transposed. Then $x = A^{-1}b$ gives each component as a ratio of determinants.

Why no one uses it computationally. Computing each $\det(A_j)$ via LU decomposition costs $O(n^3)$, and you need $n$ such computations, giving $O(n^4)$ total. Gaussian elimination solves $Ax = b$ in $O(n^3)$. Cramer’s rule is slower by a factor of about $n$. It has theoretical value - it gives a closed-form expression for the solution - but it is not used in practice for $n > 3$ or 4.
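For a small system the rule is still easy to write down. Here is a sketch for the 3x3 case; the right-hand side $b = (6, 15, 25)$ is an assumption chosen so that the solution is $(1, 1, 1)$, and `det3` and `cramer_3x3` are hypothetical helper names.

```python
from fractions import Fraction

def det3(m):
    """3x3 determinant by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer_3x3(a, b):
    """Solve a x = b for an invertible 3x3 integer matrix a by Cramer's rule."""
    d = det3(a)
    if d == 0:
        raise ValueError("matrix is singular; Cramer's rule does not apply")
    solution = []
    for j in range(3):
        # A_j: copy of A with column j replaced by b.
        a_j = [[b[i] if k == j else a[i][k] for k in range(3)] for i in range(3)]
        solution.append(Fraction(det3(a_j), d))
    return solution

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
b = [6, 15, 25]                  # equals A times (1, 1, 1)
x = cramer_3x3(A, b)
print([str(v) for v in x])       # ['1', '1', '1']
```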

The Jacobian Determinant in Integration

When you change variables in a multiple integral, you need to account for how the change of variables stretches or compresses volume. If $\mathbf{x} = g(\mathbf{u})$ is a smooth, invertible change of variables, then:

$$\int f(\mathbf{x}) d\mathbf{x} = \int f(g(\mathbf{u})) |\det J_g(\mathbf{u})| d\mathbf{u}$$

where $J_g$ is the Jacobian matrix with entries $(J_g)_{ij} = \partial g_i / \partial u_j$.

The Jacobian determinant is the local volume scaling factor of the (generally nonlinear) map $g$. At each point $\mathbf{u}$, the linear approximation to $g$ acts like a linear transformation with matrix $J_g(\mathbf{u})$, and the determinant tells you how much infinitesimal volume elements are scaled.

Example: polar coordinates. The map $g(r, \theta) = (r\cos\theta, r\sin\theta)$ has Jacobian:

$$J_g = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad \det(J_g) = r\cos^2\theta + r\sin^2\theta = r$$

This gives the familiar substitution formula $dx dy = r dr d\theta$.
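The result $\det(J_g) = r$ can be checked numerically without doing any calculus by hand. The sketch below estimates the four partial derivatives with central differences; the step size `h` and the sample points are arbitrary choices, and `jacobian_det` is a hypothetical helper.

```python
import math

def g(r, theta):
    """Polar-to-Cartesian map."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_det(f, u, v, h=1e-6):
    """Central-difference estimate of det J_f at (u, v) for f: R^2 -> R^2."""
    fu_plus, fu_minus = f(u + h, v), f(u - h, v)
    fv_plus, fv_minus = f(u, v + h), f(u, v - h)
    dx_du = (fu_plus[0] - fu_minus[0]) / (2 * h)
    dy_du = (fu_plus[1] - fu_minus[1]) / (2 * h)
    dx_dv = (fv_plus[0] - fv_minus[0]) / (2 * h)
    dy_dv = (fv_plus[1] - fv_minus[1]) / (2 * h)
    return dx_du * dy_dv - dx_dv * dy_du

for r, theta in [(0.5, 0.3), (2.0, 1.2), (3.0, -2.0)]:
    # The estimate should be close to r, independent of theta.
    print(r, jacobian_det(g, r, theta))
```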

Connection to ML: normalizing flows. In generative modeling, a normalizing flow learns an invertible transformation $g$ from a simple base distribution to a complex data distribution. To evaluate log-likelihoods, you need the log-determinant of the Jacobian. Architectures like RealNVP and Glow are designed so that this Jacobian determinant is cheap to compute (often $O(n)$ via triangular structure), making training tractable.

In probability: if $X$ has density $p_X$ and $Y = g(X)$, the density of $Y$ is:

$$p_Y(y) = p_X(g^{-1}(y)) |\det J_{g^{-1}}(y)|$$

This density transformation formula is the continuous analogue of counting preimages, weighted by how much $g$ stretches each region.
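A short worked case may help make the formula concrete. Assume (purely for illustration) an affine map $g(x) = Mx + c$ with $M$ an invertible $n \times n$ matrix and $c$ a fixed vector; neither symbol appears elsewhere in this post. Then $g^{-1}(y) = M^{-1}(y - c)$, the Jacobian of $g^{-1}$ is the constant matrix $M^{-1}$, and the formula gives:

$$p_Y(y) = p_X\big(M^{-1}(y - c)\big) \, |\det M^{-1}| = \frac{p_X\big(M^{-1}(y - c)\big)}{|\det M|}$$

If $M$ expands volumes ($|\det M| > 1$), the same probability mass is spread over a larger region, so the density drops by exactly that factor.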

Orientation in Computer Graphics

In 3D graphics, a mesh is made of triangles. Each triangle has an orientation determined by the order of its three vertices: counterclockwise (front-facing) or clockwise (back-facing). The sign of the determinant of a 2x2 or 3x3 matrix built from the vertex positions determines this orientation.

During rendering, the graphics pipeline performs backface culling: it discards triangles that are facing away from the camera. The test is simple: compute a determinant (or equivalently, the sign of a cross product). Under the convention above, where counterclockwise triangles are front-facing, a negative value means the triangle is back-facing and can be discarded, which for a closed mesh removes roughly half the triangles from further processing.

The winding order test is also used in 2D computational geometry: given a polygon with vertices in order, the sign of the sum of cross products determines whether the polygon is specified clockwise or counterclockwise.
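Both tests fit in a few lines. The sketch below works on 2D points (for example, triangle vertices after projection to the screen); `orient` and `signed_polygon_area` are hypothetical names, and counterclockwise is taken to be the positive direction.

```python
def orient(p, q, r):
    """Determinant of the 2x2 matrix with columns q - p and r - p.

    Positive if p, q, r turn counterclockwise, negative if clockwise,
    zero if the three points are collinear.
    """
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def signed_polygon_area(vertices):
    """Signed area via the sum of cross products of consecutive vertices."""
    total = 0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2

print(orient((0, 0), (1, 0), (0, 1)))          # 1  (counterclockwise)
print(orient((0, 0), (0, 1), (1, 0)))          # -1 (clockwise)

square_ccw = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(signed_polygon_area(square_ccw))         # 4.0
print(signed_polygon_area(square_ccw[::-1]))   # -4.0
```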

Stability of Dynamical Systems

Consider a system of differential equations $\dot{x} = f(x)$ with an equilibrium at $x^{\ast}$ (meaning $f(x^{\ast}) = 0$). To understand whether $x^{\ast}$ is stable, you linearize $f$ around $x^{\ast}$ and analyze the Jacobian matrix $J = Df(x^{\ast})$.

The eigenvalues of $J$ determine stability. For a 2x2 system, the determinant enters in three ways:

  • $\det(J) = 0$ means $x^{\ast}$ is a non-isolated or degenerate equilibrium.
  • $\det(J) > 0$ and $\text{tr}(J) < 0$: both eigenvalues have negative real part (stable node or spiral).
  • $\det(J) < 0$: eigenvalues have opposite signs (saddle point, unstable).

For a 2x2 system, the characteristic polynomial is $\lambda^2 - \text{tr}(J)\lambda + \det(J) = 0$, and the sign of $\det(J)$ immediately tells you whether the eigenvalues are same-sign or opposite-sign.
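The trace-determinant test for 2x2 systems can be sketched as a small classifier. The first three branches follow the cases listed above; the last two (positive trace, and zero trace with positive determinant) are standard additions not spelled out in the text, and the function name and labels are assumptions.

```python
def classify_equilibrium(jacobian):
    """Classify a 2x2 linearization by the trace and determinant of its Jacobian."""
    (a, b), (c, d) = jacobian
    trace = a + d
    det = a * d - b * c
    if det < 0:
        return "saddle (unstable)"            # real eigenvalues of opposite sign
    if det == 0:
        return "degenerate (zero eigenvalue)"
    if trace < 0:
        return "stable node or spiral"        # both real parts negative
    if trace > 0:
        return "unstable node or spiral"      # both real parts positive
    return "center in the linearization"      # purely imaginary eigenvalues

print(classify_equilibrium([[-1, 0], [0, -2]]))   # stable node or spiral
print(classify_equilibrium([[1, 0], [0, -1]]))    # saddle (unstable)
print(classify_equilibrium([[0, 1], [-1, 0]]))    # center in the linearization
print(classify_equilibrium([[1, 2], [2, 4]]))     # degenerate (zero eigenvalue)
```

In the center case the linearization alone does not settle the stability of the nonlinear system, so that label should be read as a statement about $J$ only.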

Log-Determinant in Machine Learning

The multivariate Gaussian density for $x \in \mathbb{R}^n$ is:

$$p(x) = \frac{1}{(2\pi)^{n/2} |\det \Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x - \mu)\right)$$

The log-likelihood involves $\log \det \Sigma$. For $n = 1000$, computing this by the Leibniz formula would take $1000!$ operations. In practice, you compute the Cholesky factorization $\Sigma = LL^T$ (where $L$ is lower triangular with positive diagonal), giving:

$$\log \det \Sigma = \log \det(LL^T) = \log\big((\det L)^2\big) = 2 \log \det L = 2 \sum_{i=1}^n \log L_{ii}$$

One sum of $n$ logarithms. Working in log space also matters for numerical stability: for large $n$, $\det \Sigma$ itself can underflow to 0 or overflow to $\infty$ in floating point even when $\log \det \Sigma$ is a perfectly ordinary number. This is why numpy.linalg.slogdet returns the sign and the log of the absolute value separately.
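To make the Cholesky route concrete, here is a dependency-free sketch. It assumes the input is symmetric positive definite and does no checking; in real code you would call a library Cholesky routine rather than this hand-written loop. The names `cholesky` and `logdet_spd` and the test matrices are illustrative.

```python
import math

def cholesky(sigma):
    """Lower-triangular L with sigma = L L^T (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def logdet_spd(sigma):
    """log det(sigma) as twice the sum of the logs of the Cholesky diagonal."""
    L = cholesky(sigma)
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))

sigma = [[4.0, 2.0], [2.0, 3.0]]            # det = 8
print(logdet_spd(sigma), math.log(8.0))     # the two values should agree

# A 150 x 150 diagonal matrix with entries 1e-3: the determinant is 1e-450,
# far below the smallest positive double, so the plain product underflows.
n = 150
tiny = [[1e-3 if i == j else 0.0 for j in range(n)] for i in range(n)]
naive_det = 1.0
for i in range(n):
    naive_det *= tiny[i][i]
print(naive_det)                            # 0.0
print(logdet_spd(tiny))                     # close to 150 * log(1e-3)
```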


Section 8: The Rigorous Underpinning

Everything above rests on two formulations of the determinant. Here they are properly.

The Leibniz Formula

$$\det(A) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=1}^n a_{i,\sigma(i)}$$

The sum is over all $n!$ permutations $\sigma$ of $\{1, \ldots, n\}$. The sign $\text{sgn}(\sigma)$ is $+1$ if $\sigma$ is an even permutation (expressible as an even number of transpositions) and $-1$ if odd.

For $n = 2$: Two permutations: the identity $\sigma = (1,2)$ with $\text{sgn} = +1$, and the transposition $\sigma = (2,1)$ with $\text{sgn} = -1$. This gives $a_{11}a_{22} - a_{12}a_{21} = ad - bc$.

For $n = 3$: Six permutations: three even (identity, $(123)$, $(132)$ as cycles) and three odd (each pairwise transposition). These give the six terms in Sarrus' rule.

The Leibniz formula grows as $O(n!)$, making it purely theoretical for $n \geq 20$. It is the definition from which everything else is derived - but you would never use it to actually compute a determinant.
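For small $n$ the formula can be run literally, which is a useful way to see the permutations and signs at work. The sketch below computes each sign from the inversion count of the permutation (an even count gives $+1$); the function names are hypothetical, and the cost is $n!$ terms, so this is for illustration only.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation in one-line notation, via its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det_leibniz(matrix):
    """Determinant as a signed sum over all permutations of the column indices."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= matrix[i][perm[i]]
        total += term
    return total

print(det_leibniz([[2, 1], [0, 3]]))                      # 6
print(det_leibniz([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))    # -3

# Of the six permutations of three elements, three are even.
print(sum(1 for p in permutations(range(3)) if sign(p) == 1))   # 3
```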

The Axiomatic Characterization

The cleanest approach - used in most modern algebra texts - starts not with a formula but with properties.

Theorem. There exists a unique function $d: \mathbb{R}^{n \times n} \to \mathbb{R}$ satisfying:

(1) Multilinearity in rows (or columns). For each fixed row $i$, $d$ is linear in that row:

$$d(\ldots, \alpha r + s, \ldots) = \alpha d(\ldots, r, \ldots) + d(\ldots, s, \ldots)$$

(2) Alternating. Swapping any two rows changes the sign:

$$d(\ldots, r_i, \ldots, r_j, \ldots) = -d(\ldots, r_j, \ldots, r_i, \ldots)$$

(3) Normalization. $d(I) = 1$.

That function is the determinant.

Why these three properties determine it completely. Write each row as a linear combination of standard basis vectors $e_1, \ldots, e_n$. By multilinearity, expand $d$ into $n^n$ terms - each involving $d$ of a matrix whose rows are basis vectors. By the alternating property, any term with a repeated basis row gives zero. The only surviving terms are those where every row is a different basis vector - that is, those indexed by permutations. The alternating property then fixes the sign of each such term as $\text{sgn}(\sigma)$. The normalization $d(I) = 1$ fixes the overall scale. This is exactly the Leibniz formula.

Why the axiomatic approach is powerful. To prove $\det(AB) = \det(A)\det(B)$, you do not need to manipulate the Leibniz formula directly. Instead, fix $A$ and define $f(B) = \det(AB)$. Check that $f$ is alternating and multilinear in the columns of $B$ (it is, since column $j$ of $AB$ is $A$ times column $j$ of $B$: each column of $AB$ depends linearly on the corresponding column of $B$, and swapping two columns of $B$ swaps the same two columns of $AB$). Check $f(I) = \det(A)$. By uniqueness, any alternating multilinear function is its value at $I$ times the determinant, so $f(B) = \det(A) \cdot \det(B)$.

This is far cleaner than expanding $\det(AB)$ from the Leibniz formula, which would require reindexing a double sum over permutations.

Discomfort check. The three axioms might feel like we are defining something abstract and hoping it is useful. But we are not - we are capturing what the signed-area/signed-volume function must be. Condition (1) says the determinant responds linearly to scaling each side of the parallelogram/parallelepiped. Condition (2) says area/volume is antisymmetric (swapping two sides reverses orientation). Condition (3) normalizes by saying the unit cube has volume 1. These are not arbitrary choices - they are forced by the geometry.

Why Alternating Implies Singular Columns Collapse

If two rows are identical - say rows $i$ and $j$ with $r_i = r_j$ - then swapping them does not change the matrix, but by the alternating property, it changes the sign of $d$. So $d = -d$, giving $d = 0$.

More generally: if one row is a linear combination of the others (i.e., the rows are linearly dependent), then by multilinearity you can write $d$ as a sum of terms each having a repeated row, hence each equal to zero.

This is the direct proof that $\det(A) = 0$ whenever $A$ is singular (has linearly dependent rows).
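To see the argument in the smallest interesting case, take $n = 3$ and suppose the third row is a combination of the first two, $r_3 = \alpha r_1 + \beta r_2$ (the scalars $\alpha, \beta$ are introduced here only for illustration). Linearity in the third row splits $d$ into two terms, and each term has a repeated row:

$$d(r_1, r_2, \alpha r_1 + \beta r_2) = \alpha \, d(r_1, r_2, r_1) + \beta \, d(r_1, r_2, r_2) = \alpha \cdot 0 + \beta \cdot 0 = 0$$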


Summary

| Concept | Geometric meaning | Algebraic condition |
| --- | --- | --- |
| $\det(A) \neq 0$ | Volume $> 0$; $A$ invertible | Columns linearly independent |
| $\det(A) = 0$ | Space collapses; singular | Columns linearly dependent |
| $\lvert\det(A)\rvert$ | Volume scaling factor of $T_A$ | $\geq 0$, with $0$ exactly when $A$ is singular |
| $\text{sign}(\det(A))$ | Orientation preserved/flipped | $+1$ or $-1$ |
| $\det(AB)$ | Volume scales multiplicatively | $= \det(A)\det(B)$ |
| $\det(A^T)$ | Rows behave like columns | $= \det(A)$ |
| $\det(A^{-1})$ | Inverse scaling | $= 1/\det(A)$ |
| Triangular $A$ | Axes-aligned scaling | $= \prod_i a_{ii}$ |

The single most important fact about determinants: a matrix is invertible if and only if its determinant is nonzero. Everything else - the geometric volume interpretation, the characteristic polynomial, Cramer’s rule, the Jacobian - flows from and illuminates this central connection between algebra and geometry.



A Note on History

The determinant was not invented in one place. It emerged independently, over nearly two centuries, as mathematicians kept bumping into the same quantity.

Leibniz (1693) was the first to write determinant-like expressions when studying systems of linear equations. Working out conditions for three equations in two unknowns to have a common solution, he wrote down expressions that are recognizably the $3 \times 3$ determinant, though he had no name for them and saw them only as algebraic conditions.

Cramer (1750) published his rule for solving $n$ equations in $n$ unknowns - the formula expressing each variable as a ratio of determinants. This is the first systematic appearance of determinants as a computational tool. Cramer’s rule is today known mainly as a theoretical result (it is too expensive to compute in practice), but in 1750 it was a significant advance.

Vandermonde (1771) was the first to treat the determinant as an independent function in its own right, not just as a side product of equation-solving. The Vandermonde determinant - a specific determinant built from powers of variables that arises in polynomial interpolation - bears his name.

Laplace (1772) proved the expansion theorem: a determinant can be computed by expanding along any row or column. This is the cofactor expansion you use today. He also clarified the alternating sign pattern.

Cauchy (1815) gave the modern treatment. He established the multiplicativity rule $\det(AB) = \det(A)\det(B)$, proved that the determinant is the unique alternating multilinear function of the columns normalized by $\det(I) = 1$, and wrote the Leibniz formula $\det(A) = \sum_\sigma \text{sgn}(\sigma) \prod_i a_{i,\sigma(i)}$ over all permutations.

Jacobi (1841) established the connection to eigenvalues and to the characteristic polynomial, and formalized the relationship between the determinant and the transformation’s volume-scaling behavior.

The geometric interpretation - determinant as signed volume - was clarified gradually through the 19th century as the modern concept of a linear transformation solidified. Mathematicians were computing determinants correctly a century before they could articulate precisely what a “vector space” or “linear map” was.
