Determinants
Prerequisite:
Axiomatic Definition
Rather than defining the determinant by a formula, the cleanest approach (following Axler and most modern algebra texts) is axiomatic. The determinant $\det : M_{n \times n}(\mathbb{F}) \to \mathbb{F}$ is the unique function of the columns $a_1, \ldots, a_n \in \mathbb{F}^n$ satisfying:
- Multilinearity: $\det$ is linear in each column separately, holding the others fixed: $$\det(\ldots, \alpha a + b, \ldots) = \alpha\det(\ldots, a, \ldots) + \det(\ldots, b, \ldots)$$
- Alternating: Swapping any two columns changes the sign: $$\det(\ldots, a_i, \ldots, a_j, \ldots) = -\det(\ldots, a_j, \ldots, a_i, \ldots)$$
- Normalisation: $\det(I) = 1$.
These three axioms uniquely determine $\det$. The alternating property implies that $\det = 0$ whenever two columns are equal (swap them: the sign changes, but the matrix is the same, so $\det = -\det$, giving $\det = 0$). More generally, the alternating property forces $\det = 0$ whenever the columns are linearly dependent.
The Permutation (Leibniz) Formula
From the axioms, one can derive an explicit formula. Expanding via multilinearity in each column using the standard basis $\{e_1, \ldots, e_n\}$: $$\det(A) = \det\left(\sum_{i_1} a_{i_1 1} e_{i_1}, \ldots, \sum_{i_n} a_{i_n n} e_{i_n}\right) = \sum_{i_1, \ldots, i_n} a_{i_1 1} \cdots a_{i_n n} \det(e_{i_1}, \ldots, e_{i_n})$$
The term $\det(e_{i_1}, \ldots, e_{i_n})$ is nonzero only when $(i_1, \ldots, i_n)$ is a permutation $\sigma \in S_n$, in which case it equals $\text{sgn}(\sigma)$ (the sign of the permutation, $\pm 1$). This gives the Leibniz formula:
$$\det(A) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{j=1}^n a_{\sigma(j), j} = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=1}^n a_{i, \sigma(i)}$$
(Here $i_j = \sigma(j)$, so the column-indexed product is the one the expansion produces; the row-indexed form is equal by re-indexing with $\sigma^{-1}$ and using $\text{sgn}(\sigma) = \text{sgn}(\sigma^{-1})$.)
For $n = 2$: $\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$ (two permutations: identity with $+$, transposition with $-$).
For $n = 3$: six permutations with their signs give the Rule of Sarrus:

Visual: 3x3 determinant (Sarrus' rule)

| a b c |
| d e f | = aei + bfg + cdh - ceg - bdi - afh
| g h i |

Copy the first two columns to the right of the matrix; the three down-right
diagonals give the + terms and the three up-right diagonals give the - terms:

a b c | a b      down-right (+): aei, bfg, cdh
d e f | d e      up-right   (-): ceg, bdi, afh
g h i | g h
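The Leibniz formula translates directly into code; a minimal sketch (the function names `sgn` and `det_leibniz` are illustrative, not from the text), useful only for small $n$:

```python
import itertools
import numpy as np

def sgn(perm):
    # sign of a permutation = (-1)^(number of inversions)
    n = len(perm)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det_leibniz(A):
    # sum over all n! permutations: sgn(sigma) * prod_j A[sigma(j), j]
    n = A.shape[0]
    return sum(sgn(p) * np.prod([A[p[j], j] for j in range(n)])
               for p in itertools.permutations(range(n)))

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 10]])
print(det_leibniz(A))  # matches np.linalg.det(A) up to rounding
```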
The Leibniz formula has $n!$ terms, making direct evaluation $O(n! \cdot n)$ - completely impractical for large $n$ (even $n = 20$ gives $> 10^{18}$ terms). Naive cofactor expansion has the same factorial cost; row reduction computes the determinant in $O(n^3)$.
Cofactor Expansion
Theorem (Laplace Expansion). For any fixed row $i$ (and analogously down any fixed column): $$\det(A) = \sum_{j=1}^n (-1)^{i+j} a_{ij} M_{ij}$$ where $M_{ij}$ is the minor - the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting row $i$ and column $j$.
The scalar $C_{ij} = (-1)^{i+j} M_{ij}$ is the cofactor. Expanding along a row or column with many zeros is efficient.
2x2: | a b | = ad - bc
| c d |
3x3 expansion along row 1:
| a b c |
| d e f | = a*|e f| - b*|d f| + c*|d e|
| g h i | |h i| |g i| |g h|
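The recursive structure of cofactor expansion is easy to express directly; a sketch using plain nested lists (the name `det_cofactor` is illustrative), again $O(n!)$ and only for small matrices:

```python
def det_cofactor(A):
    # Laplace expansion along the first row, recursing on minors
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total

print(det_cofactor([[1, 2], [3, 4]]))  # ad - bc = -2
```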
Key Properties
Theorem. The determinant satisfies:
(i) Multiplicativity: $\det(AB) = \det(A)\det(B)$.
(ii) Transpose: $\det(A^T) = \det(A)$.
(iii) Inverse: If $A$ is invertible, $\det(A^{-1}) = 1/\det(A)$.
(iv) Triangular matrices: $\det$ of an upper (or lower) triangular matrix is the product of its diagonal entries.
(v) Row operations: Adding a multiple of one row to another leaves $\det$ unchanged. Scaling a row by $\alpha$ scales $\det$ by $\alpha$. Swapping two rows negates $\det$.
Proof of (i), sketch. For fixed $A$, the map $B \mapsto \det(AB)$ is multilinear and alternating in the columns of $B$ (each column of $AB$ is $A$ applied to the corresponding column of $B$). Any multilinear alternating function of the columns is a constant multiple of $\det$; taking $B = I$ shows the constant is $\det(AI) = \det(A)$. Hence $\det(AB) = \det(A)\det(B)$. $\square$
Proof of (ii). The Leibniz formula $\det(A) = \sum_\sigma \text{sgn}(\sigma) \prod_i a_{i,\sigma(i)}$ is symmetric under transposing: re-index by $\sigma^{-1}$ and use $\text{sgn}(\sigma) = \text{sgn}(\sigma^{-1})$. $\square$
Property (v) explains Gaussian elimination: row operations change the determinant in controlled ways, and reducing to upper triangular form makes computation trivial.
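This is exactly how determinants are computed in practice; a minimal sketch of Gaussian elimination with partial pivoting (the name `det_gauss` and the singularity tolerance are illustrative choices):

```python
def det_gauss(A):
    # O(n^3) determinant: reduce to upper triangular form,
    # tracking the sign flips from row swaps (property (v))
    A = [row[:] for row in A]  # work on a copy
    n = len(A)
    det = 1.0
    for k in range(n):
        # partial pivoting: bring the largest |entry| in column k to the pivot
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if abs(A[p][k]) < 1e-12:
            return 0.0  # (numerically) singular
        if p != k:
            A[k], A[p] = A[p], A[k]
            det = -det  # a row swap negates the determinant
        det *= A[k][k]  # triangular form: det = product of pivots
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]  # adding a row multiple: det unchanged
    return det

print(det_gauss([[1.0, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3 up to rounding
```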
Determinant and Invertibility
Theorem. $A$ is invertible $\iff$ $\det(A) \neq 0$.
Proof. ($\Rightarrow$) If $A$ is invertible, $\det(A)\det(A^{-1}) = \det(I) = 1$, so $\det(A) \neq 0$.
($\Leftarrow$) If $A$ is not invertible, its columns are linearly dependent. Then one column is a linear combination of the others; by multilinearity and the alternating property, $\det(A) = 0$. (Expand: the dependent column contributes terms, each involving a matrix with a repeated column, hence zero determinant.) $\square$
Equivalently: $\det(A) = 0 \iff$ $A$ has a nontrivial null space $\iff$ $0$ is an eigenvalue of $A$.
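A quick numerical check of the equivalence, with a deliberately rank-deficient matrix (the example matrix is my own):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # second column = 2 * first: linearly dependent
print(np.linalg.det(A))         # ~ 0: A is singular
print(np.linalg.matrix_rank(A)) # 1 < 2: nontrivial null space
```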
Geometric Meaning: Signed Volume
The determinant measures signed volume. Concretely: $|\det(A)|$ equals the $n$-dimensional volume of the parallelepiped with sides $a_1, \ldots, a_n$. The sign encodes orientation - positive if $(a_1, \ldots, a_n)$ is a positively oriented frame (same handedness as the standard basis), negative otherwise.
In $\mathbb{R}^2$: $|\det[a_1 \mid a_2]|$ is the area of the parallelogram with sides $a_1, a_2$.

In $\mathbb{R}^3$: $|\det[a_1 \mid a_2 \mid a_3]|$ is the volume of the parallelepiped spanned by $a_1, a_2, a_3$.
For a linear transformation $T$ with matrix $A$: $T$ scales $n$-dimensional volumes by $|\det(A)|$. If $|\det(A)| < 1$, volumes shrink; if $|\det(A)| > 1$, volumes expand; if $\det(A) = 0$, $T$ collapses at least one dimension (volume becomes zero).
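The volume interpretation can be sanity-checked by Monte Carlo; a sketch (the matrix, sample size, and bounding box are my own choices) estimating the area of the image of the unit square under $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])  # |det A| = 6

# The image of the unit square is the parallelogram with sides = columns of A,
# contained in the box [0,3] x [0,3] (area 9). Sample points in the box and
# test membership by solving A x = p and checking 0 <= x <= 1.
pts = rng.uniform([0, 0], [3, 3], size=(200_000, 2))
coords = np.linalg.solve(A, pts.T).T
inside = np.all((coords >= 0) & (coords <= 1), axis=1)
est_area = inside.mean() * 9.0
print(est_area)  # close to |det A| = 6
```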
Characteristic Polynomial and Eigenvalues
The equation $Av = \lambda v$ (with $v \neq 0$) is equivalent to $(A - \lambda I)v = 0$, i.e., $A - \lambda I$ is not invertible. By the determinant-invertibility equivalence: $$\det(A - \lambda I) = 0$$
The function $p(\lambda) = \det(A - \lambda I)$ is a degree-$n$ polynomial in $\lambda$ - the characteristic polynomial of $A$. Its roots are exactly the eigenvalues of $A$.
Expanding via Leibniz, the characteristic polynomial has leading term $(-\lambda)^n$ and constant term $\det(A) = p(0)$. The coefficient of $(-\lambda)^{n-1}$ is $\text{tr}(A) = \sum_i a_{ii}$. Thus: $\det(A) = \prod_i \lambda_i$ and $\text{tr}(A) = \sum_i \lambda_i$ (products and sums of eigenvalues, counted with algebraic multiplicity).
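These identities are easy to verify numerically (the example matrix is my own):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam = np.linalg.eigvals(A)

prod_eig = lam.prod().real  # = det(A) = 5
sum_eig = lam.sum().real    # = tr(A)  = 5
print(prod_eig, np.linalg.det(A))
print(sum_eig, np.trace(A))
```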
Cramer’s Rule
For an invertible $n \times n$ system $Ax = b$, the solution satisfies: $$x_j = \frac{\det(A_j)}{\det(A)}$$ where $A_j$ is $A$ with column $j$ replaced by $b$.
Why it’s not practical. Each $\det(A_j)$ costs $O(n^3)$ by Gaussian elimination, and there are $n$ of them, giving $O(n^4)$ total - worse than $O(n^3)$ for LU factorisation. The Leibniz formula version is $O(n \cdot n!)$, which is astronomically worse. Cramer’s rule has theoretical value (closed-form expressions for matrix inverses, sensitivity analysis) but is never used computationally for large systems.
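Despite its cost, Cramer's rule is short to state in code; a pedagogical sketch (the name `cramer` is illustrative, and `np.linalg.solve` is what one would actually use):

```python
import numpy as np

def cramer(A, b):
    # x_j = det(A_j) / det(A), where A_j has column j replaced by b.
    # O(n^4) overall: n determinants at O(n^3) each.
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b
        x[j] = np.linalg.det(Aj) / d
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
print(cramer(A, b))  # same solution as np.linalg.solve(A, b)
```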
Examples
Jacobian determinant in change of variables. When changing variables in a multidimensional integral from $x$ to $u = \phi(x)$: $$\int_{\phi(D)} f(u)\, du = \int_D f(\phi(x))\, |\det J_\phi(x)|\, dx$$ where $J_\phi$ is the Jacobian matrix $(\partial \phi_i / \partial x_j)$. The Jacobian determinant is the local volume scaling factor of the nonlinear map $\phi$. This appears in probability (density transformation formula: if $X \sim p_X$ and $Y = g(X)$, then $p_Y(y) = p_X(g^{-1}(y))\, |\det J_{g^{-1}}(y)|$), in normalising flows (generative models that learn invertible transformations), and in physics (coordinate transformations).
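A concrete instance: for polar coordinates $\phi(r, \theta) = (r\cos\theta, r\sin\theta)$, the Jacobian determinant is $r$. A sketch checking this with a finite-difference Jacobian (the helper `jac_det` and step size are my own, not from the text):

```python
import numpy as np

def polar(u):
    r, theta = u
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jac_det(f, u, h=1e-6):
    # central-difference approximation of the Jacobian, column by column
    n = len(u)
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(u + e) - f(u - e)) / (2 * h)
    return np.linalg.det(J)

# analytic Jacobian determinant of the polar map is r
print(jac_det(polar, np.array([2.0, 0.7])))  # ~ 2.0
```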
Orientation. The sign of $\det$ determines orientation. In computer graphics, the sign of the cross-product determinant tells whether a triangle’s vertices are in clockwise or counterclockwise order - this is used in backface culling and winding-order tests.
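The winding-order test is a one-line determinant; a sketch (the helper name `ccw` is conventional but mine):

```python
def ccw(a, b, c):
    # 2x2 determinant of (b - a, c - a): > 0 means a, b, c are in
    # counterclockwise order, < 0 clockwise, == 0 collinear
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

print(ccw((0, 0), (1, 0), (0, 1)))  # positive: counterclockwise
```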
Condition number. The condition number $\kappa(A) = \|A\| \cdot \|A^{-1}\|$ measures how sensitive $Ax = b$ is to perturbations. While $\det(A) = 0$ characterises exact singularity, a small $|\det(A)|$ does not imply ill-conditioning (the matrix $\epsilon I$ has $\det = \epsilon^n$, which can be tiny even though $\kappa(\epsilon I) = 1$). The condition number, not the determinant, is the right measure of numerical difficulty. This is a common misconception to avoid.
Determinants in ML: avoided in practice. Computing $\log \det \Sigma$ (log-determinant of a covariance matrix) appears in multivariate Gaussian likelihoods and information-theoretic quantities. It is computed as $2\sum_i \log L_{ii}$ from the Cholesky factorisation $\Sigma = LL^T$ (where $L$ is lower triangular with positive diagonal, so $\det \Sigma = (\prod_i L_{ii})^2$), avoiding the $O(n!)$ formula entirely. The numpy.linalg.slogdet function computes the sign and log-absolute-value of the determinant stably, using LU decomposition.
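A sketch comparing the two routes (the random SPD matrix construction is my own):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
Sigma = X.T @ X / 50 + 0.1 * np.eye(5)  # symmetric positive definite

# Cholesky route: log det Sigma = 2 * sum_i log L_ii
L = np.linalg.cholesky(Sigma)
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))

# slogdet route: stable sign and log|det| via LU
sign, logabs = np.linalg.slogdet(Sigma)
print(logdet_chol, sign, logabs)  # the two log-determinants agree
```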
Read Next: