Axiomatic Definition

Rather than defining the determinant by a formula, the cleanest approach (following Axler and most modern algebra texts) is axiomatic. The determinant $\det : M_{n \times n}(\mathbb{F}) \to \mathbb{F}$ is the unique function of the columns $a_1, \ldots, a_n \in \mathbb{F}^n$ satisfying:

  1. Multilinearity: $\det$ is linear in each column separately, holding the others fixed: $$\det(\ldots, \alpha a + b, \ldots) = \alpha\det(\ldots, a, \ldots) + \det(\ldots, b, \ldots)$$

  2. Alternating: Swapping any two columns changes the sign: $$\det(\ldots, a_i, \ldots, a_j, \ldots) = -\det(\ldots, a_j, \ldots, a_i, \ldots)$$

  3. Normalisation: $\det(I) = 1$.

These three axioms uniquely determine $\det$. The alternating property implies that $\det = 0$ whenever two columns are equal (swap them: the sign changes, but the matrix is the same, so $\det = -\det$, giving $\det = 0$). More generally, the alternating property forces $\det = 0$ whenever the columns are linearly dependent.
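As a quick numerical sanity check (random matrices, not a proof), all three axioms can be verified against numpy's determinant:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))

# Multilinearity in column 0: det(alpha*a + b, ...) = alpha*det(a, ...) + det(b, ...)
a, b, alpha = rng.standard_normal(n), rng.standard_normal(n), 2.5
M = lambda col: np.concatenate([col[:, None], A[:, 1:]], axis=1)
lhs = np.linalg.det(M(alpha * a + b))
rhs = alpha * np.linalg.det(M(a)) + np.linalg.det(M(b))
assert np.isclose(lhs, rhs)

# Alternating: swapping two columns flips the sign
assert np.isclose(np.linalg.det(A[:, [1, 0, 2, 3]]), -np.linalg.det(A))

# Normalisation: det(I) = 1
assert np.isclose(np.linalg.det(np.eye(n)), 1.0)
```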

The Permutation (Leibniz) Formula

From the axioms, one can derive an explicit formula. Expanding via multilinearity in each column using the standard basis $\{e_1, \ldots, e_n\}$: $$\det(A) = \det\left(\sum_{i_1} a_{i_1 1} e_{i_1}, \ldots, \sum_{i_n} a_{i_n n} e_{i_n}\right) = \sum_{i_1, \ldots, i_n} a_{i_1 1} \cdots a_{i_n n} \det(e_{i_1}, \ldots, e_{i_n})$$

The term $\det(e_{i_1}, \ldots, e_{i_n})$ is nonzero only when $(i_1, \ldots, i_n)$ is a permutation $\sigma \in S_n$, in which case it equals $\text{sgn}(\sigma)$ (the sign of the permutation, $\pm 1$). This gives the Leibniz formula:

$$\det(A) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=1}^n a_{i,\sigma(i)}$$

For $n = 2$: $\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$ (two permutations: identity with $+$, transposition with $-$).

For $n = 3$: six permutations with their signs give the Rule of Sarrus:

Visual: 3x3 determinant (Sarrus' rule)

| a  b  c |
| d  e  f |  = aei + bfg + cdh - ceg - bdi - afh
| g  h  i |

Copy the first two columns to the right, then read off the diagonals:

a  b  c  a  b
d  e  f  d  e
g  h  i  g  h

\ diagonals (+): aei, bfg, cdh
/ diagonals (-): ceg, afh, bdi

The Leibniz formula has $n!$ terms, so evaluating it directly costs $O(n \cdot n!)$ - completely impractical for large $n$ (even $n = 20$ gives $> 10^{18}$ terms). Naive cofactor expansion is also factorial-time; row reduction computes the determinant in $O(n^3)$.
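For concreteness, here is a deliberately naive Python implementation of the Leibniz formula, with the permutation sign computed by inversion counting; it is only usable for tiny $n$:

```python
import numpy as np
from itertools import permutations

def sign(perm):
    """Sign of a permutation: (-1)^(number of inversions)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """Determinant by the Leibniz formula: O(n * n!), for tiny n only."""
    n = A.shape[0]
    return sum(
        sign(sigma) * np.prod([A[i, sigma[i]] for i in range(n)])
        for sigma in permutations(range(n))
    )

A = np.random.default_rng(1).standard_normal((4, 4))
assert np.isclose(det_leibniz(A), np.linalg.det(A))
```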

Cofactor Expansion

Theorem (Laplace Expansion). For any fixed row $i$: $$\det(A) = \sum_{j=1}^n (-1)^{i+j} a_{ij} M_{ij}$$ where $M_{ij}$ is the minor - the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting row $i$ and column $j$. The analogous sum over $i$ gives the expansion along any fixed column $j$.

The scalar $C_{ij} = (-1)^{i+j} M_{ij}$ is the cofactor. Expanding along a row or column with many zeros is efficient.

2x2:    | a  b |  = ad - bc
        | c  d |

3x3 expansion along row 1:
| a  b  c |
| d  e  f | = a*|e f| - b*|d f| + c*|d e|
| g  h  i |     |h i|    |g i|     |g h|
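A minimal recursive implementation of cofactor expansion along the first row (still factorial-time, as noted above, but instructive):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by Laplace expansion along row 0. O(n!) recursion."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 0, column j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)     # (-1)^(0+j) cofactor sign
    return total

A = np.random.default_rng(2).standard_normal((5, 5))
assert np.isclose(det_cofactor(A), np.linalg.det(A))
```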

Key Properties

Theorem. The determinant satisfies:

(i) Multiplicativity: $\det(AB) = \det(A)\det(B)$.

(ii) Transpose: $\det(A^T) = \det(A)$.

(iii) Inverse: If $A$ is invertible, $\det(A^{-1}) = 1/\det(A)$.

(iv) Triangular matrices: $\det$ of an upper (or lower) triangular matrix is the product of its diagonal entries.

(v) Row operations: Adding a multiple of one row to another leaves $\det$ unchanged. Scaling a row by $\alpha$ scales $\det$ by $\alpha$. Swapping two rows negates $\det$.

Proof of (i) sketch. Fix $A$ with $\det(A) \neq 0$. The map $B \mapsto \det(AB)/\det(A)$ satisfies the three axioms of the determinant (multilinearity and alternating in the columns of $B$, normalisation from $\det(AI) = \det(A)$). By uniqueness, it equals $\det(B)$. If $\det(A) = 0$, then $A$ is singular, so $AB$ is singular and both sides vanish. $\square$

Proof of (ii). The Leibniz formula $\det(A) = \sum_\sigma \text{sgn}(\sigma) \prod_i a_{i,\sigma(i)}$ is symmetric under transposing: re-index by $\sigma^{-1}$ and use $\text{sgn}(\sigma) = \text{sgn}(\sigma^{-1})$. $\square$

Property (v) explains Gaussian elimination: row operations change the determinant in controlled ways, and reducing to upper triangular form makes computation trivial.
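A sketch of this $O(n^3)$ approach: eliminate to upper triangular form while tracking sign flips from row swaps, then multiply the diagonal (partial pivoting added for numerical stability):

```python
import numpy as np

def det_by_elimination(A):
    """Determinant via Gaussian elimination with partial pivoting: O(n^3).
    Row additions leave det unchanged; each row swap flips the sign;
    the triangular result's determinant is the product of its diagonal."""
    U = A.astype(float).copy()
    n = U.shape[0]
    det_sign = 1.0
    for k in range(n):
        p = k + np.argmax(np.abs(U[k:, k]))  # partial pivot
        if U[p, k] == 0:
            return 0.0                       # singular
        if p != k:
            U[[k, p]] = U[[p, k]]
            det_sign = -det_sign
        U[k+1:] -= np.outer(U[k+1:, k] / U[k, k], U[k])  # eliminate below pivot
    return det_sign * np.prod(np.diag(U))

A = np.random.default_rng(3).standard_normal((6, 6))
assert np.isclose(det_by_elimination(A), np.linalg.det(A))
```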

Determinant and Invertibility

Theorem. $A$ is invertible $\iff$ $\det(A) \neq 0$.

Proof. ($\Rightarrow$) If $A$ is invertible, $\det(A)\det(A^{-1}) = \det(I) = 1$, so $\det(A) \neq 0$.

($\Leftarrow$) If $A$ is not invertible, its columns are linearly dependent. Then one column is a linear combination of the others; by multilinearity and the alternating property, $\det(A) = 0$. (Expand: the dependent column contributes terms, each involving a matrix with a repeated column, hence zero determinant.) $\square$

Equivalently: $\det(A) = 0 \iff$ $A$ has a nontrivial null space $\iff$ $0$ is an eigenvalue of $A$.

Geometric Meaning: Signed Volume

The determinant measures signed volume. Concretely: $|\det(A)|$ equals the $n$-dimensional volume of the parallelepiped with sides $a_1, \ldots, a_n$. The sign encodes orientation - positive if $(a_1, \ldots, a_n)$ is a positively oriented frame (same handedness as the standard basis), negative otherwise.

In R^2: parallelogram with sides a1, a2

        a2
         *--------*
        /        /
       /        /
      *--------*----> a1

      Area = |det[a1 | a2]|

In R^3: parallelepiped with edges a1, a2, a3

      Volume = |det[a1 | a2 | a3]|

For a linear transformation $T$ with matrix $A$: $T$ scales $n$-dimensional volumes by $|\det(A)|$. If $|\det(A)| < 1$, volumes shrink; if $|\det(A)| > 1$, volumes expand; if $\det(A) = 0$, $T$ collapses at least one dimension (volume becomes zero).
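Two small $2 \times 2$ illustrations with numpy: a shear preserves area, a diagonal stretch scales it:

```python
import numpy as np

shear   = np.array([[1.0, 1.5], [0.0, 1.0]])  # slants the unit square, same area
stretch = np.diag([2.0, 3.0])                 # maps the unit square to a 2x3 rectangle

print(np.linalg.det(shear))    # 1.0: area preserved
print(np.linalg.det(stretch))  # 6.0: area scaled by 2 * 3
```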

Characteristic Polynomial and Eigenvalues

The equation $Av = \lambda v$ (with $v \neq 0$) is equivalent to $(A - \lambda I)v = 0$, i.e., $A - \lambda I$ is not invertible. By the determinant-invertibility equivalence: $$\det(A - \lambda I) = 0$$

The function $p(\lambda) = \det(A - \lambda I)$ is a degree-$n$ polynomial in $\lambda$ - the characteristic polynomial of $A$. Its roots are exactly the eigenvalues of $A$.

Expanding via Leibniz, the characteristic polynomial has leading term $(-\lambda)^n$ and constant term $\det(A) = p(0)$. The coefficient of $(-\lambda)^{n-1}$ is $\text{tr}(A) = \sum_i a_{ii}$. Thus: $\det(A) = \prod_i \lambda_i$ and $\text{tr}(A) = \sum_i \lambda_i$ (products and sums of eigenvalues, counted with algebraic multiplicity).
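Both identities are easy to verify numerically (eigenvalues of a real matrix may be complex, but they come in conjugate pairs, so the product and sum are real up to rounding):

```python
import numpy as np

A = np.random.default_rng(4).standard_normal((5, 5))
eigvals = np.linalg.eigvals(A)

assert np.isclose(np.prod(eigvals).real, np.linalg.det(A))  # det = product of eigenvalues
assert np.isclose(np.sum(eigvals).real, np.trace(A))        # trace = sum of eigenvalues
```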

Cramer’s Rule

For an invertible $n \times n$ system $Ax = b$, the solution satisfies: $$x_j = \frac{\det(A_j)}{\det(A)}$$ where $A_j$ is $A$ with column $j$ replaced by $b$.

Why it’s not practical. Each $\det(A_j)$ costs $O(n^3)$ by Gaussian elimination, and there are $n$ of them, giving $O(n^4)$ total - worse than $O(n^3)$ for LU factorisation. The Leibniz formula version is $O(n \cdot n!)$, which is astronomically worse. Cramer’s rule has theoretical value (closed-form expressions for matrix inverses, sensitivity analysis) but is never used computationally for large systems.
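An illustrative (not practical) implementation of the rule, checked against a direct solve:

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule: x_j = det(A_j) / det(A).
    O(n^4) with an O(n^3) det routine - for illustration only."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b  # replace column j with b
        x[j] = np.linalg.det(Aj) / d
    return x

rng = np.random.default_rng(5)
A, b = rng.standard_normal((4, 4)), rng.standard_normal(4)
assert np.allclose(cramer_solve(A, b), np.linalg.solve(A, b))
```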

Examples

Jacobian determinant in change of variables. When changing variables in a multidimensional integral from $x$ to $u = \phi(x)$: $$\int_{\phi(D)} f(u)\, du = \int_D f(\phi(x))\, |\det J_\phi(x)|\, dx$$ where $J_\phi$ is the Jacobian matrix $(\partial \phi_i / \partial x_j)$. The Jacobian determinant is the local volume scaling factor of the nonlinear map $\phi$. This appears in probability (density transformation formula: if $X \sim p_X$ and $Y = g(X)$, then $p_Y(y) = p_X(g^{-1}(y)) |\det J_{g^{-1}}(y)|$), in normalising flows (generative models that learn invertible transformations), and in physics (coordinate transformations).
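A small check of the density-transformation formula for the linear map $g(x) = Ax$: here $Y = AX$ with $X \sim N(0, I)$ is exactly $N(0, AA^T)$, so the formula can be compared against the known density (uses scipy.stats for the Gaussian pdf):

```python
import numpy as np
from scipy.stats import multivariate_normal

# X ~ N(0, I) in R^2; Y = g(X) = A X, so g^{-1}(y) = A^{-1} y.
A = np.array([[2.0, 1.0], [0.0, 0.5]])
A_inv = np.linalg.inv(A)

p_X = multivariate_normal(mean=np.zeros(2), cov=np.eye(2)).pdf
y = np.array([1.0, -0.3])

# p_Y(y) = p_X(A^{-1} y) * |det(A^{-1})|
p_Y_formula = p_X(A_inv @ y) * abs(np.linalg.det(A_inv))
# Cross-check: Y is exactly N(0, A A^T).
p_Y_direct = multivariate_normal(mean=np.zeros(2), cov=A @ A.T).pdf(y)

assert np.isclose(p_Y_formula, p_Y_direct)
```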

Orientation. The sign of $\det$ determines orientation. In computer graphics, the sign of the cross-product determinant tells whether a triangle’s vertices are in clockwise or counterclockwise order - this is used in backface culling and winding-order tests.
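A minimal winding-order test along these lines: the sign of $\det[q - p \mid r - p]$ classifies the triangle $(p, q, r)$:

```python
import numpy as np

def winding(p, q, r):
    """Sign of det[q - p | r - p]:
    > 0: counterclockwise, < 0: clockwise, 0: collinear."""
    return np.sign(np.linalg.det(np.column_stack([q - p, r - p])))

p, q, r = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(winding(p, q, r))  #  1.0 (counterclockwise)
print(winding(p, r, q))  # -1.0 (swapping two vertices flips the sign)
```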

Condition number. The condition number $\kappa(A) = \|A\| \cdot \|A^{-1}\|$ measures how sensitive the solution of $Ax = b$ is to perturbations. While $\det(A) = 0$ characterises exact singularity, a small $|\det(A)|$ does not imply ill-conditioning: the scalar matrix $\epsilon I$ has $\det = \epsilon^n$, which can be tiny even though $\kappa = 1$. The condition number, not the determinant, is the right measure of numerical difficulty. This is a common misconception to avoid.
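A concrete illustration: a scaled identity of size $100$ has determinant $10^{-100}$ (far below float underflow, hence the log form) yet condition number exactly $1$:

```python
import numpy as np

n = 100
A = 0.1 * np.eye(n)

sign, logdet = np.linalg.slogdet(A)
print(logdet)             # ~ -230.3, i.e. det = 1e-100 (underflows as a plain float)
print(np.linalg.cond(A))  # 1.0: perfectly conditioned despite the tiny determinant
```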

Determinants in ML: avoided in practice. Computing $\log \det \Sigma$ (the log-determinant of a covariance matrix) appears in multivariate Gaussian likelihoods and information-theoretic quantities. It is computed as $2\sum_i \log L_{ii}$ from the Cholesky factorisation $\Sigma = LL^T$ (where $L$ is lower triangular with positive diagonal, so $\det \Sigma = (\prod_i L_{ii})^2$), avoiding the $O(n!)$ formula entirely. The numpy.linalg.slogdet function computes the sign and log-absolute-value of the determinant stably, using LU decomposition.
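A minimal sketch of the Cholesky route, cross-checked against numpy.linalg.slogdet (the covariance here is a hypothetical positive definite example):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)  # symmetric positive definite by construction

# log det(Sigma) = log det(L L^T) = 2 * sum_i log(L_ii)
L = np.linalg.cholesky(Sigma)
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))

sign, logdet_np = np.linalg.slogdet(Sigma)
assert sign == 1.0 and np.isclose(logdet_chol, logdet_np)
```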

