Helpful context:


Eigenvalues and eigenvectors are the central objects of linear algebra. They reveal the “grain” of a linear map: the directions that are merely scaled (not rotated) under the map, and the scale factors. From Google’s PageRank to quantum mechanics to the stability of bridges, eigenstructure governs the behaviour of linear systems.

Definitions

Let $V$ be a vector space over $\mathbb{F}$ and $T \in \mathcal{L}(V)$ a linear operator.

Definition. A scalar $\lambda \in \mathbb{F}$ is an eigenvalue of $T$ if there exists a nonzero vector $v \in V$ such that

$$T(v) = \lambda v.$$

Any such nonzero $v$ is an eigenvector corresponding to $\lambda$. The zero vector is explicitly excluded: it trivially satisfies the equation for any $\lambda$.

In matrix language, for $A \in \mathbb{F}^{n \times n}$: $Av = \lambda v$ with $v \neq 0$.

λ = 2  (stretch) O v T(v) = 2v same direction, twice as long λ = −1  (flip) O w T(w) = −w same length, opposite direction

Geometrically: eigenvectors are the directions left invariant (as lines) by $T$; the eigenvalue is the stretch factor, which can be negative (reversal) or complex (rotation).

Finding Eigenvalues: The Characteristic Polynomial

$\lambda$ is an eigenvalue of $A$ iff $Av = \lambda v$ has a nonzero solution $v$, i.e., $(A - \lambda I)v = 0$ has a nontrivial solution, i.e., $A - \lambda I$ is singular:

$$\det(A - \lambda I) = 0.$$

Definition. The characteristic polynomial of $A \in \mathbb{F}^{n \times n}$ is

$$p_A(\lambda) = \det(A - \lambda I).$$

This is a degree-$n$ polynomial in $\lambda$:

$$p_A(\lambda) = (-1)^n \lambda^n + (-1)^{n-1}(\mathrm{tr} A)\lambda^{n-1} + \cdots + \det(A).$$

The eigenvalues of $A$ are precisely the roots of $p_A$.

Subtlety. Over $\mathbb{R}$, a polynomial of degree $\geq 3$ need not have real roots (e.g., rotation matrices). Over $\mathbb{C}$, the Fundamental Theorem of Algebra guarantees all $n$ roots (counted with multiplicity).

Theorem (Eigenvalues of triangular matrices). If $A$ is upper (or lower) triangular, its eigenvalues are its diagonal entries.

Proof. $\det(A - \lambda I)$ for a triangular matrix is the product of diagonal entries of $A - \lambda I$, which are $a_{11} - \lambda, a_{22} - \lambda, \ldots, a_{nn} - \lambda$. So $p_A(\lambda) = \prod_{i=1}^n (a_{ii} - \lambda)$, and the roots are $\lambda = a_{ii}$. $\square$

Algebraic and Geometric Multiplicity

Definition. For an eigenvalue $\lambda_0$ of $A$:

  • The algebraic multiplicity $m_a(\lambda_0)$ is the multiplicity of $\lambda_0$ as a root of $p_A(\lambda)$.
  • The eigenspace is $E_{\lambda_0} = \ker(A - \lambda_0 I) = \{v : Av = \lambda_0 v\}$.
  • The geometric multiplicity $m_g(\lambda_0) = \dim E_{\lambda_0}$.

Theorem. For every eigenvalue $\lambda_0$:

$$1 \leq m_g(\lambda_0) \leq m_a(\lambda_0).$$

Proof sketch. The lower bound $m_g \geq 1$ holds because $\lambda_0$ is an eigenvalue, so $E_{\lambda_0}$ contains at least one nonzero vector. For the upper bound: choose a basis for $E_{\lambda_0}$ and extend to a basis of $\mathbb{F}^n$; in this basis $A$ has a block structure showing $(\lambda_0 - \lambda)^{m_g(\lambda_0)}$ divides $p_A(\lambda)$, hence $m_g \leq m_a$. $\square$

Example. $A = \begin{pmatrix}2 & 1 \\ 0 & 2\end{pmatrix}$. Then $p_A(\lambda) = (2-\lambda)^2$, so $\lambda_0 = 2$ has $m_a = 2$. But $\ker(A - 2I) = \ker\begin{pmatrix}0&1\\0&0\end{pmatrix} = \mathrm{span}\{e_1\}$, so $m_g = 1 < m_a = 2$. This matrix is not diagonalizable.

The Diagonalization Theorem

Theorem (Diagonalizability). $A \in \mathbb{F}^{n \times n}$ is diagonalizable (i.e., $A = PDP^{-1}$ with $D$ diagonal, $P$ invertible) if and only if $p_A$ splits completely over $\mathbb{F}$ and $m_g(\lambda) = m_a(\lambda)$ for every eigenvalue $\lambda$.

When this holds, the columns of $P$ are $n$ linearly independent eigenvectors of $A$, and the diagonal entries of $D$ are the corresponding eigenvalues.

Key lemma used in proof: Eigenvectors from distinct eigenvalues are linearly independent.

Proof of lemma. Suppose $\sum_{i=1}^k c_i v_i = 0$ with $A v_i = \lambda_i v_i$ and all $\lambda_i$ distinct. Apply $(A - \lambda_k I)$ to get $\sum_{i=1}^{k-1} c_i (\lambda_i - \lambda_k) v_i = 0$. Repeat, peeling off one eigenvalue at a time; since all differences $\lambda_i - \lambda_j \neq 0$, we get $c_i = 0$ for all $i$. $\square$

Corollary. If $A$ has $n$ distinct eigenvalues, it is diagonalizable.

Why Diagonalization Works: The Column Interpretation

The theorem states $A = PDP^{-1}$, or equivalently $AP = PD$. This looks like a formal manipulation, but the column interpretation of matrix multiplication makes it completely transparent.

Set up. Suppose $A$ has $n$ linearly independent eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Form the eigenvector matrix $P$ by placing these eigenvectors as columns:

$$P = \begin{pmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ | & | & & | \end{pmatrix}$$

and the diagonal matrix of eigenvalues:

$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

Left side: what $AP$ does column by column. By the column interpretation, column $k$ of $AP$ is $A$ applied to column $k$ of $P$:

$$\text{column } k \text{ of } AP = A\mathbf{v}_k = \lambda_k \mathbf{v}_k.$$

$A$ stretches or squishes each eigenvector by its eigenvalue. That is it - nothing more happens to these special directions.

Right side: what $PD$ does column by column. Column $k$ of $D$ is $\lambda_k \mathbf{e}_k$ (zero everywhere except $\lambda_k$ in position $k$). By the column interpretation, column $k$ of $PD$ is:

$$\text{column } k \text{ of } PD = P(\lambda_k \mathbf{e}_k) = \lambda_k (P\mathbf{e}_k) = \lambda_k \mathbf{v}_k.$$

Multiplying $P$ by diagonal $D$ from the right just scales each column of $P$ by the corresponding diagonal entry - column $k$ gets multiplied by $\lambda_k$.

The punchline. Both sides give the same thing: column $k$ is $\lambda_k \mathbf{v}_k$. So:

$$AP = PD$$

column by column, with no algebra required. Since the eigenvectors are linearly independent, $P$ is invertible. Multiply both sides on the right by $P^{-1}$:

$$A = PDP^{-1}.$$

The geometric picture. What does $A = PDP^{-1}$ say about what $A$ does to a vector $\mathbf{x}$? Reading right to left:

  • $P^{-1}$: change coordinates into the eigenvector basis (express $\mathbf{x}$ as a combination of eigenvectors)
  • $D$: in the eigenvector basis, $A$ acts as pure scaling - each eigenvector direction gets multiplied by its eigenvalue, independently of all others
  • $P$: change coordinates back to the standard basis

In the eigenvector basis, $A$ is diagonal. Diagonalization is just a change of vantage point to the coordinates where $A$ does the simplest possible thing.

Why powers are easy. $A^k = (PDP^{-1})^k = PD^kP^{-1}$, because the $P^{-1}P$ terms in the middle cancel:

$$(PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}.$$

And $D^k$ is trivial - just raise each diagonal entry to the $k$-th power:

$$D^k = \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix}.$$

So computing $A^{100}$ for a diagonalizable matrix costs no more than computing $A$ once: find $P$ and $D$, raise diagonal entries to the 100th power, done.

The Spectral Theorem for Symmetric Matrices

Theorem (Real Spectral Theorem). If $A \in \mathbb{R}^{n \times n}$ is symmetric ($A = A^T$), then:

  1. All eigenvalues of $A$ are real.
  2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  3. $A$ is orthogonally diagonalizable: $A = Q\Lambda Q^T$ where $Q$ is orthogonal ($Q^T Q = I$) and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

Proof of (1). Let $\lambda \in \mathbb{C}$ be an eigenvalue and $v \in \mathbb{C}^n$ the eigenvector. Then $Av = \lambda v$, so $\bar{v}^T A v = \lambda \bar{v}^T v$. Since $A$ is real symmetric, $\bar{v}^T A v = (Av)^T \bar{v} = \overline{(Av^T)\bar{v}} = \overline{\bar{v}^T A v}$ - that is, $\bar{v}^T A v$ is real. Since $\bar{v}^T v = |v|^2 > 0$, we get $\lambda = (\bar{v}^T A v)/|v|^2 \in \mathbb{R}$. $\square$

Proof of (2). Let $Av = \lambda v$ and $Aw = \mu w$ with $\lambda \neq \mu$. Then

$$\lambda (v \cdot w) = (\lambda v) \cdot w = (Av) \cdot w = v \cdot (A^T w) = v \cdot (Aw) = \mu (v \cdot w).$$

Since $\lambda \neq \mu$, we must have $v \cdot w = 0$. $\square$

Proof of (3). By induction on $n$.

Base case $n = 1$: a $1 \times 1$ symmetric matrix is already diagonal.

Inductive step: assume every $(n-1) \times (n-1)$ symmetric matrix is orthogonally diagonalizable. Let $A \in \mathbb{R}^{n \times n}$ be symmetric.

Step 1. By (1), $A$ has at least one real eigenvalue $\lambda_1$. Pick a unit eigenvector $v_1$ with $Av_1 = \lambda_1 v_1$.

Step 2. Let $W = {v_1}^\perp$, the $(n-1)$-dimensional orthogonal complement of $v_1$. The subspace $W$ is $A$-invariant: for any $w \in W$, $$\langle Aw, v_1 \rangle = \langle w, A^T v_1 \rangle = \langle w, Av_1 \rangle = \lambda_1 \langle w, v_1 \rangle = 0,$$ so $Aw \in W$. (The key step uses $A^T = A$.)

Step 3. The restriction $A|_W : W \to W$ is a symmetric operator on an $(n-1)$-dimensional space. By the inductive hypothesis, $W$ has an orthonormal basis $v_2, \ldots, v_n$ of eigenvectors of $A|_W$, and hence of $A$.

Step 4. Since $v_1 \perp W$ and $|v_1| = 1$, the set ${v_1, v_2, \ldots, v_n}$ is an orthonormal basis of $\mathbb{R}^n$ consisting entirely of eigenvectors of $A$. $\square$

The argument hinges on Step 2: because $A$ is symmetric, it maps the orthogonal complement of any eigenvector back into itself, so you can always “peel off” one eigenvector and apply induction to the remaining subspace.

The complex analogue uses Hermitian matrices ($A = A^\ast$); the conclusion is that $A = U\Lambda U^\ast$ with $U$ unitary.

The Gershgorin Circle Theorem

When you cannot compute eigenvalues exactly, you can locate them.

Theorem (Gershgorin, 1931). Every eigenvalue of $A \in \mathbb{C}^{n \times n}$ lies in at least one of the Gershgorin discs:

$$D_i = \left\{\lambda \in \mathbb{C} : |\lambda - a_{ii}| \leq R_i\right\}, \quad R_i = \sum_{j \neq i} |a_{ij}|.$$

In words: each disc is centred at a diagonal entry $a_{ii}$ with radius equal to the sum of absolute values of all off-diagonal entries in that row.

Proof. Let $\lambda$ be an eigenvalue with eigenvector $v = (v_1, \ldots, v_n)^T \neq 0$. Let $k = \arg\max_i |v_i|$. From the $k$-th component of $Av = \lambda v$:

$$\lambda v_k = \sum_j a_{kj} v_j = a_{kk} v_k + \sum_{j \neq k} a_{kj} v_j.$$

Rearranging: $(\lambda - a_{kk}) v_k = \sum_{j \neq k} a_{kj} v_j$. Taking absolute values:

$$|\lambda - a_{kk}| |v_k| \leq \sum_{j \neq k} |a_{kj}| |v_j| \leq R_k |v_k|.$$

Dividing by $|v_k| > 0$: $|\lambda - a_{kk}| \leq R_k$, so $\lambda \in D_k$. $\square$

Gershgorin discs give cheap, useful bounds. Diagonally dominant matrices (where $|a_{ii}| > R_i$ for all $i$) are nonsingular, and their eigenvalues are localized near the diagonal.

The Cayley - Hamilton Theorem

Theorem (Cayley - Hamilton). Every square matrix satisfies its own characteristic polynomial: if $p_A(\lambda) = \det(A - \lambda I)$, then

$$p_A(A) = 0.$$

Example. $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$, $p_A(\lambda) = \lambda^2 - 5\lambda - 2$. Then $A^2 - 5A - 2I = 0$.

Proof sketch (via adjugate). Write $p_A(\lambda)I = (A - \lambda I) \cdot \mathrm{adj}(A - \lambda I)$ (since $M \cdot \mathrm{adj}(M) = \det(M)I$). The adjugate $\mathrm{adj}(A - \lambda I)$ has polynomial entries in $\lambda$; collect by powers of $\lambda$: $\mathrm{adj}(A - \lambda I) = \sum_{k=0}^{n-1} (-\lambda)^k B_k$. Substituting $\lambda = A$ and using the polynomial identity yields $p_A(A) = 0$. $\square$

Consequence. $A^n$ can be expressed as a linear combination of $I, A, A^2, \ldots, A^{n-1}$. The minimal polynomial $m_A(\lambda)$ (the monic polynomial of least degree annihilating $A$) divides $p_A(\lambda)$.

Power Iteration

Power iteration is the simplest algorithm for finding the dominant eigenvalue.

Algorithm. Given $A$, start with random $v_0$:

v_0 = random unit vector
for k = 0, 1, 2, ...:
    w_{k+1} = A * v_k
    v_{k+1} = w_{k+1} / ||w_{k+1}||
    lambda_k = v_k^T * A * v_k     (Rayleigh quotient)

Convergence. Expand $v_0 = \sum_i c_i u_i$ in the eigenbasis. Then

$$A^k v_0 = \sum_i c_i \lambda_i^k u_i = \lambda_1^k \left(c_1 u_1 + \sum_{i \geq 2} c_i \left(\frac{\lambda_i}{\lambda_1}\right)^k u_i\right).$$

If $|\lambda_1| > |\lambda_2| \geq \cdots$, the ratio $(\lambda_i/\lambda_1)^k \to 0$ geometrically at rate $|\lambda_2/\lambda_1|$, so $v_k \to u_1$ (the dominant eigenvector). Convergence is slow when $|\lambda_2/\lambda_1|$ is close to 1 (eigenvalue gap is small).

Google PageRank is essentially power iteration on the web graph transition matrix (a column-stochastic matrix). The dominant eigenvector (eigenvalue 1) gives the stationary distribution = page importance.

Jordan Blocks and Jordan Normal Form

When $m_g(\lambda) < m_a(\lambda)$, diagonalization fails. The Jordan form provides the canonical substitute.

A Jordan block $J_m(\lambda)$ is an $m \times m$ matrix:

$$J_m(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots \\ 0 & \lambda & 1 & \cdots \\ \vdots & & \ddots & 1 \\ 0 & 0 & \cdots & \lambda \end{pmatrix}.$$

Theorem (Jordan Normal Form). Over $\mathbb{C}$ (or any algebraically closed field), every $A \in \mathbb{F}^{n \times n}$ is similar to a block diagonal matrix

$$J = \mathrm{diag}(J_{m_1}(\lambda_1), \ldots, J_{m_r}(\lambda_r)),$$

unique up to reordering of blocks. The $\lambda_i$ are eigenvalues (with repetition allowed across blocks for the same eigenvalue).

The Jordan form of a $3 \times 3$ nilpotent matrix (all eigenvalues zero) might look like:

$$\begin{pmatrix}0&1&0\\0&0&1\\0&0&0\end{pmatrix} \quad \text{or} \quad \begin{pmatrix}0&1&0\\0&0&0\\0&0&0\end{pmatrix}.$$

The first has one block of size 3; the second has one block of size 2 and one of size 1.

Examples

PageRank. The web is a directed graph. Build the transition matrix $M$ (column $j$ gives outlink distribution from page $j$). PageRank is the dominant eigenvector of $M$ (eigenvalue 1 of a stochastic matrix always exists by Perron - Frobenius). Power iteration converges in $O(1/\text{spectral gap})$ steps.

PCA and covariance eigenvectors. Given a data matrix $X \in \mathbb{R}^{n \times p}$ (centred), the covariance matrix is $C = X^T X / (n-1)$. PCA finds the orthonormal eigenvectors $q_1, \ldots, q_p$ of $C$ (eigenvalues $\lambda_1 \geq \cdots \geq \lambda_p \geq 0$). The projection onto the top $k$ eigenvectors gives the best rank-$k$ approximation of the data in a mean-squared sense.

Stability of dynamical systems. For the linear system $\dot{x} = Ax$, the solution is $x(t) = e^{At}x_0$. The system is stable (solutions decay) iff all eigenvalues of $A$ have negative real part. For the discrete system $x_{k+1} = Ax_k$, stability requires all eigenvalues inside the unit disc ($|\lambda_i| < 1$).

Numerical practice. Computing eigenvalues directly from $\det(A - \lambda I) = 0$ is numerically terrible (roots of high-degree polynomials are ill-conditioned). Production libraries (LAPACK’s dgeev, NumPy’s linalg.eig) use the QR algorithm: iteratively reduce $A$ to upper triangular (Schur) form via orthogonal similarity transformations, revealing eigenvalues on the diagonal. This is $O(n^3)$ and numerically stable.

Summary table:

Property Diagonalizable $A$ General $A$
Canonical form $PDP^{-1}$ $PJP^{-1}$ (Jordan)
Condition $m_g = m_a$ all $\lambda$ Always (over $\mathbb{C}$)
$A^k$ formula $PD^kP^{-1}$ More complex (involves $k^j$ terms)
Symmetric $A$ $Q\Lambda Q^T$ orthogonal N/A (symmetric $\Rightarrow$ diagonalizable)

Read Next: