Prerequisite:


Eigenvalues and eigenvectors are the central objects of linear algebra. They reveal the “grain” of a linear map: the directions that are merely scaled (not rotated) under the map, and the scale factors. From Google’s PageRank to quantum mechanics to the stability of bridges, eigenstructure governs the behaviour of linear systems.

Definitions

Let $V$ be a vector space over $\mathbb{F}$ and $T \in \mathcal{L}(V)$ a linear operator.

Definition. A scalar $\lambda \in \mathbb{F}$ is an eigenvalue of $T$ if there exists a nonzero vector $v \in V$ such that

$$T(v) = \lambda v.$$

Any such nonzero $v$ is an eigenvector corresponding to $\lambda$. The zero vector is explicitly excluded: it trivially satisfies the equation for any $\lambda$.

In matrix language, for $A \in \mathbb{F}^{n \times n}$: $Av = \lambda v$ with $v \neq 0$.

Eigenvector picture:

  T(v) = 2v                      T(w) = -w

     ^  T(v)
     |
     |
     |      ^  v
     |      |           T(w) <----o----> w
     |      |

  v stretched by 2 (same direction)      w flipped (lambda = -1)

Geometrically: eigenvectors are the directions left invariant (as lines) by $T$; the eigenvalue is the stretch factor, which can be negative (reversal) or complex (rotation).
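
A quick numerical check of the definition, as a minimal sketch in NumPy (the matrix here is just an illustrative choice; np.linalg.eig returns the eigenvalues and an eigenvector matrix whose columns are the eigenvectors):

import numpy as np

# A small matrix whose eigenstructure we can read off directly.
A = np.array([[2.0, 0.0],
              [0.0, -1.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    # Check the defining equation A v = lambda v (up to floating-point error).
    print(lam, np.allclose(A @ v, lam * v))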

Finding Eigenvalues: The Characteristic Polynomial

$\lambda$ is an eigenvalue of $A$ iff $Av = \lambda v$ has a nonzero solution $v$, i.e., $(A - \lambda I)v = 0$ has a nontrivial solution, i.e., $A - \lambda I$ is singular:

$$\det(A - \lambda I) = 0.$$

Definition. The characteristic polynomial of $A \in \mathbb{F}^{n \times n}$ is

$$p_A(\lambda) = \det(A - \lambda I).$$

This is a degree-$n$ polynomial in $\lambda$:

$$p_A(\lambda) = (-1)^n \lambda^n + (-1)^{n-1}(\mathrm{tr}\, A)\lambda^{n-1} + \cdots + \det(A).$$

The eigenvalues of $A$ are precisely the roots of $p_A$.

Subtlety. Over $\mathbb{R}$, the characteristic polynomial need not have any real roots: a $2 \times 2$ rotation matrix (through an angle other than $0$ or $\pi$) has $p_A(\lambda) = \lambda^2 - 2\cos\theta\,\lambda + 1$, whose roots are complex. (An odd-degree real polynomial always has at least one real root, so this can only happen in even degree.) Over $\mathbb{C}$, the Fundamental Theorem of Algebra guarantees all $n$ roots (counted with multiplicity).
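
To see this concretely, here is a small NumPy sketch comparing the roots of the characteristic polynomial with np.linalg.eigvals for a 90° rotation; the eigenvalues come out as $\pm i$:

import numpy as np

# 90-degree rotation of the plane: no real direction is left invariant.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# For a 2x2 matrix, p(lambda) = lambda^2 - tr(R) lambda + det(R) = lambda^2 + 1.
coeffs = [1.0, -np.trace(R), np.linalg.det(R)]
print(np.roots(coeffs))          # complex roots +i and -i
print(np.linalg.eigvals(R))      # the same eigenvalues, +/- i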

Theorem (Eigenvalues of triangular matrices). If $A$ is upper (or lower) triangular, its eigenvalues are its diagonal entries.

Proof. $\det(A - \lambda I)$ for a triangular matrix is the product of diagonal entries of $A - \lambda I$, which are $a_{11} - \lambda, a_{22} - \lambda, \ldots, a_{nn} - \lambda$. So $p_A(\lambda) = \prod_{i=1}^n (a_{ii} - \lambda)$, and the roots are $\lambda = a_{ii}$. $\square$

Algebraic and Geometric Multiplicity

Definition. For an eigenvalue $\lambda_0$ of $A$:

  • The algebraic multiplicity $m_a(\lambda_0)$ is the multiplicity of $\lambda_0$ as a root of $p_A(\lambda)$.
  • The eigenspace is $E_{\lambda_0} = \ker(A - \lambda_0 I) = \{v : Av = \lambda_0 v\}$.
  • The geometric multiplicity $m_g(\lambda_0) = \dim E_{\lambda_0}$.

Theorem. For every eigenvalue $\lambda_0$:

$$1 \leq m_g(\lambda_0) \leq m_a(\lambda_0).$$

Proof sketch. The lower bound $m_g \geq 1$ holds because $\lambda_0$ is an eigenvalue, so $E_{\lambda_0}$ contains a nonzero vector. For the upper bound: choose a basis of $E_{\lambda_0}$ and extend it to a basis of $\mathbb{F}^n$. In this basis the matrix of $A$ is block upper triangular with $\lambda_0 I_{m_g}$ as its top-left block, so $(\lambda_0 - \lambda)^{m_g(\lambda_0)}$ divides $p_A(\lambda)$, hence $m_g \leq m_a$. $\square$

Example. $A = \begin{pmatrix}2 & 1 \\ 0 & 2\end{pmatrix}$. Then $p_A(\lambda) = (2-\lambda)^2$, so $\lambda_0 = 2$ has $m_a = 2$. But $\ker(A - 2I) = \ker\begin{pmatrix}0&1\\0&0\end{pmatrix} = \mathrm{span}\{e_1\}$, so $m_g = 1 < m_a = 2$. This matrix is not diagonalizable.
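
A sketch of the same computation in NumPy: the algebraic multiplicity shows up as a repeated eigenvalue, while the rank of $A - 2I$ reveals the one-dimensional eigenspace.

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])

print(np.linalg.eigvals(A))        # [2. 2.]  -> algebraic multiplicity 2

# Geometric multiplicity = dim ker(A - 2I) = n - rank(A - 2I).
m_g = A.shape[0] - np.linalg.matrix_rank(A - 2.0 * np.eye(2))
print(m_g)                         # 1  -> defective, not diagonalizable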

The Diagonalization Theorem

Theorem (Diagonalizability). $A \in \mathbb{F}^{n \times n}$ is diagonalizable (i.e., $A = PDP^{-1}$ with $D$ diagonal, $P$ invertible) if and only if $p_A$ splits completely over $\mathbb{F}$ and $m_g(\lambda) = m_a(\lambda)$ for every eigenvalue $\lambda$.

When this holds, the columns of $P$ are $n$ linearly independent eigenvectors of $A$, and the diagonal entries of $D$ are the corresponding eigenvalues.

Key lemma used in proof: Eigenvectors from distinct eigenvalues are linearly independent.

Proof of lemma. Suppose $\sum_{i=1}^k c_i v_i = 0$ with $A v_i = \lambda_i v_i$ and all $\lambda_i$ distinct. Apply $(A - \lambda_k I)$ to get $\sum_{i=1}^{k-1} c_i (\lambda_i - \lambda_k) v_i = 0$. Repeat, peeling off one eigenvalue at a time; since all differences $\lambda_i - \lambda_j \neq 0$, we get $c_i = 0$ for all $i$. $\square$

Corollary. If $A$ has $n$ distinct eigenvalues, it is diagonalizable.
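
A minimal sketch of diagonalization in NumPy, using a matrix with distinct eigenvalues (chosen here only for illustration):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])          # eigenvalues 5 and 2 (distinct)

eigvals, P = np.linalg.eig(A)       # columns of P are eigenvectors
D = np.diag(eigvals)

# Reconstruct A from its eigendecomposition: A = P D P^{-1}.
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True

# Cheap matrix powers: A^10 = P D^10 P^{-1}.
print(np.allclose(np.linalg.matrix_power(A, 10),
                  P @ np.diag(eigvals**10) @ np.linalg.inv(P)))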

The Spectral Theorem for Symmetric Matrices

Theorem (Real Spectral Theorem). If $A \in \mathbb{R}^{n \times n}$ is symmetric ($A = A^T$), then:

  1. All eigenvalues of $A$ are real.
  2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  3. $A$ is orthogonally diagonalizable: $A = Q\Lambda Q^T$ where $Q$ is orthogonal ($Q^T Q = I$) and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

Proof of (1). Let $\lambda \in \mathbb{C}$ be an eigenvalue and $v \in \mathbb{C}^n$ a corresponding eigenvector, $v \neq 0$. Then $Av = \lambda v$, so $\bar{v}^T A v = \lambda \bar{v}^T v$. Since $A$ is real and symmetric, $\overline{\bar{v}^T A v} = v^T A \bar{v} = (v^T A \bar{v})^T = \bar{v}^T A^T v = \bar{v}^T A v$, so $\bar{v}^T A v$ is real. Since $\bar{v}^T v = |v|^2 > 0$ is also real, we get $\lambda = (\bar{v}^T A v)/|v|^2 \in \mathbb{R}$. $\square$

Proof of (2). Let $Av = \lambda v$ and $Aw = \mu w$ with $\lambda \neq \mu$. Then

$$\lambda (v \cdot w) = (\lambda v) \cdot w = (Av) \cdot w = v \cdot (A^T w) = v \cdot (Aw) = \mu (v \cdot w).$$

Since $\lambda \neq \mu$, we must have $v \cdot w = 0$. $\square$

Proof of (3). By induction on $n$ using the fact that the eigenspace of any eigenvalue is $A$-invariant, and its orthogonal complement is also $A$-invariant (because $A$ is symmetric). $\square$

The complex analogue uses Hermitian matrices ($A = A^\ast$); the conclusion is that $A = U\Lambda U^\ast$ with $U$ unitary.
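
A sketch of the real spectral theorem in NumPy: np.linalg.eigh is the solver for symmetric/Hermitian matrices and returns real eigenvalues and an orthonormal eigenbasis (the random symmetric matrix below is just a test case):

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # symmetrize so that A = A^T

lam, Q = np.linalg.eigh(A)             # real eigenvalues, orthonormal columns

print(np.allclose(Q.T @ Q, np.eye(4)))            # Q is orthogonal
print(np.allclose(A, Q @ np.diag(lam) @ Q.T))     # A = Q Lambda Q^T
print(lam.dtype)                                  # float64: the eigenvalues are real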

The Gershgorin Circle Theorem

When you cannot compute eigenvalues exactly, you can locate them.

Theorem (Gershgorin, 1931). Every eigenvalue of $A \in \mathbb{C}^{n \times n}$ lies in at least one of the Gershgorin discs:

$$D_i = \left\{\lambda \in \mathbb{C} : |\lambda - a_{ii}| \leq R_i\right\}, \quad R_i = \sum_{j \neq i} |a_{ij}|.$$

In words: each disc is centred at a diagonal entry $a_{ii}$ with radius equal to the sum of absolute values of all off-diagonal entries in that row.

Proof. Let $\lambda$ be an eigenvalue with eigenvector $v = (v_1, \ldots, v_n)^T \neq 0$. Let $k = \arg\max_i |v_i|$. From the $k$-th component of $Av = \lambda v$:

$$\lambda v_k = \sum_j a_{kj} v_j = a_{kk} v_k + \sum_{j \neq k} a_{kj} v_j.$$

Rearranging: $(\lambda - a_{kk}) v_k = \sum_{j \neq k} a_{kj} v_j$. Taking absolute values:

$$|\lambda - a_{kk}| |v_k| \leq \sum_{j \neq k} |a_{kj}| |v_j| \leq R_k |v_k|.$$

Dividing by $|v_k| > 0$: $|\lambda - a_{kk}| \leq R_k$, so $\lambda \in D_k$. $\square$

Gershgorin discs give cheap, useful bounds. Diagonally dominant matrices (where $|a_{ii}| > R_i$ for all $i$) are nonsingular, and their eigenvalues are localized near the diagonal.
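
A short sketch that computes the Gershgorin discs row by row and checks that every eigenvalue of a sample matrix lies in at least one of them:

import numpy as np

A = np.array([[ 4.0,  1.0, 0.5],
              [ 0.3, -2.0, 0.4],
              [ 0.1,  0.2, 1.0]])

centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # R_i = sum of off-diagonal |a_ij| in row i

for lam in np.linalg.eigvals(A):
    in_some_disc = np.any(np.abs(lam - centers) <= radii)
    print(lam, in_some_disc)      # every eigenvalue lies in at least one disc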

The Cayley–Hamilton Theorem

Theorem (Cayley–Hamilton). Every square matrix satisfies its own characteristic polynomial: if $p_A(\lambda) = \det(A - \lambda I)$, then

$$p_A(A) = 0.$$

Example. $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$, $p_A(\lambda) = \lambda^2 - 5\lambda - 2$. Then $A^2 - 5A - 2I = 0$.

Proof sketch (via adjugate). Write $p_A(\lambda)I = (A - \lambda I) \cdot \mathrm{adj}(A - \lambda I)$ (since $M \cdot \mathrm{adj}(M) = \det(M)I$). The adjugate has polynomial entries in $\lambda$, so $\mathrm{adj}(A - \lambda I) = \sum_{k=0}^{n-1} \lambda^k B_k$ for fixed matrices $B_k$. Expanding both sides and matching coefficients of each power of $\lambda$ gives a family of matrix identities; multiplying the $\lambda^k$ identity by $A^k$ and summing makes the right-hand side telescope to zero, leaving $p_A(A) = 0$. (One cannot simply "plug $\lambda = A$ into $\det(A - \lambda I)$"; the coefficient-matching step is what justifies the substitution.) $\square$

Consequence. $A^n$ can be expressed as a linear combination of $I, A, A^2, \ldots, A^{n-1}$. The minimal polynomial $m_A(\lambda)$ (the monic polynomial of least degree annihilating $A$) divides $p_A(\lambda)$.
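
A quick numerical check of the example above, and of the consequence that $A^2$ reduces to a combination of $A$ and $I$ (sketch in NumPy):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
I = np.eye(2)

# p_A(lambda) = lambda^2 - 5 lambda - 2, so p_A(A) should be the zero matrix.
print(np.allclose(A @ A - 5 * A - 2 * I, 0))      # True

# Consequence: A^2 = 5A + 2I, so higher powers reduce to combinations of I and A.
print(np.allclose(A @ A, 5 * A + 2 * I))          # True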

Power Iteration

Power iteration is the simplest algorithm for finding the dominant eigenvalue.

Algorithm. Given $A$, start with random $v_0$:

v_0 = random unit vector
for k = 0, 1, 2, ...:
    w_{k+1} = A * v_k
    v_{k+1} = w_{k+1} / ||w_{k+1}||
    lambda_k = v_k^T * A * v_k     (Rayleigh quotient)

Convergence. Expand $v_0 = \sum_i c_i u_i$ in the eigenbasis. Then

$$A^k v_0 = \sum_i c_i \lambda_i^k u_i = \lambda_1^k \left(c_1 u_1 + \sum_{i \geq 2} c_i \left(\frac{\lambda_i}{\lambda_1}\right)^k u_i\right).$$

If $|\lambda_1| > |\lambda_2| \geq \cdots$, the ratio $(\lambda_i/\lambda_1)^k \to 0$ geometrically at rate $|\lambda_2/\lambda_1|$, so $v_k \to u_1$ (the dominant eigenvector). Convergence is slow when $|\lambda_2/\lambda_1|$ is close to 1 (eigenvalue gap is small).
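
The pseudocode above translates directly into NumPy; a minimal sketch (the test matrix, tolerance, and iteration cap are illustrative choices):

import numpy as np

def power_iteration(A, num_iters=1000, tol=1e-10, seed=0):
    """Return (dominant eigenvalue estimate, unit eigenvector estimate)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    lam = v @ A @ v                        # Rayleigh quotient of the starting vector
    for _ in range(num_iters):
        w = A @ v
        v = w / np.linalg.norm(w)          # normalize to avoid overflow/underflow
        lam_new = v @ A @ v
        if abs(lam_new - lam) < tol:       # stop once the estimate stabilizes
            break
        lam = lam_new
    return lam_new, v

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, v = power_iteration(A)
print(lam, np.linalg.eigvalsh(A).max())    # both approx 3.618 (the dominant eigenvalue)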

Google PageRank is essentially power iteration on the web graph transition matrix (a column-stochastic matrix). The dominant eigenvector (eigenvalue 1) gives the stationary distribution = page importance.

Jordan Blocks and Jordan Normal Form

When $m_g(\lambda) < m_a(\lambda)$, diagonalization fails. The Jordan form provides the canonical substitute.

A Jordan block $J_m(\lambda)$ is an $m \times m$ matrix:

$$J_m(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots \\ 0 & \lambda & 1 & \cdots \\ \vdots & & \ddots & 1 \\ 0 & 0 & \cdots & \lambda \end{pmatrix}.$$

Theorem (Jordan Normal Form). Over $\mathbb{C}$ (or any algebraically closed field), every $A \in \mathbb{F}^{n \times n}$ is similar to a block diagonal matrix

$$J = \mathrm{diag}(J_{m_1}(\lambda_1), \ldots, J_{m_r}(\lambda_r)),$$

unique up to reordering of blocks. The $\lambda_i$ are eigenvalues (with repetition allowed across blocks for the same eigenvalue).

The Jordan form of a $3 \times 3$ nilpotent matrix (all eigenvalues zero) might look like:

$$\begin{pmatrix}0&1&0\\0&0&1\\0&0&0\end{pmatrix} \quad \text{or} \quad \begin{pmatrix}0&1&0\\0&0&0\\0&0&0\end{pmatrix}.$$

The first has one block of size 3; the second has one block of size 2 and one of size 1.
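
A small sketch of how a single Jordan block behaves: $J_3(0)$ is nilpotent (its cube is zero), and its only eigenvector direction is $e_1$, so $m_g = 1$ even though $m_a = 3$.

import numpy as np

J = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])      # the Jordan block J_3(0)

print(np.linalg.matrix_power(J, 3))       # zero matrix: J is nilpotent
print(3 - np.linalg.matrix_rank(J))       # geometric multiplicity of 0 is 1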

Examples

PageRank. The web is a directed graph. Build the transition matrix $M$ (column $j$ gives outlink distribution from page $j$). PageRank is the dominant eigenvector of $M$ (eigenvalue 1 of a stochastic matrix always exists by Perron–Frobenius). Power iteration converges in $O(1/\text{spectral gap})$ steps.

PCA and covariance eigenvectors. Given a data matrix $X \in \mathbb{R}^{n \times p}$ (centred), the covariance matrix is $C = X^T X / (n-1)$. PCA finds the orthonormal eigenvectors $q_1, \ldots, q_p$ of $C$ (eigenvalues $\lambda_1 \geq \cdots \geq \lambda_p \geq 0$). The projection onto the top $k$ eigenvectors gives the best rank-$k$ approximation of the data in a mean-squared sense.
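
A sketch of PCA via the covariance eigendecomposition, on synthetic data (np.linalg.eigh is the natural solver here, since the covariance matrix is symmetric):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.2])   # anisotropic data
X = X - X.mean(axis=0)                                         # centre the columns

C = X.T @ X / (X.shape[0] - 1)            # sample covariance matrix
lam, Q = np.linalg.eigh(C)                # ascending eigenvalues, orthonormal columns
lam, Q = lam[::-1], Q[:, ::-1]            # reorder so lambda_1 >= ... >= lambda_p

scores = X @ Q[:, :2]                     # project the data onto the top 2 components
print(lam)                                # variances along the principal directions
print(lam[:2].sum() / lam.sum())          # fraction of variance captured by the top 2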

Stability of dynamical systems. For the linear system $\dot{x} = Ax$, the solution is $x(t) = e^{At}x_0$. The system is stable (solutions decay) iff all eigenvalues of $A$ have negative real part. For the discrete system $x_{k+1} = Ax_k$, stability requires all eigenvalues inside the unit disc ($|\lambda_i| < 1$).
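
A sketch of both stability tests, checking eigenvalue real parts for the continuous system and the spectral radius for the discrete one (the matrices are illustrative):

import numpy as np

A = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])

eigs = np.linalg.eigvals(A)
print(np.all(eigs.real < 0))          # True: x' = Ax is stable (solutions decay)

B = 0.5 * np.eye(2)                   # discrete-time update x_{k+1} = B x_k
print(np.max(np.abs(np.linalg.eigvals(B))) < 1)   # spectral radius < 1: stable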

Numerical practice. Computing eigenvalues directly from $\det(A - \lambda I) = 0$ is numerically terrible (roots of high-degree polynomials are ill-conditioned). Production libraries (LAPACK’s dgeev, NumPy’s linalg.eig) use the QR algorithm: iteratively reduce $A$ to upper triangular (Schur) form via orthogonal similarity transformations, revealing eigenvalues on the diagonal. This is $O(n^3)$ and numerically stable.
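
For intuition only, here is a toy sketch of the unshifted QR iteration (production code adds a Hessenberg reduction and shifts, which this version omits); for a symmetric matrix with eigenvalues of distinct magnitude the iterates approach a diagonal matrix carrying the eigenvalues:

import numpy as np

def qr_iteration(A, num_iters=200):
    """Toy unshifted QR iteration: A_{k+1} = R_k Q_k where A_k = Q_k R_k."""
    Ak = A.copy()
    for _ in range(num_iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                    # similar to A_k, so the eigenvalues are preserved
    return Ak

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.sort(np.diag(qr_iteration(A))))   # approx [1.382, 3.618]
print(np.sort(np.linalg.eigvalsh(A)))      # matches the true eigenvalues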

Summary table:

Property        | Diagonalizable $A$                              | General $A$
Canonical form  | $PDP^{-1}$ ($D$ diagonal)                       | $PJP^{-1}$ (Jordan)
Condition       | $m_g(\lambda) = m_a(\lambda)$ for all $\lambda$ | Always (over $\mathbb{C}$)
$A^k$ formula   | $PD^kP^{-1}$                                    | More complex (involves $k^j$ terms)
Symmetric $A$   | $Q\Lambda Q^T$ with $Q$ orthogonal              | N/A (symmetric $\Rightarrow$ diagonalizable)

Read Next: