A linear map is an intrinsic object - it exists independently of any coordinate system. A matrix, by contrast, is a representation of that map in a chosen basis. Change of basis is the machinery that translates between representations: it tells you how the matrix of a linear map transforms when you swap one coordinate system for another.

Coordinates Relative to a Basis

Let $V$ be an $n$-dimensional vector space over $\mathbb{F}$ (where $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$). An ordered basis $\mathcal{B} = (b_1, b_2, \ldots, b_n)$ is an ordered list of linearly independent vectors that span $V$.

Every $v \in V$ has a unique expansion

$$v = c_1 b_1 + c_2 b_2 + \cdots + c_n b_n.$$

The scalars $(c_1, c_2, \ldots, c_n)$ are the coordinates of $v$ relative to $\mathcal{B}$, written

$$[v]_{\mathcal{B}} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} \in \mathbb{F}^n.$$

The coordinate map $v \mapsto [v]_{\mathcal{B}}$ is a bijection (indeed a linear isomorphism): every element of $\mathbb{F}^n$ corresponds to exactly one vector in $V$ via $\mathcal{B}$.

Example. In $\mathbb{R}^2$, let $\mathcal{E} = (e_1, e_2)$ be the standard basis and $\mathcal{B} = (b_1, b_2)$ where $b_1 = (1,1)^T$ and $b_2 = (1,-1)^T$. The vector $v = (3, 1)^T$ has standard coordinates $(3,1)$. In basis $\mathcal{B}$:

$$v = c_1 b_1 + c_2 b_2 \implies \begin{pmatrix}3\\1\end{pmatrix} = c_1\begin{pmatrix}1\\1\end{pmatrix} + c_2\begin{pmatrix}1\\-1\end{pmatrix}.$$

Solving: $c_1 = 2$, $c_2 = 1$, so $[v]_{\mathcal{B}} = (2, 1)^T$.
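
A numerical version of this computation, sketched with NumPy: $[v]_{\mathcal{B}}$ is the solution of the linear system whose coefficient matrix has the basis vectors as its columns.

```python
import numpy as np

# Basis vectors b1 = (1,1)^T, b2 = (1,-1)^T as the columns of a matrix.
B = np.array([[1.0,  1.0],
              [1.0, -1.0]])
v = np.array([3.0, 1.0])

# [v]_B is the solution c of B @ c = v.
c = np.linalg.solve(B, v)
print(c)  # [2. 1.]  ->  [v]_B = (2, 1)^T
```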

The Change of Basis Matrix

Given two bases $\mathcal{B} = (b_1, \ldots, b_n)$ and $\mathcal{B}' = (b_1', \ldots, b_n')$ for $V$, the change of basis matrix from $\mathcal{B}'$ to $\mathcal{B}$ is the $n \times n$ matrix $P$ whose $j$-th column is $[b_j']_{\mathcal{B}}$:

$$P = \bigl[\,[b_1']_{\mathcal{B}} \;\big|\; [b_2']_{\mathcal{B}} \;\big|\; \cdots \;\big|\; [b_n']_{\mathcal{B}}\,\bigr].$$

Theorem. For any $v \in V$,

$$[v]_{\mathcal{B}} = P\,[v]_{\mathcal{B}'}.$$

Proof. Write $v = \sum_j c_j b_j'$ so $[v]_{\mathcal{B}'} = (c_1, \ldots, c_n)^T$. Then

$$[v]_{\mathcal{B}} = \Bigl[\sum_j c_j b_j'\Bigr]_{\mathcal{B}} = \sum_j c_j [b_j']_{\mathcal{B}} = P\,[v]_{\mathcal{B}'}. \qquad \square$$

Since $\mathcal{B}'$ is a basis, $P$ is invertible, and $P^{-1}$ converts from $\mathcal{B}$-coordinates to $\mathcal{B}'$-coordinates.

Concrete construction when $V = \mathbb{R}^n$ with standard basis $\mathcal{E}$. If $\mathcal{B}' = (b_1', \ldots, b_n')$ are given as column vectors, then the matrix whose columns are $b_1', \ldots, b_n'$ is the change of basis matrix from $\mathcal{B}'$ to $\mathcal{E}$:

$$P = \begin{pmatrix} | & & | \\ b_1' & \cdots & b_n' \\ | & & | \end{pmatrix}, \quad [v]_{\mathcal{E}} = P\,[v]_{\mathcal{B}'}.$$

To go the other direction (standard to $\mathcal{B}'$), compute $P^{-1}$.

Old basis B           New basis B'

   b2                 b2'
    |                  /
    |                 /
    |                /  b1'
    +------b1       +------

Coordinates of same vector v differ in each system.
P maps [v]_B' to [v]_B.
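
Continuing the two-dimensional example above, a minimal NumPy sketch of both directions of the translation:

```python
import numpy as np

# Columns are the new basis vectors, so P converts B'-coordinates
# to standard coordinates: [v]_E = P [v]_B'.
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])

v_Bp = np.array([2.0, 1.0])           # [v]_B'
v_E = P @ v_Bp                        # -> [3. 1.], the standard coordinates

# The reverse direction applies P^{-1}; solving the system avoids
# forming the inverse explicitly.
v_Bp_again = np.linalg.solve(P, v_E)  # -> [2. 1.]
```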

How a Linear Map’s Matrix Changes Under Change of Basis

Let $T: V \to V$ be a linear operator. Fix a basis $\mathcal{B}$; write $[T]_{\mathcal{B}}$ for the matrix of $T$ in that basis (columns are the $\mathcal{B}$-coordinates of $T(b_1), \ldots, T(b_n)$).

Now choose a second basis $\mathcal{B}'$ with change-of-basis matrix $P$ (from $\mathcal{B}'$ to $\mathcal{B}$). The matrix of $T$ in $\mathcal{B}'$ is:

$$\boxed{[T]_{\mathcal{B}'} = P^{-1}[T]_{\mathcal{B}}\,P.}$$

Proof. For any $v$, we need $[T]_{\mathcal{B}'}[v]_{\mathcal{B}'} = [Tv]_{\mathcal{B}'}$. We compute:

$$[Tv]_{\mathcal{B}'} = P^{-1}[Tv]_{\mathcal{B}} = P^{-1}[T]_{\mathcal{B}}[v]_{\mathcal{B}} = P^{-1}[T]_{\mathcal{B}}\,P\,[v]_{\mathcal{B}'}. \qquad \square$$

This is the similarity transformation. Two matrices $A$ and $B$ are similar (written $A \sim B$) if there exists an invertible $P$ such that $B = P^{-1}AP$.
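
The formula is easy to verify numerically; a sketch with a random operator and a random (generically invertible) change of basis:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))      # [T]_B
P = rng.standard_normal((3, 3))      # generically invertible change of basis

B_new = np.linalg.solve(P, A @ P)    # P^{-1} A P, computed without inv(P)

# Both matrices describe the same action on a vector, each in its own
# coordinates: [Tv]_B' = B_new [v]_B'.
v_Bp = rng.standard_normal(3)        # [v]_B'
print(np.allclose(B_new @ v_Bp, np.linalg.solve(P, A @ (P @ v_Bp))))  # True
```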

Theorem (Axler §5.A, reformulated in matrix language). Two $n \times n$ matrices $A$ and $B$ represent the same linear operator on $\mathbb{F}^n$ (in possibly different bases) if and only if $A$ and $B$ are similar.

Proof sketch. If $B = P^{-1}AP$, define the linear operator $T$ with matrix $A$ in the standard basis; then $B = [T]_{\mathcal{B}'}$ where $\mathcal{B}'$ is the basis formed by the columns of $P$. Conversely, if $A = [T]_{\mathcal{B}}$ and $B = [T]_{\mathcal{B}'}$, the formula above gives similarity. $\square$

Similarity is an equivalence relation (reflexive: $A = I^{-1}AI$; symmetric; transitive). The equivalence class of $A$ under similarity is the set of all matrix representations of the same linear map across all bases.

Invariants of similarity. Similar matrices have the same:

  • determinant
  • trace
  • characteristic polynomial
  • eigenvalues (with algebraic multiplicities)
  • rank
  • minimal polynomial

These are intrinsic properties of the linear map, not the matrix.
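
These invariants are easy to spot-check numerically; a small sketch for one similar pair:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 2.0],
              [1.0, 3.0]])                      # det P = 1, so P is invertible
B = np.linalg.solve(P, A @ P)                   # B = P^{-1} A P

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))    # True
print(np.isclose(np.trace(A), np.trace(B)))              # True
print(np.allclose(np.sort(np.linalg.eigvals(A)),
                  np.sort(np.linalg.eigvals(B))))        # True
```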

Diagonalization: Choosing the Best Basis

The “nicest” matrix representation of $T$ is diagonal, because a diagonal matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ encodes $T(b_i) = \lambda_i b_i$: each basis vector is simply scaled. Powers and exponentials of diagonal matrices are trivial: $D^k = \mathrm{diag}(\lambda_1^k, \ldots, \lambda_n^k)$.

Definition. $T: V \to V$ is diagonalizable if there exists a basis $\mathcal{B}$ of $V$ such that $[T]_{\mathcal{B}}$ is diagonal.

Equivalently (in matrix language), $A$ is diagonalizable if there exists an invertible $P$ and diagonal $D$ with $A = PDP^{-1}$.

Finding a Basis of Eigenvectors

The diagonal entries must be eigenvalues of $T$, and the basis vectors must be eigenvectors.

Theorem (Diagonalizability Criterion). $T \in \mathcal{L}(V)$ (where $\dim V = n$) is diagonalizable over $\mathbb{F}$ if and only if the following two conditions hold:

  1. The characteristic polynomial $p_T(\lambda) = \det([T]_{\mathcal{B}} - \lambda I)$ splits completely over $\mathbb{F}$ (all roots lie in $\mathbb{F}$).
  2. For every eigenvalue $\lambda$, the geometric multiplicity $\dim\ker(T - \lambda I)$ equals the algebraic multiplicity (the multiplicity of $\lambda$ as a root of $p_T$).

Proof sketch. $(\Rightarrow)$ If $[T]_{\mathcal{B}} = PDP^{-1}$ with $D$ diagonal, the columns of $P$ are the $\mathcal{B}$-coordinates of a basis of eigenvectors, and $p_T(\lambda) = \prod_i (\lambda_i - \lambda)$ splits over $\mathbb{F}$. Each eigenspace $E_\lambda$ is spanned by those columns corresponding to $\lambda$; their count equals the algebraic multiplicity, so geometric = algebraic.

$(\Leftarrow)$ When geometric multiplicity equals algebraic multiplicity for all eigenvalues, the eigenvectors from distinct eigenspaces are linearly independent (proved by induction on the number of eigenvalues), and their total count is $n$, furnishing a complete eigenbasis. $\square$
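
Condition 2 is the one that typically fails. A sketch with SymPy comparing the two multiplicities for a defective matrix:

```python
from sympy import Matrix, eye

A = Matrix([[2, 1],
            [0, 2]])

# eigenvals() maps each eigenvalue to its algebraic multiplicity.
print(A.eigenvals())                      # {2: 2}

# Geometric multiplicity = dim ker(A - 2I).
print(len((A - 2 * eye(2)).nullspace()))  # 1  -> 1 < 2: not diagonalizable
```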

Procedure to diagonalize $A$:

  1. Compute $p(\lambda) = \det(A - \lambda I)$; find all eigenvalues $\lambda_1, \ldots, \lambda_k$.
  2. For each $\lambda_i$, solve $(A - \lambda_i I)v = 0$; find a basis for $\ker(A - \lambda_i I)$.
  3. Check that each eigenspace dimension equals the algebraic multiplicity of $\lambda_i$.
  4. Concatenate eigenvector bases into columns of $P$; set $D = P^{-1}AP$.

Example. $A = \begin{pmatrix}2&1\\0&3\end{pmatrix}$. Eigenvalues: $\lambda_1 = 2, \lambda_2 = 3$ (both algebraic multiplicity 1). Eigenvectors: $(A - 2I)v = 0 \Rightarrow v_1 = (1,0)^T$; $(A - 3I)v = 0 \Rightarrow v_2 = (1,1)^T$. So

$$P = \begin{pmatrix}1&1\\0&1\end{pmatrix}, \quad D = \begin{pmatrix}2&0\\0&3\end{pmatrix}, \quad A = PDP^{-1}.$$
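
A numerical check of this example (verifying $AP = PD$, which is equivalent to $A = PDP^{-1}$ and avoids inverting $P$):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # columns: eigenvectors (1,0)^T and (1,1)^T
D = np.diag([2.0, 3.0])

print(np.allclose(A @ P, P @ D))  # True
```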

Jordan Normal Form: When Diagonalization Fails

Over an algebraically closed field (e.g., $\mathbb{C}$), every matrix is similar to a Jordan normal form - a canonical block-diagonal matrix.

A Jordan block of size $m$ for eigenvalue $\lambda$ is:

$$J_m(\lambda) = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix} \in \mathbb{F}^{m \times m}.$$

Theorem (Jordan). Over an algebraically closed field, every square matrix $A$ is similar to a block diagonal matrix

$$J = \mathrm{diag}\bigl(J_{m_1}(\lambda_1),\, J_{m_2}(\lambda_2),\, \ldots,\, J_{m_r}(\lambda_r)\bigr),$$

and this form is unique up to permutation of blocks.

Key facts:

  • $A$ is diagonalizable $\iff$ all Jordan blocks are $1 \times 1$ (i.e., $m_i = 1$ for all $i$).
  • The number of Jordan blocks for $\lambda$ equals $\dim\ker(A - \lambda I)$ (geometric multiplicity).
  • The total size of blocks for $\lambda$ equals the algebraic multiplicity.
  • The size of the largest block for $\lambda$ equals the index of $\lambda$: the smallest $k$ such that $\ker\bigl((A - \lambda I)^k\bigr) = \ker\bigl((A - \lambda I)^{k+1}\bigr)$.

Generalized eigenvectors. When $\dim\ker(A - \lambda I)$ is strictly smaller than the algebraic multiplicity, we need generalized eigenvectors: vectors in $\ker\bigl((A - \lambda I)^k\bigr)$ for some $k > 1$. These form chains:

$$v_1 \xrightarrow{A - \lambda I} v_2 \xrightarrow{A - \lambda I} \cdots \xrightarrow{A - \lambda I} 0.$$

Jordan form is computationally fragile (small perturbations can merge or split blocks) and mainly useful for theoretical analysis. In practice, Schur decomposition is preferred.
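
For exact (symbolic) input, where the fragility issue does not arise, SymPy can compute the Jordan form directly; a small sketch with a defective $2 \times 2$ matrix:

```python
from sympy import Matrix

A = Matrix([[3,  1],
            [-1, 1]])      # char. poly (lambda - 2)^2, only one eigenvector

P, J = A.jordan_form()     # returns P, J with A = P J P^{-1}
print(J)                   # Matrix([[2, 1], [0, 2]]) -- a single J_2(2) block
```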

Examples

Diagonal matrices are cheap. Multiplying a vector by $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ costs $O(n)$ rather than $O(n^2)$, and computing $D^k$ costs $O(n)$ scalar powers rather than repeated $O(n^3)$ matrix multiplications. For a diagonalizable $A = PDP^{-1}$:

$$A^k = PD^kP^{-1}, \quad e^A = Pe^DP^{-1} = P\,\mathrm{diag}(e^{\lambda_1}, \ldots, e^{\lambda_n})\,P^{-1}.$$
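
A sketch of these identities with NumPy, applying $P^{-1}$ by solving against $P$ rather than forming the inverse:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
evals, P = np.linalg.eig(A)              # columns of P are eigenvectors

# X = (P D^10) P^{-1}  <=>  X P = P D^10  <=>  P^T X^T = (P D^10)^T.
PD10 = P @ np.diag(evals**10)
A10 = np.linalg.solve(P.T, PD10.T).T

print(np.allclose(A10, np.linalg.matrix_power(A, 10)))   # True

# Same trick for the matrix exponential e^A = P e^D P^{-1}.
expA = np.linalg.solve(P.T, (P @ np.diag(np.exp(evals))).T).T
```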

This makes diagonalization central to solving linear ODEs ($\dot{x} = Ax$) and computing Markov chain steady states.

PCA as change of basis. Principal Component Analysis finds the basis in which the covariance matrix $\Sigma$ of a dataset is diagonal. Since $\Sigma$ is symmetric positive semidefinite, the spectral theorem guarantees real eigenvalues and an orthonormal eigenbasis. The PCA basis vectors (principal components) are the eigenvectors of $\Sigma$, and the change of basis to this eigenbasis decorrelates the data: variance in each direction equals the corresponding eigenvalue. (Whitening goes one step further, rescaling each direction by $1/\sqrt{\lambda_i}$ so every variance becomes 1.)

Original data          After PCA basis change
     *  *                  *   *
  *   *  *              *   *   *
 *  *  * *       ==>    *  *  *
  *   *  *              *   *   *
     *  *                  *   *

Correlated axes         Uncorrelated axes
                        (eigenvectors of Cov)
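
A minimal sketch of PCA along these lines, using np.linalg.eigh (NumPy's symmetric eigensolver) on synthetic correlated data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2)) @ np.array([[2.0, 1.2],
                                              [0.0, 0.5]])  # correlated columns

Xc = X - X.mean(axis=0)                 # center the data
Sigma = np.cov(Xc, rowvar=False)        # sample covariance matrix

evals, Q = np.linalg.eigh(Sigma)        # symmetric -> orthonormal eigenbasis Q
Y = Xc @ Q                              # coordinates in the PCA basis

# Covariance is now diagonal, with the eigenvalues on the diagonal.
print(np.cov(Y, rowvar=False).round(6))
```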

Numerical note. In floating point, forming $P^{-1}$ explicitly is both slower and less accurate than solving linear systems against $P$. NumPy's numpy.linalg.eig returns the eigenvalues and the eigenvector matrix $P$ (never $P^{-1}$); to apply $P^{-1}$, prefer numpy.linalg.solve, which uses an LU factorization, over numpy.linalg.inv.
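
For instance (a sketch; the accuracy gap between the two approaches widens with dimension and ill-conditioning):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((200, 200))
b = rng.standard_normal(200)

x_inv   = np.linalg.inv(P) @ b      # explicit inverse: avoid in production code
x_solve = np.linalg.solve(P, b)     # LU-based solve: faster and more accurate

print(np.allclose(x_inv, x_solve))  # True (to floating-point tolerance)
```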

