Helpful context:


Imagine two people navigating the same city. One uses a standard grid map, north-south and east-west. The other uses a map rotated 45 degrees, aligned with the diagonal avenues. They stand at the same corner. They describe the same location. They give completely different coordinates.

Neither is wrong. The city exists independently of either map. The building on that corner does not know or care how you are measuring it. The coordinates are artifacts of the map, not of the city.

This is the central fact of linear algebra that takes time to fully absorb: a vector exists independently of any coordinate system. Its coordinates depend entirely on which basis you choose. Change the basis, change the coordinates - but the vector does not move.

And here is why this matters for mathematics rather than cartography: some problems are brutally hard in one coordinate system and embarrassingly simple in another. A mathematician’s first question upon encountering a linear system is often not “how do I solve this?” but “is there a better coordinate system in which to see it?”

This post is about building that machinery. By the end, you will be able to move freely between coordinate systems, write the transformation law for matrices, and - most importantly - understand why the right change of basis can reduce a complicated matrix to a diagonal one. That last point is the bridge to eigenvalues.


Section 1: Coordinates are Relative

Start in $\mathbb{R}^2$ with the standard basis $\mathcal{E} = \{e_1, e_2\} = \{(1,0)^T, (0,1)^T\}$. The vector $\mathbf{v} = (3, 2)^T$ has standard coordinates $(3, 2)$. This feels natural because the standard basis is so familiar we stop thinking of it as a choice.

Now consider a different basis: $$B = \left\{ \mathbf{b_1}, \mathbf{b_2} \right\} = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}.$$

These two vectors are linearly independent (you can verify: neither is a scalar multiple of the other), so they span $\mathbb{R}^2$ and form a valid basis. It is a tilted coordinate system. The “horizontal” direction $\mathbf{b_1}$ points diagonally up-right, and the “vertical” direction $\mathbf{b_2}$ points diagonally down-right.

What are the coordinates of $\mathbf{v} = (3,2)^T$ in this basis?

We need scalars $c_1, c_2$ such that: $$c_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}.$$

This gives the system: $$c_1 + c_2 = 3$$ $$c_1 - c_2 = 2.$$

Adding the equations: $2c_1 = 5$, so $c_1 = 5/2$. Subtracting: $2c_2 = 1$, so $c_2 = 1/2$.

So in basis $B$, the vector $(3,2)^T$ has coordinates $(5/2, 1/2)^T$. We write $[\mathbf{v}]_B = (5/2, 1/2)^T$.

The coordinates tell you: “to reach this vector from the origin, take $5/2$ steps in the direction $(1,1)^T$ and $1/2$ steps in the direction $(1,-1)^T$.” That is a complete and correct description of the same point $(3,2)^T$ in standard terms. Different numbers, same location in space.

Standard Basis e₁ e₂ 3 2 v v = (3, 2) Basis B b₁ b₂ 5/2 1/2 v [v]_B = (5/2, 1/2)
Same vector, two coordinate systems. Left: standard drop-perpendicular coordinates. Right: parallelogram decomposition along b₁ and b₂.

A note on uniqueness. Because $B$ is a basis - linearly independent and spanning - every vector in $\mathbb{R}^2$ has a unique expression as a linear combination of $\mathbf{b_1}$ and $\mathbf{b_2}$. There is no ambiguity in the $B$-coordinates. This is why “linearly independent + spanning” is exactly the right definition of a basis: independence gives uniqueness, spanning gives existence.


Section 2: The Change of Basis Matrix

Solving a system every time you want to convert coordinates is tedious. We want a matrix that does it automatically.

Let $B = \{\mathbf{b_1}, \mathbf{b_2}, \ldots, \mathbf{b_n}\}$ be a basis for $\mathbb{R}^n$, with the basis vectors expressed in standard coordinates. Form the $n \times n$ matrix $P$ whose columns are the basis vectors:

$$P = \begin{pmatrix} | & | & & | \\ \mathbf{b_1} & \mathbf{b_2} & \cdots & \mathbf{b_n} \\ | & | & & | \end{pmatrix}.$$

For the example above: $$P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Why $P$ is invertible. First, the definition: vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n \in \mathbb{R}^n$ are linearly independent if the only solution to $c_1\mathbf{u}_1 + \cdots + c_n\mathbf{u}_n = \mathbf{0}$ is $c_1 = \cdots = c_n = 0$. (For two vectors, this is equivalent to neither being a scalar multiple of the other - the check we did for $\mathbf{b_1}, \mathbf{b_2}$ in Section 1.) The basis vectors satisfy this by assumption.

We now prove: if the $n$ columns of a square matrix $P \in \mathbb{R}^{n \times n}$ are linearly independent, then $P$ is invertible.

Proof. Suppose $P\mathbf{x} = \mathbf{0}$ for some $\mathbf{x} = (x_1, \ldots, x_n)^T$. By definition of matrix-vector multiplication, $P\mathbf{x} = x_1\mathbf{b_1} + \cdots + x_n\mathbf{b_n}$. Linear independence of the columns forces $x_1 = \cdots = x_n = 0$, so $\mathbf{x} = \mathbf{0}$.

Now apply Gaussian elimination to $P$. If any column were a free-variable column (not a pivot column), we could set that variable to $1$ and solve for the remaining variables by back-substitution, producing a non-zero solution to $P\mathbf{x} = \mathbf{0}$ - contradicting what we just showed. Therefore every column of $P$ is a pivot column. Since $P$ is $n \times n$, all $n$ columns are pivot columns: elimination produces $n$ pivots with no zero rows, and further row-reduction gives the reduced row echelon form $I_n$.

It remains to construct $P^{-1}$ explicitly. The $j$-th column of $P^{-1}$ must satisfy $P\mathbf{x} = \mathbf{e}_j$ (the $j$-th standard basis vector), since $P \cdot P^{-1} = I$ means $P$ times the $j$-th column of $P^{-1}$ is $\mathbf{e}_j$. We need to solve $n$ such systems, one for each $j$. Since all $n$ systems share the same coefficient matrix $P$, we can solve them simultaneously: augment $P$ with all right-hand sides at once to form $[P \mid \mathbf{e}_1 \mid \cdots \mid \mathbf{e}_n] = [P \mid I]$. Gaussian elimination acts on rows of the entire augmented matrix, reducing the left block to $I_n$. Each row operation is left-multiplication by an elementary matrix; if the sequence of operations is $E_k \cdots E_1$, then $E_k \cdots E_1 P = I$, so the right block becomes $E_k \cdots E_1 I = E_k \cdots E_1 = P^{-1}$. $\square$

The conversion formulas. There are two directions:

  1. From $B$-coordinates to standard coordinates. If $[\mathbf{v}]_B = (c_1, \ldots, c_n)^T$, then: $$\mathbf{v} = c_1 \mathbf{b_1} + \cdots + c_n \mathbf{b_n} = P [\mathbf{v}]_B.$$ The matrix $P$ translates $B$-coordinates into standard coordinates.

  2. From standard coordinates to $B$-coordinates. Apply $P^{-1}$ to both sides: $$[\mathbf{v}]_B = P^{-1} \mathbf{v}.$$ The matrix $P^{-1}$ translates standard coordinates into $B$-coordinates.

Verification with the example. We found $[\mathbf{v}]_B = (5/2, 1/2)^T$ by hand. Let us confirm with $P^{-1}$. For a $2 \times 2$ matrix $P = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, the inverse is $\frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$.

Here $\det(P) = (1)(-1) - (1)(1) = -2$, so: $$P^{-1} = \frac{1}{-2}\begin{pmatrix} -1 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}.$$

Then: $$P^{-1} \mathbf{v} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 5/2 \\ 1/2 \end{pmatrix}.$$ ✓

A 3D Example

Let $B = \{\mathbf{b_1}, \mathbf{b_2}, \mathbf{b_3}\}$ in $\mathbb{R}^3$ with: $$\mathbf{b_1} = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad \mathbf{b_2} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \quad \mathbf{b_3} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.$$

The change of basis matrix is: $$P = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$

Let $\mathbf{v} = (2, 3, 1)^T$ in standard coordinates. To find $[\mathbf{v}]_B$, we solve $P [\mathbf{v}]_B = \mathbf{v}$, i.e., we apply $P^{-1}$. Using row reduction on the augmented matrix:

$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 3 \\ 1 & 1 & 0 & 1 \end{array}\right) \xrightarrow{R_3 \leftarrow R_3 - R_1} \left(\begin{array}{ccc|c} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 3 \\ 0 & 1 & -1 & -1 \end{array}\right) \xrightarrow{R_3 \leftarrow R_3 - R_2} \left(\begin{array}{ccc|c} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 3 \\ 0 & 0 & -2 & -4 \end{array}\right).$$

From the last row: $c_3 = 2$. Back-substituting: $c_2 + 2 = 3 \Rightarrow c_2 = 1$. And $c_1 + 2 = 2 \Rightarrow c_1 = 0$.

So $[\mathbf{v}]_B = (0, 1, 2)^T$. Verify: $0 \cdot \mathbf{b_1} + 1 \cdot \mathbf{b_2} + 2 \cdot \mathbf{b_3} = (0,0,0)^T + (0,1,1)^T + (2,2,0)^T = (2,3,1)^T$. ✓

Converting Between Two Non-Standard Bases

Suppose you have two non-standard bases $B$ and $B'$ for $\mathbb{R}^n$, with change-of-basis matrices $P$ (for $B$) and $Q$ (for $B'$), both relative to the standard basis.

  • Standard to $B$-coordinates: $P^{-1}$
  • Standard to $B'$-coordinates: $Q^{-1}$
  • $B$-coordinates to $B'$-coordinates: go through standard: $Q^{-1} P$

The matrix $Q^{-1} P$ directly converts $B$-coordinates to $B'$-coordinates. This is the general change-of-basis formula when neither basis is the standard one.


Section 3: How Linear Transformations Change Under Change of Basis

We have resolved coordinates. Now for the deeper question.

A linear transformation $T: \mathbb{R}^n \to \mathbb{R}^n$ has a matrix $A$ in the standard basis. If you change to a new basis $B$ with change-of-basis matrix $P$, what is the matrix of $T$ in basis $B$?

Call this matrix $A_B$. By definition, $A_B$ is the matrix such that: if you give it the $B$-coordinates of a vector $\mathbf{v}$, it produces the $B$-coordinates of $T(\mathbf{v})$.

Here is how to compute it. Work out what must happen step by step:

  1. You start with $[\mathbf{v}]_B$, the $B$-coordinates of $\mathbf{v}$.
  2. Convert to standard coordinates: multiply by $P$, getting $\mathbf{v} = P[\mathbf{v}]_B$.
  3. Apply $T$ in standard coordinates: multiply by $A$, getting $A\mathbf{v} = AP[\mathbf{v}]_B$.
  4. Convert the output back to $B$-coordinates: multiply by $P^{-1}$, getting $P^{-1}AP[\mathbf{v}]_B$.

The matrix that takes $[\mathbf{v}]_B$ and produces $[T(\mathbf{v})]_B$ is therefore: $$\boxed{A_B = P^{-1}AP.}$$

Discomfort check. Why $P^{-1}AP$ and not $PAP^{-1}$? The order looks backwards. The key is that matrix multiplication reads right-to-left: when you write $P^{-1}AP$, the rightmost $P$ acts first. Step by step: $P$ converts from $B$ to standard, then $A$ applies the transformation in standard, then $P^{-1}$ converts back to $B$. So the order of operations (left to right in time) is $P$, then $A$, then $P^{-1}$ - which in matrix notation is written $P^{-1}(A(P [\mathbf{v}]_B))$, i.e., $P^{-1}AP$ acting on $[\mathbf{v}]_B$. If it helps, say it aloud: “undo-$B$ first, then standard transformation, then go back-to-$B$.”

A Worked Example

Let $A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$ and basis $B = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}$.

The change-of-basis matrix is: $$P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}.$$

(Check: $PP^{-1} = \begin{pmatrix}1&-1\\0&1\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix}$… actually compute $P^{-1}P$: $\begin{pmatrix}1&-1\\0&1\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix}$. ✓)

Now compute $A_B = P^{-1}AP$. First, $AP$: $$AP = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ 0 & 3 \end{pmatrix}.$$

Then $P^{-1}(AP)$: $$P^{-1}AP = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.$$

The transformation $T$ with standard matrix $A = \begin{pmatrix}2&1\\0&3\end{pmatrix}$ is, in basis $B$, represented by the diagonal matrix $\begin{pmatrix}2&0\\0&3\end{pmatrix}$.

This is remarkable. The matrix $A$ had an off-diagonal entry in standard coordinates. In basis $B$, it is diagonal: it simply scales the first basis vector by $2$ and the second basis vector by $3$. The transformation has not changed. We just chose a better coordinate system to describe it.

Discomfort check. The same transformation has a complicated matrix in one basis and a simple matrix in another. How can that be? The transformation $T$ is an intrinsic object - it maps vectors to vectors, period. A matrix is merely a description of that transformation in a chosen coordinate system. The description depends on the coordinate system you use to record inputs and outputs. Two different descriptions can look very different even though they describe the same underlying function - just as “the building two blocks north and one block east” and “the building three blocks down the diagonal avenue” describe the same building.


Section 4: Similarity Transformations

The relationship $A_B = P^{-1}AP$ is so important it gets a name.

Definition. Two $n \times n$ matrices $A$ and $C$ are similar if there exists an invertible matrix $P$ such that: $$C = P^{-1}AP.$$

We write $A \sim C$. The operation of passing from $A$ to $P^{-1}AP$ is called a similarity transformation.

The geometric meaning: $A$ and $C$ are similar if and only if they represent the same linear transformation in different bases. $P$ is the change-of-basis matrix between those bases.

Similarity is an equivalence relation. Verify the three properties:

  • Reflexive: $A = I^{-1}AI$, so $A \sim A$.
  • Symmetric: if $C = P^{-1}AP$, then $A = (P^{-1})^{-1}C(P^{-1}) = PCP^{-1}$, so $A \sim C \Rightarrow C \sim A$.
  • Transitive: if $C = P^{-1}AP$ and $D = Q^{-1}CQ$, then $D = Q^{-1}P^{-1}APQ = (PQ)^{-1}A(PQ)$, so $A \sim C, C \sim D \Rightarrow A \sim D$.

Each equivalence class under similarity is the set of all matrices representing the same linear transformation across all possible bases.

Invariants Under Similarity

Because similar matrices represent the same linear transformation, any quantity that captures an intrinsic property of the transformation must be preserved by similarity. These are called similarity invariants.

Determinant. $\det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det(P)} \det(A) \det(P) = \det(A)$.

Trace. Using the cyclic property of trace, $\text{tr}(ABC) = \text{tr}(CAB) = \text{tr}(BCA)$: $$\text{tr}(P^{-1}AP) = \text{tr}(APP^{-1}) = \text{tr}(A).$$

Eigenvalues. If $Av = \lambda v$, does $P^{-1}AP$ have the same eigenvalues? Yes: the characteristic polynomial $\det(A - \lambda I)$ is a similarity invariant (since $\det(P^{-1}AP - \lambda I) = \det(P^{-1}(A - \lambda I)P) = \det(A - \lambda I)$). Same characteristic polynomial means same eigenvalues with the same algebraic multiplicities.

Rank. Similar matrices have the same rank, since invertible matrices preserve rank.

Minimal polynomial. The minimal polynomial is also a similarity invariant.

The table of invariants:

Invariant Why it’s preserved
Determinant Multiplicativity of $\det$
Trace Cyclic property of $\text{tr}$
Eigenvalues Same characteristic polynomial
Rank Invertible maps preserve rank
Minimal polynomial Annihilates same underlying operator

These invariants are properties of the linear transformation, not of any particular matrix representation. They are independent of the coordinate system you use to describe the transformation.


Section 5: Diagonalization - The Goal of Change of Basis

Now we arrive at the central application. Among all possible matrix representations of a linear transformation $T$, the simplest is a diagonal matrix: $$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

A diagonal matrix acts on each basis vector independently: it simply scales $\mathbf{b_j}$ by $\lambda_j$. No coupling between directions, no off-diagonal interactions.

Definition. A matrix $A$ is diagonalizable if it is similar to a diagonal matrix: $$A = P D P^{-1},$$ or equivalently, $D = P^{-1}AP$, where $D$ is diagonal.

This says: there exists a basis (the columns of $P$) in which $A$ acts by scaling each basis vector by the corresponding diagonal entry.

What are those scaling factors? What are those basis vectors?

  • The scaling factors $\lambda_j$ are the eigenvalues of $A$: the numbers such that $A\mathbf{b_j} = \lambda_j \mathbf{b_j}$.
  • The basis vectors $\mathbf{b_j}$ (the columns of $P$) are the eigenvectors of $A$: the directions that $A$ does not rotate, only scales.

The question of when such a basis exists - when $A$ is diagonalizable - is exactly the question of whether $A$ has enough eigenvectors to form a complete basis. That is the central topic of the next post. But the conceptual setup is already here: diagonalization is nothing more than finding the right change of basis.

Why Diagonalization Matters

Once you have $A = PDP^{-1}$, many computations become trivial:

Matrix powers. To compute $A^k$: $$A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}.$$ By induction: $A^k = PD^kP^{-1}$.

And $D^k$ is immediate: $$D^k = \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix}.$$

Computing $A^{100}$ via repeated matrix multiplication requires $99$ matrix multiplications, each $O(n^3)$. Via diagonalization: raise $n$ scalars to the 100th power, cost $O(n)$.

Matrix exponential. The matrix exponential $e^A = \sum_{k=0}^\infty \frac{A^k}{k!}$ appears in the solution of linear differential equations $\mathbf{x}'(t) = A\mathbf{x}(t)$ (solution: $\mathbf{x}(t) = e^{tA}\mathbf{x}(0)$). With diagonalization: $$e^{tA} = P e^{tD} P^{-1} = P \begin{pmatrix} e^{\lambda_1 t} & & \\ & \ddots & \\ & & e^{\lambda_n t} \end{pmatrix} P^{-1}.$$

Long-run behavior. For an iterative map $\mathbf{x_{k+1}} = A\mathbf{x_k}$, we have $\mathbf{x_k} = A^k \mathbf{x_0} = PD^kP^{-1}\mathbf{x_0}$. The long-run behavior is dominated by the eigenvalue of largest magnitude. The eigenvector decomposition of $\mathbf{x}_0$ tells you exactly how each component grows or decays.

A Preview: When Does Diagonalization Succeed?

The worked example in Section 3 showed that the matrix $A = \begin{pmatrix}2&1\\0&3\end{pmatrix}$ is diagonal in the basis $\left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}$. Notice that $A\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}2\\0\end{pmatrix} = 2\begin{pmatrix}1\\0\end{pmatrix}$ and $A\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}3\\3\end{pmatrix} = 3\begin{pmatrix}1\\1\end{pmatrix}$. These basis vectors are precisely the eigenvectors of $A$.

This is not a coincidence. It is the theorem: $A = PDP^{-1}$ if and only if the columns of $P$ are eigenvectors of $A$. The eigenvalues fill the diagonal of $D$.

Finding the right basis for diagonalization is finding the eigenvectors. That is what the next post is about - and it also contains a detailed proof of $AP = PD$ via the column interpretation of matrix multiplication.


Section 6: Non-Square Transformations and the SVD

Everything above assumed $A$ is square: the input and output spaces have the same dimension. For non-square $A \in \mathbb{R}^{m \times n}$ (mapping $\mathbb{R}^n$ to $\mathbb{R}^m$), the formula $P^{-1}AP$ does not make sense: $P$ would need to be $n \times n$ on the right but $m \times m$ on the left, and there is no single $P$ that works on both sides when $m \neq n$.

The correct generalization is: choose a basis for the input space $\mathbb{R}^n$ and a (different) basis for the output space $\mathbb{R}^m$. Use different change-of-basis matrices on each side.

If you choose the right bases, the matrix of $A$ becomes diagonal - a $m \times n$ matrix with nonzero entries only on the main diagonal. This is the Singular Value Decomposition: $$A = U \Sigma V^T,$$ where $V \in \mathbb{R}^{n \times n}$ is an orthogonal matrix (the input basis), $U \in \mathbb{R}^{m \times m}$ is an orthogonal matrix (the output basis), and $\Sigma \in \mathbb{R}^{m \times n}$ is diagonal with nonnegative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$ (the singular values). The SVD post covers this construction in depth.


Section 7: Applications

Change of basis is not an abstract technique confined to mathematics. It appears throughout science and engineering precisely because the right coordinate system can transform a hard problem into a transparent one.

Principal Component Analysis (PCA). A dataset $X \in \mathbb{R}^{n \times p}$ (n observations, p features) has a sample covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$. Because $\Sigma$ is symmetric positive semidefinite, it has a real orthonormal eigenbasis (this is the Spectral Theorem). The eigenvectors are the principal components - directions of maximum variance.

Expressing the data in the principal component basis is a change of basis: $X_{\text{new}} = X V$, where $V$ contains the eigenvectors as columns. In the new basis, the covariance matrix is diagonal: the features are uncorrelated. The change of basis has revealed the structure hidden in the original data.

Normal modes in physics. A system of $n$ coupled oscillators (masses connected by springs) satisfies a linear ODE system $\ddot{\mathbf{x}} = -K\mathbf{x}$, where $K$ is a positive definite matrix encoding the coupling. In the original coordinates the equations are coupled - each mass affects every other.

But in the eigenvector basis of $K$, the matrix becomes diagonal. The coupled system decomposes into $n$ independent oscillators, each with its own natural frequency $\omega_i = \sqrt{\lambda_i}$. These are the normal modes: collective motions of the whole system in which every part oscillates at the same frequency. The change of basis decouples the problem.

Discrete Fourier Transform. A periodic signal $\mathbf{x} \in \mathbb{C}^n$ can be expressed in the standard basis (time samples at $t = 0, 1, \ldots, n-1$). But the natural basis for periodic signals is the Fourier basis: $\{1, e^{2\pi i k / n} : k = 0, \ldots, n-1\}$.

The DFT matrix $F$ is the change-of-basis matrix from the time-domain basis to the Fourier basis. The DFT is: $$\hat{\mathbf{x}} = F \mathbf{x}, \quad F_{jk} = \frac{1}{\sqrt{n}} e^{-2\pi i jk / n}.$$

In the Fourier basis, convolution becomes pointwise multiplication - a radical simplification that underlies all of signal processing.

Computer graphics. A 3D object lives in model space. Placing it in a scene moves it to world space (a rigid body transformation). The camera’s perspective moves it to camera space. Projecting to a 2D screen moves it to screen space. Each step is a change of coordinates (or affine transformation, which is a linear transformation plus a translation). The rendering pipeline is, at its heart, a sequence of basis changes.


Section 8: The Rigorous Underpinning

For a reader who wants the full abstract framework.

Abstract Vector Spaces

Let $V$ be a finite-dimensional vector space with two ordered bases:

$$\mathcal{B} = (\mathbf{b}_1, \ldots, \mathbf{b}_n) \qquad \text{and} \qquad \mathcal{B}' = (\mathbf{b}'_1, \ldots, \mathbf{b}'_n).$$

Definition. The change of basis matrix from $\mathcal{B}'$ to $\mathcal{B}$ is the $n \times n$ matrix $P$ whose $j$-th column is the coordinate vector of $\mathbf{b}'_j$ expressed in the basis $\mathcal{B}$.

Theorem. For any $v \in V$:

$$[v]_{\mathcal{B}} = P\, [v]_{\mathcal{B}'}.$$

Proof. Write $v$ in the $\mathcal{B}'$ basis with coordinates $(\alpha_1, \ldots, \alpha_n)^T$:

$$v = \alpha_1 \mathbf{b}'_1 + \cdots + \alpha_n \mathbf{b}'_n.$$

Then:

$$[v]_{\mathcal{B}} = \left[\sum_j \alpha_j \mathbf{b}'_j\right]_{\mathcal{B}} = \sum_j \alpha_j [\mathbf{b}'_j]_{\mathcal{B}} = P\,[v]_{\mathcal{B}'},$$

where the last step uses linearity of the coordinate map and the definition of matrix-vector multiplication. $\square$

The Transformation Law for Linear Maps

Let $T: V \to V$ be a linear map. Write $[T]_{\mathcal{B}}$ for the matrix of $T$ in basis $\mathcal{B}$.

Theorem. If $P$ is the change of basis matrix from $\mathcal{B}'$ to $\mathcal{B}$, then:

$$[T]_{\mathcal{B}'} = P^{-1} [T]_{\mathcal{B}} P.$$

Proof. For any $v \in V$, apply the coordinate maps step by step:

$$[T]_{\mathcal{B}'} [v]_{\mathcal{B}'} = [Tv]_{\mathcal{B}'} = P^{-1}[Tv]_{\mathcal{B}} = P^{-1}[T]_{\mathcal{B}}[v]_{\mathcal{B}} = P^{-1}[T]_{\mathcal{B}} P [v]_{\mathcal{B}'}.$$

Since this holds for all $v$, the matrices on both sides agree:

$$[T]_{\mathcal{B}'} = P^{-1}[T]_{\mathcal{B}}P. \quad \square$$

Proofs of Trace and Determinant Invariance

Trace invariance. The trace satisfies the cyclic property: $\text{tr}(XY) = \text{tr}(YX)$ for any matrices $X, Y$ where both products are defined. Applying this with $X = P^{-1}A$ and $Y = P$: $$\text{tr}(P^{-1}AP) = \text{tr}((P^{-1}A)P) = \text{tr}(P(P^{-1}A)) = \text{tr}(A).$$

Determinant invariance. The determinant is multiplicative: $\det(XY) = \det(X)\det(Y)$. Therefore: $$\det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det(P)} \cdot \det(A) \cdot \det(P) = \det(A).$$

Jordan Normal Form

Not every matrix is diagonalizable over $\mathbb{R}$, and even over $\mathbb{C}$ not every matrix is diagonalizable. But there is always a “best possible” simplification.

Theorem (Jordan). Over $\mathbb{C}$, every $n \times n$ matrix $A$ is similar to a Jordan normal form: $$J = \text{diag}(J_{m_1}(\lambda_1), J_{m_2}(\lambda_2), \ldots, J_{m_r}(\lambda_r)),$$ where each Jordan block is: $$J_m(\lambda) = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix} \in \mathbb{C}^{m \times m}.$$ This form is unique up to reordering of the blocks.

Key facts:

  • $A$ is diagonalizable $\iff$ all Jordan blocks are $1 \times 1$.
  • The number of Jordan blocks for eigenvalue $\lambda$ equals the geometric multiplicity $\dim \ker(A - \lambda I)$.
  • The sum of block sizes for $\lambda$ equals the algebraic multiplicity of $\lambda$.
  • Jordan blocks arise when the geometric multiplicity is strictly less than the algebraic multiplicity - i.e., when there are not enough eigenvectors.

When $J_m(\lambda)$ has $m > 1$, the extra basis vectors needed are generalized eigenvectors: vectors in $\ker(A - \lambda I)^k$ for $k > 1$ that are not in $\ker(A - \lambda I)^{k-1}$. They form chains: $$(A - \lambda I) v_k = v_{k-1}, \quad (A - \lambda I) v_{k-1} = v_{k-2}, \quad \ldots, \quad (A - \lambda I) v_1 = 0.$$

Jordan normal form is the most general “simplification by change of basis.” In practice, constructing $P$ explicitly is numerically unstable - small perturbations can drastically change the Jordan structure. For computational work, the Schur decomposition ($A = QUQ^*$ with $Q$ unitary and $U$ upper triangular) is preferred; it exists for all matrices and is numerically robust.


Summary

Concept Meaning Formula
Change of coordinates Convert $\mathbf{v}$ from standard to basis $B$ $[\mathbf{v}]_B = P^{-1}\mathbf{v}$
Change of basis matrix Columns are new basis vectors in old coordinates $P = [\mathbf{b_1} \mid \cdots \mid \mathbf{b_n}]$
Similarity transformation Same map, different basis $A_B = P^{-1}AP$
Diagonalization Find basis where map is just scaling $A = PDP^{-1}$
Matrix powers Cheap once diagonalized $A^k = PD^kP^{-1}$
Invariants under similarity Properties of the map itself Eigenvalues, det, trace, rank

The key progression in this post:

  1. Coordinates depend on the basis you choose.
  2. The matrix $P$ (columns = basis vectors) converts between bases.
  3. A transformation $A$ in one basis becomes $P^{-1}AP$ in another.
  4. Similar matrices represent the same transformation and share all intrinsic properties.
  5. The best case is when $P^{-1}AP$ is diagonal - this means the columns of $P$ are eigenvectors and the diagonal entries are eigenvalues.
  6. Finding that $P$ is the eigenvalue problem.

Read next: