Definition and First Consequences

Let $V$ and $W$ be vector spaces over $\mathbb{F}$. A function $T: V \to W$ is a linear map (linear transformation) if it satisfies both:

  1. Additivity: $T(u + v) = T(u) + T(v)$ for all $u, v \in V$
  2. Homogeneity: $T(\alpha v) = \alpha T(v)$ for all $\alpha \in \mathbb{F}$, $v \in V$

These two conditions collapse into one: $T(\alpha u + \beta v) = \alpha T(u) + \beta T(v)$, i.e., $T$ preserves linear combinations. Equivalently, $T$ preserves the vector-space structure.
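The combined condition can be spot-checked numerically, as a sketch rather than a proof: sample random vectors and scalars and compare both sides. The helper `looks_linear` and the test maps below are illustrative choices, not part of the definitions above.

```python
import numpy as np

def looks_linear(T, n, trials=100, rng=np.random.default_rng(0)):
    """Spot-check T(a*u + b*v) == a*T(u) + b*T(v) on random samples."""
    for _ in range(trials):
        u, v = rng.standard_normal(n), rng.standard_normal(n)
        a, b = rng.standard_normal(2)
        if not np.allclose(T(a * u + b * v), a * T(u) + b * T(v)):
            return False
    return True

A = np.arange(6.0).reshape(2, 3)
print(looks_linear(lambda x: A @ x, 3))        # True: matrix maps are linear
print(looks_linear(lambda x: A @ x + 1.0, 3))  # False: the shift breaks additivity
```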

Immediate consequences. Every linear map satisfies:

  • $T(\mathbf{0}_V) = \mathbf{0}_W$ (set $\alpha = 0$)
  • $T(-v) = -T(v)$
  • $T\!\left(\sum_{i=1}^k \alpha_i v_i\right) = \sum_{i=1}^k \alpha_i T(v_i)$

The last point is crucial: a linear map is completely determined by its values on a basis. If $\{e_1, \ldots, e_n\}$ is a basis for $V$ and we specify $T(e_i) \in W$ arbitrarily, there is a unique linear map extending these values.
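A minimal NumPy sketch of this fact: prescribe the images of the standard basis of $\mathbb{R}^3$ (the values below are arbitrary) and observe that linearity alone determines $T$ on every other vector.

```python
import numpy as np

# Arbitrary prescribed images of the standard basis e1, e2, e3 of R^3:
images = [np.array([1.0, 0.0]),     # T(e1)
          np.array([2.0, 1.0]),     # T(e2)
          np.array([0.0, -1.0])]    # T(e3)
M = np.column_stack(images)         # the unique linear extension, as a matrix

v = np.array([3.0, -1.0, 2.0])      # v = 3 e1 - e2 + 2 e3
by_linearity = 3 * images[0] - 1 * images[1] + 2 * images[2]
assert np.allclose(M @ v, by_linearity)   # linearity forces this value
```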

Examples.

  • $T: \mathbb{R}^n \to \mathbb{R}^m$, $T(\mathbf{x}) = A\mathbf{x}$ for any matrix $A \in \mathbb{R}^{m \times n}$.
  • Differentiation $D: \mathcal{P}_n \to \mathcal{P}_{n-1}$, $D(p) = p'$, is linear.
  • Integration $I: C([a,b]) \to \mathbb{R}$, $I(f) = \int_a^b f(x)\,dx$, is linear.
  • The zero map $T(v) = \mathbf{0}$ and the identity map $\mathrm{id}(v) = v$ are both linear.
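For instance, linearity of differentiation can be spot-checked on coefficient vectors (lowest-degree coefficient first) with NumPy's polynomial module:

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(1)
p, q = rng.standard_normal(4), rng.standard_normal(4)  # two random cubics
a, b = 2.0, -3.0
# D(a*p + b*q) == a*D(p) + b*D(q), with D = coefficient-wise differentiation:
assert np.allclose(P.polyder(a * p + b * q),
                   a * P.polyder(p) + b * P.polyder(q))
```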

Null Space and Range

Definition. For $T: V \to W$:

  • The null space (kernel) is $\text{null}(T) = \{v \in V : T(v) = \mathbf{0}\}$
  • The range (image) is $\text{range}(T) = \{T(v) : v \in V\}$

Theorem. $\text{null}(T)$ is a subspace of $V$ and $\text{range}(T)$ is a subspace of $W$.

Proof. For $\text{null}(T)$: $T(\mathbf{0}) = \mathbf{0}$, so $\mathbf{0} \in \text{null}(T)$. If $u, v \in \text{null}(T)$ and $\alpha \in \mathbb{F}$, then $T(\alpha u + v) = \alpha T(u) + T(v) = \mathbf{0}$, so $\alpha u + v \in \text{null}(T)$. For $\text{range}(T)$: $\mathbf{0} = T(\mathbf{0}) \in \text{range}(T)$, and if $w_1 = T(v_1)$ and $w_2 = T(v_2)$, then $\alpha w_1 + w_2 = T(\alpha v_1 + v_2) \in \text{range}(T)$. $\square$

Terminology. $\dim(\text{null}(T))$ is the nullity of $T$; $\dim(\text{range}(T))$ is the rank of $T$.
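Both numbers are easy to compute for a concrete matrix map; a sketch using `numpy.linalg.matrix_rank` and `scipy.linalg.null_space` (the example matrix is an arbitrary choice):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # second row = 2 * first row
rank = np.linalg.matrix_rank(A)      # dim range(T) = 1
kernel = null_space(A)               # columns form an orthonormal basis of null(T)
nullity = kernel.shape[1]            # dim null(T) = 2
print(rank, nullity)                 # 1 2
```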


The Rank-Nullity Theorem

Theorem (Rank-Nullity / Fundamental Theorem of Linear Maps). Let $V$ be finite-dimensional and $T: V \to W$ linear. Then:

$$\dim(\text{null}(T)) + \dim(\text{range}(T)) = \dim(V)$$

Proof. Let $\dim(V) = n$ and $\dim(\text{null}(T)) = k$. Choose a basis $u_1, \ldots, u_k$ for $\text{null}(T)$. Extend it to a basis $u_1, \ldots, u_k, v_1, \ldots, v_{n-k}$ of $V$ (this is possible by the basis extension theorem).

Claim: $T(v_1), \ldots, T(v_{n-k})$ is a basis for $\text{range}(T)$.

Spanning. For any $w \in \text{range}(T)$, write $w = T(v)$ for some $v \in V$. Express $v = \sum_{i=1}^k \alpha_i u_i + \sum_{j=1}^{n-k} \beta_j v_j$. Then $w = T(v) = \sum_i \alpha_i T(u_i) + \sum_j \beta_j T(v_j) = \sum_j \beta_j T(v_j)$, since $T(u_i) = \mathbf{0}$.

Independence. Suppose $\sum_{j=1}^{n-k} \beta_j T(v_j) = \mathbf{0}$. Then $T\!\left(\sum_j \beta_j v_j\right) = \mathbf{0}$, so $\sum_j \beta_j v_j \in \text{null}(T)$. Write $\sum_j \beta_j v_j = \sum_i \alpha_i u_i$. Then $\sum_i \alpha_i u_i - \sum_j \beta_j v_j = \mathbf{0}$. Since $u_1, \ldots, u_k, v_1, \ldots, v_{n-k}$ is a basis of $V$, all coefficients are zero: $\alpha_i = 0$ and $\beta_j = 0$.

Thus $T(v_1), \ldots, T(v_{n-k})$ is a basis for $\text{range}(T)$, which has dimension $n - k$. $\square$

Corollaries. For $T: V \to W$ with $\dim(V) = \dim(W) = n$:

  • $T$ is injective iff $T$ is surjective iff $T$ is bijective: by rank-nullity, $\text{null}(T) = \{\mathbf{0}\}$ exactly when $\text{rank}(T) = n$, i.e., exactly when $\text{range}(T) = W$. (No such shortcut exists in infinite dimensions.)
  • Rank-nullity is the precise statement of “you can’t get information for free”: every dimension killed by $T$ (absorbed into the null space) is a dimension lost from the range.

Rank-nullity picture for $T: \mathbb{R}^4 \to \mathbb{R}^3$ with rank 2 and nullity 2:

 V = R^4 (dim 4)          W = R^3 (dim 3)
 +------------------+     +-------------+
 |  null(T)         |     |             |
 |  (dim 2)         |---->|  range(T)   |
 |                  |  T  |  (dim 2)    |
 |  complement      |     |             |
 |  (dim 2)   ------+---->|             |
 +------------------+     +-------------+
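The picture can be made concrete with an arbitrary rank-2 matrix from $\mathbb{R}^4$ to $\mathbb{R}^3$:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0, 1.0]])  # third row = row1 + row2, so rank 2
rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]
assert rank + nullity == A.shape[1]   # 2 + 2 = 4 = dim V
```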

Injectivity, Surjectivity, Isomorphism

Theorem. $T: V \to W$ is injective if and only if $\text{null}(T) = \{\mathbf{0}\}$.

Proof. ($\Rightarrow$) If $T(v) = \mathbf{0} = T(\mathbf{0})$ and $T$ is injective, then $v = \mathbf{0}$. ($\Leftarrow$) If $T(u) = T(v)$, then $T(u-v) = \mathbf{0}$, so $u - v \in \text{null}(T) = \{\mathbf{0}\}$, giving $u = v$. $\square$
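Numerically, injectivity of a matrix map shows up as an empty null-space basis (both matrices below are chosen for illustration):

```python
import numpy as np
from scipy.linalg import null_space

inj = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])          # full column rank
not_inj = np.array([[1.0, 2.0],
                    [2.0, 4.0],
                    [3.0, 6.0]])      # rank 1
print(null_space(inj).shape[1])       # 0 -> null(T) = {0}, injective
print(null_space(not_inj).shape[1])   # 1 -> nontrivial kernel, not injective
```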

Definition. An isomorphism is a bijective linear map. If an isomorphism $T: V \to W$ exists, $V$ and $W$ are isomorphic, written $V \cong W$.

Theorem. Two finite-dimensional vector spaces over $\mathbb{F}$ are isomorphic iff they have the same dimension.

Proof. ($\Leftarrow$) Suppose $\dim(V) = \dim(W) = n$; let $e_1, \ldots, e_n$ and $f_1, \ldots, f_n$ be bases. Define $T(e_i) = f_i$ and extend linearly. Then $T\!\left(\sum_i \alpha_i e_i\right) = \sum_i \alpha_i f_i$, which is $\mathbf{0}$ only if every $\alpha_i = 0$ (so $T$ is injective) and which realizes every vector of $W$ (so $T$ is surjective). ($\Rightarrow$) Conversely, an isomorphism carries a basis of $V$ to a basis of $W$, so $\dim(V) = \dim(W)$. $\square$

In particular, $V \cong \mathbb{F}^n$ for any $n$-dimensional $V$ over $\mathbb{F}$. This is why $\mathbb{R}^n$ is the model for all finite-dimensional real vector spaces.
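A sketch of one such coordinate isomorphism, $\mathcal{P}_3 \cong \mathbb{R}^4$: a cubic becomes its coefficient vector in the monomial basis $(1, x, x^2, x^3)$, and vector arithmetic on coordinates matches polynomial arithmetic:

```python
import numpy as np
from numpy.polynomial import polynomial as P

p = np.array([1.0, 0.0, 2.0, 0.0])    # coordinates of 1 + 2x^2
q = np.array([0.0, 3.0, 0.0, 1.0])    # coordinates of 3x + x^3
x = 1.7                                # an arbitrary evaluation point
# (2p + q)(x) computed via coordinate arithmetic vs. polynomial arithmetic:
lhs = P.polyval(x, 2.0 * p + q)
rhs = 2.0 * P.polyval(x, p) + P.polyval(x, q)
assert np.isclose(lhs, rhs)
```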


Matrix Representation

Fix bases $\mathcal{B} = (e_1, \ldots, e_n)$ of $V$ and $\mathcal{C} = (f_1, \ldots, f_m)$ of $W$. For $T: V \to W$, define the matrix of $T$ with respect to $\mathcal{B}, \mathcal{C}$ as the matrix $\mathcal{M}(T) \in \mathbb{F}^{m \times n}$ whose $j$-th column contains the coordinates of $T(e_j)$ in the basis $\mathcal{C}$:

$$T(e_j) = \sum_{i=1}^m \mathcal{M}(T)_{ij} f_i$$

Then for any $v \in V$, $[T(v)]_{\mathcal{C}} = \mathcal{M}(T)\,[v]_{\mathcal{B}}$: in coordinates, applying $T$ is matrix-vector multiplication. Different bases give different matrices for the same transformation - they are related by change-of-basis matrices.
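As an illustration, the matrix of differentiation $D: \mathcal{P}_3 \to \mathcal{P}_2$ with respect to monomial bases can be assembled column by column, exactly as the definition prescribes (a sketch using NumPy's polynomial module):

```python
import numpy as np
from numpy.polynomial import polynomial as P

n = 4                                   # dim P_3: coefficients of 1, x, x^2, x^3
cols = []
for j in range(n):
    e_j = np.zeros(n); e_j[j] = 1.0     # the basis polynomial x^j
    d = P.polyder(e_j)                  # D(x^j) = j * x^(j-1), length 3
    cols.append(d)                      # its coordinates in the basis of P_2
M = np.column_stack(cols)
print(M)
# [[0. 1. 0. 0.]
#  [0. 0. 2. 0.]
#  [0. 0. 0. 3.]]
```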

Theorem. $\mathcal{M}(ST) = \mathcal{M}(S)\mathcal{M}(T)$ for composable linear maps $S, T$. This is precisely why matrix multiplication is defined the way it is - it encodes composition of linear maps.
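A quick numerical check of the composition law, with random matrices standing in for $\mathcal{M}(S)$ and $\mathcal{M}(T)$:

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.standard_normal((2, 3))    # M(S): R^3 -> R^2
T = rng.standard_normal((3, 4))    # M(T): R^4 -> R^3
x = rng.standard_normal(4)
# Applying T then S agrees with applying the single matrix M(S)M(T):
assert np.allclose(S @ (T @ x), (S @ T) @ x)
```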


The Space $\mathcal{L}(V, W)$

The set of all linear maps from $V$ to $W$ is denoted $\mathcal{L}(V, W)$. It is itself a vector space under pointwise operations: $(S + T)(v) = S(v) + T(v)$ and $(\alpha T)(v) = \alpha T(v)$.

Theorem. For finite-dimensional $V$ and $W$, $\dim(\mathcal{L}(V, W)) = \dim(V) \cdot \dim(W)$.

Proof. Write $n = \dim(V)$ and $m = \dim(W)$. Once bases are chosen, the matrix representation is an isomorphism $\mathcal{L}(V,W) \cong \mathbb{F}^{m \times n}$, and $\dim(\mathbb{F}^{m \times n}) = mn$. $\square$


Examples

Linear layers in neural networks. A fully-connected layer computes $\mathbf{y} = W\mathbf{x} + \mathbf{b}$. The linear part $\mathbf{x} \mapsto W\mathbf{x}$ is an element of $\mathcal{L}(\mathbb{R}^{d_{\text{in}}}, \mathbb{R}^{d_{\text{out}}})$. The rank-nullity theorem says: if $\text{rank}(W) = r < d_{\text{in}}$, then the layer irreversibly destroys $d_{\text{in}} - r$ dimensions of its input. This is also why low-rank weight matrices (LoRA adapters in LLMs) can remain expressive despite having far fewer parameters: when the signal that matters occupies a low-dimensional subspace, a rank-$r$ map can transmit it intact.
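A sketch of the dimension count with LoRA-style shapes (the sizes and the factorization below are illustrative, not taken from any specific model):

```python
import numpy as np
from scipy.linalg import null_space

d_in, d_out, r = 64, 32, 4
rng = np.random.default_rng(3)
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))
W = B @ A                            # r*(d_in + d_out) parameters, not d_in*d_out
print(np.linalg.matrix_rank(W))      # 4 (almost surely, for random factors)
print(null_space(W).shape[1])        # 60 = d_in - r dimensions sent to zero
```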

Convolutions as linear maps. A discrete convolution with kernel $k$ is a linear map on sequences or images. It is shift-equivariant (a special structure), but as a map $\mathbb{R}^n \to \mathbb{R}^m$ it is still just an element of $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, representable by a (sparse, structured) matrix. The rank of this matrix determines how much information a convolutional layer can transmit.
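A sketch for a 1-D "valid" convolution: build the banded matrix row by row and compare against `np.convolve` (the kernel and sizes are arbitrary choices):

```python
import numpy as np

k = np.array([1.0, -2.0, 1.0])       # convolution kernel, length 3
n = 6                                # input length; 'valid' output length n - 2
C = np.zeros((n - 2, n))
for i in range(n - 2):
    C[i, i:i + 3] = k[::-1]          # each row: the flipped kernel, shifted
x = np.arange(n, dtype=float)
assert np.allclose(C @ x, np.convolve(x, k, mode="valid"))
print(np.linalg.matrix_rank(C))      # rank of the layer, viewed as a matrix
```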

Backpropagation. The Jacobian of a differentiable function at a point is the matrix representation of its best linear approximation (the derivative as a linear map). Backpropagation chains these Jacobians via the chain rule - which is exactly the composition law $\mathcal{M}(ST) = \mathcal{M}(S)\mathcal{M}(T)$ for linear maps.
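A finite-difference spot-check of this on a tiny hand-written example (the functions $f$, $g$ and their Jacobians below are illustrative):

```python
import numpy as np

f  = lambda x: np.array([x[0] * x[1], np.sin(x[0])])          # R^2 -> R^2
Jf = lambda x: np.array([[x[1], x[0]], [np.cos(x[0]), 0.0]])  # its Jacobian
g  = lambda y: np.array([y[0] + y[1] ** 2])                   # R^2 -> R^1
Jg = lambda y: np.array([[1.0, 2.0 * y[1]]])                  # its Jacobian

x = np.array([0.3, -1.2])
analytic = Jg(f(x)) @ Jf(x)          # chain rule: M(S)M(T)
eps = 1e-6                           # forward-difference Jacobian of g∘f
numeric = np.column_stack([(g(f(x + eps * e)) - g(f(x))) / eps
                           for e in np.eye(2)])
assert np.allclose(analytic, numeric, atol=1e-4)
```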

