Mappings & Linear Maps - Every Matrix Is a Transformation // Megha Bose

Helpful context:

In Functions & Mappings - One Input, One Output, No Exceptions, functions are machines: one input in, one output out. That framing handles everything about domains, injectivity, inverses, and function families cleanly.

But the same mathematical object admits a different reading. Instead of asking what the function does to a single input, ask what it does to an entire space. The function $f(x) = 2x$ is not just sending $3$ to $6$ - it is stretching the whole real line. When the domain is $\mathbb{R}^3$ and the codomain is $\mathbb{R}^2$, you are not processing individual vectors; you are compressing three-dimensional space into a plane.

This shift - from function-as-rule to map-as-transformation - is where linear algebra lives. This post covers the mapping perspective, structure-preserving maps, linear maps, and why every matrix is exactly a linear transformation encoded in coordinates.

Section 1: The Mapping Perspective

You have been thinking about functions as machines: put something in, get something out. This is the right way to think when you are asking what $f$ does to the number $3$.

But there is another way to see the same object, one that becomes indispensable when the domain and codomain are spaces rather than isolated sets of numbers.

Consider $f(x) = 2x$. From the machine perspective: input $3$, output $6$. Input $-1$, output $-2$. Fine.

Now step back and look at the whole domain at once. The function $f(x) = 2x$ takes the entire real line and stretches it - every point moves to twice its distance from the origin. The number line itself is being transformed. Not just individual inputs processed; the whole space is acted on.

This is the mapping perspective. A mapping is the same mathematical object as a function, but the emphasis is different. When we say “mapping,” we are asking: what does this do to the space as a whole? How does it deform, stretch, rotate, collapse, or preserve the domain?

The language shifts:

Functions produce outputs from inputs.
Mappings transform spaces.

Both sentences describe the same object. The difference is in what you attend to.

Why this matters. When your domain is $\mathbb{R}$ and your codomain is $\mathbb{R}$, the machine perspective is natural. But when your domain is $\mathbb{R}^3$ and your codomain is $\mathbb{R}^2$, you are collapsing three-dimensional space down into two dimensions. The word “function” undersells what is happening geometrically. The word “mapping” names it.

Everywhere you encounter “map” or “mapping” in mathematics - in linear algebra, differential geometry, topology, analysis - you are being asked to think about the whole-space transformation, not just the point-by-point output.

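Kernel and image. Two subsets are fundamental to every mapping $T: V \to W$ between vector spaces. The facts below about kernels (a trivial kernel means injective, solution sets are shifted kernels) use linearity, which Section 3 makes precise; read $T$ as linear throughout this discussion.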

The image is the set of all actual outputs:

$$\text{Im}(T) = \{T(v) : v \in V\}.$$

You know this as the range. In the mapping context, it tells you which part of $W$ the transformation actually reaches.

The kernel is the set of all inputs that map to zero:

$$\ker(T) = \{v \in V : T(v) = 0\}.$$

The kernel measures the information lost by the mapping. If the kernel contains only the zero vector, nothing is lost - different inputs always produce different outputs (the map is injective). If the kernel is large, the mapping is compressive: a whole subspace collapses to a single point.

Kernel and image together answer every structural question about a mapping. Does it have an inverse? Only if the kernel is trivial and the image fills the codomain. Does $T(v) = w$ have a solution? Only if $w$ is in the image.

How many solutions? If $w \notin \text{Im}(T)$, there are zero. If $w \in \text{Im}(T)$, pick any one particular solution $v_0$ satisfying $T(v_0) = w$. Then every other solution has the form $v_0 + k$ where $k \in \ker(T)$, because $T(v_0 + k) = T(v_0) + T(k) = w + 0 = w$. Conversely, if $v$ is any solution, then $T(v - v_0) = T(v) - T(v_0) = w - w = 0$, so $v - v_0 \in \ker(T)$.

The complete solution set is therefore $\{v_0 + k : k \in \ker(T)\}$ - a shifted copy of the kernel. This gives:

Kernel = $\{0\}$: exactly one solution.
Kernel is a line through the origin: infinitely many solutions lying on a line through $v_0$.
Kernel is a plane: solutions form a plane through $v_0$.

This is not just abstract. When you solve $A\mathbf{x} = \mathbf{b}$ in linear algebra, the complete solution is always written as (particular solution) + (homogeneous solution). The homogeneous solutions, those of $A\mathbf{x} = \mathbf{0}$, are exactly the kernel.
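To see the particular-plus-kernel structure concretely, here is a small Python sketch. The matrix $A$, the right-hand side $\mathbf{b}$, the particular solution `v0`, and the kernel vector `k` are illustrative values chosen by hand, and `mat_vec` is a hypothetical helper written in plain Python so the example needs no libraries.

```python
# Sketch: the solutions of A x = b form the set v0 + ker(A).
# A, b, v0 and k are illustrative values chosen by hand.

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2],
     [2, 4]]     # second row is twice the first, so the kernel is a line
b = [3, 6]

v0 = [3, 0]      # one particular solution: A v0 = b
k = [-2, 1]      # spans ker(A): A k = 0

# Walk along the shifted kernel line v0 + t*k and check every point solves A x = b.
solutions = [[v0[0] + t * k[0], v0[1] + t * k[1]] for t in range(-3, 4)]
print(mat_vec(A, v0), mat_vec(A, k))               # [3, 6] [0, 0]
print(all(mat_vec(A, s) == b for s in solutions))  # True
```

Because this kernel is a line, the solution set is the line through $(3, 0)$ in direction $(-2, 1)$. Starting from a different particular solution, such as $(1, 1)$, describes the same line.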

Discomfort check. If functions and mappings are the same thing, why have two words? Because language shapes what you notice. When you say “the function $f$ evaluated at $x$,” you are thinking about a single value. When you say “the mapping $T$ acts on $V$,” you are thinking about the whole space being moved. A physicist writing $T: \mathbb{R}^3 \to \mathbb{R}^3$ for a rotation thinks about space being turned - calling it a “function” would feel wrong. A statistician writing $f: \mathbb{R}^p \to \mathbb{R}$ for a prediction model thinks about what value gets produced - calling it a “mapping” would feel unusual. Mathematics uses both words because both perspectives are useful. Learning to switch between them is part of mathematical fluency.

Section 2: Structure-Preserving Maps

Here is the situation. You have two sets with structure. You define a function between them that assigns outputs to inputs. You have satisfied the definition of a function.

But you have said nothing about whether the function respects the relationships inside those sets.

An example. The real numbers under addition form a structure. The fact $2 + 3 = 5$ is a relationship among elements. Consider two functions from $\mathbb{R}$ to $\mathbb{R}$:

$f(x) = 2x$ and $g(x) = x^2$.

Does $f$ preserve addition?

$$f(2 + 3) = f(5) = 10 \qquad \text{and} \qquad f(2) + f(3) = 4 + 6 = 10.$$

Same result. Mapping then adding equals adding then mapping. $f$ preserves the additive structure.

Does $g$ preserve addition?

$$g(2 + 3) = g(5) = 25 \qquad \text{but} \qquad g(2) + g(3) = 4 + 9 = 13.$$

Different results. $g$ is a perfectly valid function, but it destroys additive relationships between inputs.
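The same comparison can be automated. In this minimal Python sketch, `preserves_addition` is a hypothetical helper that tests $h(a + b) = h(a) + h(b)$ on a handful of sample pairs. Passing on samples is only evidence, not a proof, but a single failing pair is enough to show that a map does not preserve addition.

```python
def f(x):
    return 2 * x

def g(x):
    return x ** 2

def preserves_addition(h, pairs):
    """True if h(a + b) == h(a) + h(b) for every sample pair (a, b)."""
    return all(h(a + b) == h(a) + h(b) for a, b in pairs)

pairs = [(2, 3), (-1, 4), (0, 7), (10, -10)]
print(preserves_addition(f, pairs))  # True
print(preserves_addition(g, pairs))  # False: g(5) = 25, but g(2) + g(3) = 13
```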

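A function that preserves the operations of a structure is called a homomorphism (from the Greek for “same form”). The general pattern: if both sets carry an operation, each written $\star$ here (the two operations need not be the same, as the logarithm example below shows), a homomorphism satisfies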

$$T(a \star b) = T(a) \star T(b).$$

For addition: $T(a + b) = T(a) + T(b)$. For scalar multiplication: $T(ca) = cT(a)$. For both at once (as we will need for vector spaces): $T(au + bv) = aT(u) + bT(v)$.

You have already seen a homomorphism without the name. The natural logarithm satisfies $\ln(xy) = \ln x + \ln y$: it turns multiplication in $\mathbb{R}^+$ into addition in $\mathbb{R}$. This identity is not an algebraic coincidence or a useful trick. It is the statement that $\ln$ is a homomorphism from $(\mathbb{R}^+, \times)$ to $(\mathbb{R}, +)$. The whole reason logarithms simplify computation is that they are structure-preserving maps between two different algebraic worlds.
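A quick numerical sketch of the same fact, using Python's standard `math` module. Floating-point logarithms are approximate, so the comparison uses `math.isclose` with a tolerance; the two function names are hypothetical. The second function shows the computational payoff: a product is computed by adding logarithms and then mapping back with `math.exp`, the inverse of $\ln$.

```python
import math

def log_turns_products_into_sums(pairs):
    """Check ln(x*y) == ln(x) + ln(y), up to floating-point tolerance."""
    return all(
        math.isclose(math.log(x * y), math.log(x) + math.log(y),
                     rel_tol=1e-12, abs_tol=1e-12)
        for x, y in pairs
    )

def multiply_via_logs(x, y):
    """Multiply positive numbers by adding their logs, then mapping back with exp."""
    return math.exp(math.log(x) + math.log(y))

print(log_turns_products_into_sums([(2.0, 3.0), (0.5, 8.0), (1e-3, 7.5)]))  # True
print(round(multiply_via_logs(6.0, 7.0), 9))  # 42.0
```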

Why homomorphisms matter. An arbitrary function between two structured sets carries no information about how the structures relate. A homomorphism carries all of it. It says: operations in the source correspond exactly to operations in the target. You can move the function inside or outside the operation freely.

In physics, a system is called linear when the response to a sum of inputs equals the sum of individual responses. That is exactly the homomorphism condition $T(a + b) = T(a) + T(b)$. The entire theory of linear circuits, linear mechanics, and linear wave optics rests on this: the superposition principle is the statement that the relevant mappings are homomorphisms of addition.

Isomorphisms. When a homomorphism is also a bijection, it is an isomorphism.

An isomorphism says: these two structures are the same. Not similar. Not analogous. Structurally identical. There is a perfect, reversible, structure-preserving correspondence. Every structural question about one has an identical answer for the other.

Cantor’s result that $|\mathbb{N}| = |\mathbb{Z}|$ is witnessed by an isomorphism of sets: a plain set carries no structure beyond its elements, so its isomorphisms are exactly the bijections. When two physical systems are governed by identical equations - an LC circuit and a mass-spring system, for instance - they are isomorphic as dynamical systems: every result computed for one translates immediately to the other.

Discomfort check. “Homomorphism of what?” is always the right question. Different branches of mathematics care about different structures. Algebra cares about operations like addition and multiplication. Topology cares about continuity and open sets. Geometry cares about distances and angles. A rotation is an isometry (distance-preserving isomorphism) but not a ring homomorphism. A linear map (which we are about to study) is a homomorphism of vector spaces, preserving both addition and scalar multiplication. When you see the word homomorphism or isomorphism, name the structure immediately. The word alone is only half the information.

Section 3: Linear Maps

You have two vector spaces $V$ and $W$. You want a mapping $T: V \to W$ that is compatible with the vector space structure - a homomorphism of vector spaces.

Vector spaces have two operations: vector addition and scalar multiplication. A compatible map must preserve both:

$$T(u + v) = T(u) + T(v) \qquad \text{(additivity)}$$

$$T(cu) = c T(u) \qquad \text{(homogeneity)}$$

A mapping satisfying both is a linear map or linear transformation. The two conditions combine into one:

$$T(au + bv) = a T(u) + b T(v)$$

for all vectors $u, v \in V$ and all scalars $a, b \in \mathbb{R}$.

The forced fixed point. One immediate consequence: $T(0) = 0$. Proof: $T(0) = T(0 \cdot u) = 0 \cdot T(u) = 0$. The zero vector always maps to zero. Linear maps cannot translate - they cannot shift the origin.

This is a real constraint. The map $(x, y) \mapsto (x + 3, y + 5)$ is not linear. It moves the origin. It is an affine map: a linear part plus a constant shift. Affine maps are everywhere in practice - a typical dense neural network layer computes $W\mathbf{x} + \mathbf{b}$ before its activation, not $W\mathbf{x}$ - but the $\mathbf{b}$ term pushes them outside the strictly linear world.
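A small Python sketch of the linearity test $T(au + bv) = aT(u) + bT(v)$ on sample vectors in $\mathbb{R}^2$. The helpers `combo` and `looks_linear` are hypothetical names, and, as with any finite check, passing on samples does not prove linearity. The counterclockwise quarter-turn passes; the shift $(x, y) \mapsto (x + 3, y + 5)$ fails immediately, because it does not fix the origin.

```python
from itertools import product

def rotate90(p):
    """Counterclockwise rotation by 90 degrees about the origin."""
    x, y = p
    return (-y, x)

def shift(p):
    """The affine map (x, y) -> (x + 3, y + 5)."""
    x, y = p
    return (x + 3, y + 5)

def combo(a, u, b, v):
    """The linear combination a*u + b*v in R^2."""
    return (a * u[0] + b * v[0], a * u[1] + b * v[1])

def looks_linear(T, vectors, scalars):
    """Test T(a u + b v) == a T(u) + b T(v) for all sample vectors and scalars."""
    for u, v in product(vectors, repeat=2):
        for a, b in product(scalars, repeat=2):
            if T(combo(a, u, b, v)) != combo(a, T(u), b, T(v)):
                return False
    return True

vectors = [(0, 0), (1, 0), (0, 1), (3, 1), (-2, 5)]
scalars = [-2, 0, 1, 3]
print(looks_linear(rotate90, vectors, scalars))  # True
print(looks_linear(shift, vectors, scalars))     # False
print(shift((0, 0)))                             # (3, 5): the origin moves
```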

What linearity looks like geometrically. A linear map $T: \mathbb{R}^2 \to \mathbb{R}^2$ has these geometric properties: straight lines map to straight lines (or collapse to a point); parallel lines stay parallel; the origin is fixed; the image of a grid is a (possibly distorted) grid of parallelograms, never curved shapes.

Transformations that are linear: rotation about the origin, reflection across any line through the origin, scaling (uniform or per-axis), projection onto a subspace, shearing.

Transformations that are not: any nonzero translation, any map that curves lines, any map that moves the origin.

The superposition principle. The additivity condition $T(u + v) = T(u) + T(v)$ is exactly what physicists and engineers call superposition: the response to a combination of inputs equals the combination of individual responses. You can decompose any complex input into simple components, analyze each independently, and reassemble. Linear circuits, linear mechanics, linear optics, signal processing by Fourier decomposition - all of these rest on linearity as their foundational assumption. They are tractable because of it.

Kernel and image for linear maps. For a linear map, these are not just subsets - they are subspaces.

$\ker(T)$ is a subspace of $V$. If $T(u) = 0$ and $T(v) = 0$, then $T(u + v) = T(u) + T(v) = 0$ and $T(cu) = cT(u) = 0$. The kernel is closed under addition and scaling - which is the definition of a subspace.

$\text{Im}(T)$ is a subspace of $W$. Sums of outputs are outputs of sums; scalings of outputs are outputs of scaled inputs.

The rank-nullity theorem relates the two:

$$\dim(\ker T) + \dim(\text{Im} T) = \dim(V).$$

Dimension destroyed (nullity) plus dimension surviving (rank) equals the dimension you started with. For finite-dimensional $V$, linearity keeps an exact account: each dimension of $V$ either collapses into the kernel or survives into the image, and none goes missing or appears from nowhere.
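Rank-nullity can be checked on a small example. The Python sketch below, with hypothetical helpers `rank` and `mat_vec`, computes the rank of a $3 \times 4$ matrix by Gaussian elimination in exact rational arithmetic (`fractions.Fraction`), then exhibits two independent kernel vectors, solved by hand for this particular matrix. Those two vectors show the nullity is at least $2$; the theorem says it is exactly $4 - 2 = 2$.

```python
from fractions import Fraction

def rank(A):
    """Rank of A (list of rows) by Gaussian elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        # Clear column c below the pivot.
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [M[i][j] - factor * M[r][j] for j in range(cols)]
        r += 1
        if r == rows:
            break
    return r

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 3, 4],
     [2, 4, 6, 8],    # twice the first row
     [0, 1, 1, 0]]    # A maps R^4 -> R^3
k1 = [-1, -1, 1, 0]   # kernel vectors, solved by hand for this A
k2 = [-4, 0, 0, 1]

print(rank(A))                         # 2
print(mat_vec(A, k1), mat_vec(A, k2))  # [0, 0, 0] [0, 0, 0]
print(rank([k1, k2]))                  # 2: k1 and k2 are independent
```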

Discomfort check. Why study maps with this strict zero-fixing constraint when most real-world transformations involve shifts? Because linear maps are the case where everything is tractable: determined by finite data, representable as matrices, composable by multiplication. Real-world problems are regularly linearized - approximated locally by linear maps - to exploit this tractability. The affine case is handled by a concrete trick: homogeneous coordinates. To make $T(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$ linear, append a $1$ to every vector and work in $\mathbb{R}^{n+1}$:

$$\begin{pmatrix} A & \mathbf{b} \\ \mathbf{0}^T & 1 \end{pmatrix} \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \begin{pmatrix} A\mathbf{x} + \mathbf{b} \\ 1 \end{pmatrix}$$

The augmented $(n+1) \times (n+1)$ matrix acts linearly on $\mathbb{R}^{n+1}$. The affine action on $\mathbb{R}^n$ is recovered by reading the top $n$ entries of the output. The trailing $1$ is just a bookkeeping device. This is why translation, rotation, and scaling in 3D graphics can all be represented as $4 \times 4$ matrices and chained by matrix multiplication. Perspective projection is also written as a $4 \times 4$ matrix in homogeneous coordinates, though recovering the 3D point then requires dividing by the last coordinate rather than simply dropping a trailing $1$. The linear case is the foundational one. Understand it, and affine is a minor extension.
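Here is the homogeneous-coordinate trick in two dimensions, sketched in plain Python. The helper names are hypothetical, and the affine map is an illustrative choice: the $90°$ counterclockwise rotation followed by the shift $\mathbf{b} = (3, 5)$. The $3 \times 3$ block matrix reproduces $A\mathbf{x} + \mathbf{b}$, and because it is now an ordinary linear map, applying the affine map twice is just the matrix product $HH$.

```python
def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def mat_mul(B, A):
    """Row-times-column product B A."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]

def to_homogeneous(A, b):
    """Build the (n+1) x (n+1) block matrix [[A, b], [0, 1]]."""
    n = len(A)
    return [list(A[i]) + [b[i]] for i in range(n)] + [[0] * n + [1]]

A = [[0, -1], [1, 0]]   # 90-degree counterclockwise rotation
b = [3, 5]              # translation
H = to_homogeneous(A, b)

x = [3, 1]
affine = [ax + bi for ax, bi in zip(mat_vec(A, x), b)]  # A x + b, computed directly
lifted = mat_vec(H, x + [1])                            # H applied to (x, 1)
print(H)        # [[0, -1, 3], [1, 0, 5], [0, 0, 1]]
print(affine)   # [2, 8]
print(lifted)   # [2, 8, 1]: read off the top two entries

# Applying the affine map twice equals applying the single matrix H H.
twice = [ax + bi for ax, bi in zip(mat_vec(A, affine), b)]
print(twice, mat_vec(mat_mul(H, H), x + [1]))  # [-5, 7] [-5, 7, 1]
```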

Section 4: Matrices Are Linear Maps

Every linear map $T: \mathbb{R}^n \to \mathbb{R}^m$ can be encoded as an $m \times n$ matrix $A$, and the action of $T$ is then exactly matrix-vector multiplication:

$$T(\mathbf{x}) = A\mathbf{x}.$$

This is not a computational shorthand. It is why matrices are defined the way they are.

Where the matrix comes from. A linear map is completely determined by where it sends a basis.

Every vector $\mathbf{x} = (x_1, \ldots, x_n)$ decomposes over the standard basis $e_1, \ldots, e_n$:

$$\mathbf{x} = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n.$$

By linearity:

$$T(\mathbf{x}) = x_1 T(e_1) + x_2 T(e_2) + \cdots + x_n T(e_n).$$

The map is determined by the $n$ vectors $T(e_1), \ldots, T(e_n)$, each living in $\mathbb{R}^m$. Stack them as columns:

$$A = \begin{pmatrix} | & | & & | \\ T(e_1) & T(e_2) & \cdots & T(e_n) \\ | & | & & | \end{pmatrix}.$$

Then $A\mathbf{x}$ automatically computes $x_1 T(e_1) + \cdots + x_n T(e_n) = T(\mathbf{x})$.

The $j$-th column of $A$ is where the $j$-th basis vector goes. When you see a matrix, you are seeing a record of where the standard basis vectors land. The entire linear map is determined by this finite list.
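The recipe “the $j$-th column is $T(e_j)$” translates directly into code. A Python sketch under stated assumptions: `T` is any function that happens to be linear (here a hypothetical map $\mathbb{R}^3 \to \mathbb{R}^2$), and `matrix_of` is a hypothetical helper that feeds it the standard basis vectors and stacks the results as columns. If `T` were not linear, the resulting matrix would still exist but would not reproduce `T`.

```python
def matrix_of(T, n):
    """Matrix whose j-th column is T(e_j), for the standard basis of R^n."""
    basis = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    columns = [T(e) for e in basis]
    m = len(columns[0])
    # Row i collects the i-th coordinate of every column.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def T(v):
    """A hypothetical linear map R^3 -> R^2."""
    x, y, z = v
    return [2 * x - z, x + 3 * y]

A = matrix_of(T, 3)
print(A)                                     # [[2, 0, -1], [1, 3, 0]]
print(mat_vec(A, [1, 2, 3]), T([1, 2, 3]))   # [-1, 7] [-1, 7]
```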

Example: $90°$ rotation. Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ rotate counterclockwise by $90°$.

$e_1 = (1, 0)$: the positive $x$-direction rotates to the positive $y$-direction. $T(e_1) = (0, 1)$.

$e_2 = (0, 1)$: the positive $y$-direction rotates to the negative $x$-direction. $T(e_2) = (-1, 0)$.

$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

Check on a vector: $A \begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 3 \end{pmatrix}$. The vector $(3, 1)$ rotated $90°$ counterclockwise becomes $(-1, 3)$. Correct.
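The same check in Python, plus two properties a quarter-turn should have: the image is perpendicular to the original vector, and four quarter-turns return every vector to where it started. A plain-Python sketch; `mat_vec` is a hypothetical helper.

```python
def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[0, -1],
     [1,  0]]   # columns: T(e1) = (0, 1), T(e2) = (-1, 0)

v = [3, 1]
w = mat_vec(A, v)
print(w)                           # [-1, 3]
print(v[0] * w[0] + v[1] * w[1])   # 0: the image is perpendicular to v

u = v
for _ in range(4):                 # four quarter-turns make one full turn
    u = mat_vec(A, u)
print(u)                           # [3, 1]
```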

Matrix multiplication is composition. If $S: \mathbb{R}^n \to \mathbb{R}^m$ has matrix $A$ (size $m \times n$) and $T: \mathbb{R}^m \to \mathbb{R}^p$ has matrix $B$ (size $p \times m$), then $T \circ S: \mathbb{R}^n \to \mathbb{R}^p$ has matrix $BA$ (size $p \times n$).

This is the reason matrix multiplication is defined the way it is. The row-times-column rule is not arbitrary - it is the unique rule that makes $(BA)\mathbf{x} = B(A\mathbf{x})$ for every $\mathbf{x}$, so that applying $A$ then $B$ as maps matches multiplying $B$ times $A$ as matrices. Matrix multiplication is function composition, written in coordinates.
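A plain-Python sketch of the row-times-column rule, checking $(BA)\mathbf{x} = B(A\mathbf{x})$ on one example. The matrices are arbitrary illustrative choices: $A$ is $2 \times 3$ (so $S: \mathbb{R}^3 \to \mathbb{R}^2$) and $B$ is $3 \times 2$ (so $T: \mathbb{R}^2 \to \mathbb{R}^3$), which makes $BA$ a $3 \times 3$ matrix. `mat_mul` and `mat_vec` are hypothetical helper names.

```python
def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_mul(B, A):
    """Entry (i, j) of BA is row i of B dotted with column j of A."""
    if len(B[0]) != len(A):
        raise ValueError("columns of B must match rows of A")
    return [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]

A = [[1, 0, 2],
     [0, 1, -1]]        # S: R^3 -> R^2
B = [[1, 1],
     [0, 2],
     [3, 0]]            # T: R^2 -> R^3

BA = mat_mul(B, A)      # matrix of T o S: R^3 -> R^3
x = [1, 2, 3]
print(BA)                                         # [[1, 1, 1], [0, 2, -2], [3, 0, 6]]
print(mat_vec(BA, x), mat_vec(B, mat_vec(A, x)))  # [6, -2, 21] [6, -2, 21]
```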

Reading dimensions in a chain. When you multiply a chain of matrices $B_k \cdots B_2 B_1$, the dimensions tell you exactly what is happening:

$B_1$ is $m_1 \times n$: maps $\mathbb{R}^n \to \mathbb{R}^{m_1}$ (input has $n$ features; first transformation produces $m_1$ features).
$B_2$ is $m_2 \times m_1$: maps $\mathbb{R}^{m_1} \to \mathbb{R}^{m_2}$ (takes the $m_1$ intermediate features and produces $m_2$).
$B_k$ is $p \times m_{k-1}$: maps $\mathbb{R}^{m_{k-1}} \to \mathbb{R}^p$ (the final output has $p$ dimensions).

The product $B_k \cdots B_1$ is $p \times n$: it maps directly from the original $n$-dimensional input space to the final $p$-dimensional output space, collapsing all the intermediate spaces. You never see the intermediate $m_1, m_2, \ldots$ dimensions in the final matrix - they are summed over in the matrix multiply. This is why a deep neural network’s weight matrices (each one a transformation) can be chained: the output dimension of each layer equals the input dimension of the next, and the whole forward pass is one big composition. Because each layer also adds a bias and applies a nonlinear activation, that composition does not collapse into a single matrix; the dimension bookkeeping is the same either way.
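The dimension bookkeeping can be watched directly. In this plain-Python sketch, the sizes $n = 3$, $m_1 = 5$, $m_2 = 4$, $p = 2$ and the integer entries are arbitrary assumptions made for illustration, and `make`, `shape`, `mat_mul`, `mat_vec` are hypothetical helpers. The product of the chain comes out $2 \times 3$, and applying it in one step agrees with pushing a vector through each intermediate space in turn.

```python
from functools import reduce

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_mul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]

def shape(M):
    return (len(M), len(M[0]))

def make(rows, cols, seed):
    """Deterministic small integer entries, purely for illustration."""
    return [[(seed + 2 * i + 3 * j) % 5 - 2 for j in range(cols)] for i in range(rows)]

n, m1, m2, p = 3, 5, 4, 2
B1 = make(m1, n, 1)    # R^3 -> R^5
B2 = make(m2, m1, 2)   # R^5 -> R^4
B3 = make(p, m2, 3)    # R^4 -> R^2

chain = reduce(mat_mul, [B3, B2, B1])   # computes (B3 B2) B1
print(shape(B1), shape(B2), shape(B3), shape(chain))  # (5, 3) (4, 5) (2, 4) (2, 3)

x = [1, -1, 2]
step_by_step = mat_vec(B3, mat_vec(B2, mat_vec(B1, x)))
print(mat_vec(chain, x) == step_by_step)  # True
```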

Function properties become matrix properties. The abstract concepts translate directly:

$T$ is injective $\iff$ $\ker(T) = \{0\}$ $\iff$ $A\mathbf{x} = 0$ has only the trivial solution $\iff$ the columns of $A$ are linearly independent.

$T$ is surjective $\iff$ $\text{Im}(T) = \mathbb{R}^m$ $\iff$ $A\mathbf{x} = \mathbf{b}$ has a solution for every $\mathbf{b}$ $\iff$ the columns of $A$ span $\mathbb{R}^m$.

$T: \mathbb{R}^n \to \mathbb{R}^n$ is bijective $\iff$ $A$ is invertible $\iff$ $T^{-1}$ exists with matrix $A^{-1}$.
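These equivalences reduce to one number, the rank $r$ (the dimension of the image): for an $m \times n$ matrix, the columns are independent exactly when $r = n$, and they span $\mathbb{R}^m$ exactly when $r = m$. The Python sketch below uses that, with a hypothetical `rank` helper based on exact Gaussian elimination; the four example matrices are illustrative.

```python
from fractions import Fraction

def rank(A):
    """Rank by Gaussian elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [M[i][j] - factor * M[r][j] for j in range(cols)]
        r += 1
        if r == rows:
            break
    return r

def classify(A):
    """Return (injective, surjective) for the map x -> A x from R^n to R^m."""
    m, n = len(A), len(A[0])
    r = rank(A)
    return (r == n, r == m)

examples = {
    "tall 3x2": [[1, 0], [0, 1], [1, 1]],   # R^2 -> R^3
    "wide 2x3": [[1, 0, 1], [0, 1, 1]],     # R^3 -> R^2
    "rotation": [[0, -1], [1, 0]],          # R^2 -> R^2
    "singular": [[1, 2], [2, 4]],           # R^2 -> R^2
}
for name, A in examples.items():
    print(name, classify(A))
# tall 3x2 (True, False)
# wide 2x3 (False, True)
# rotation (True, True)
# singular (False, False)
```

A square matrix always lands on (True, True) or (False, False), never a mix: when $m = n$, the conditions $r = n$ and $r = m$ coincide, which is rank-nullity again.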

The question “does an inverse function exist?” became “is the matrix invertible?” The machinery changed from abstract set theory to concrete linear algebra. The underlying idea - bijection as the exact condition for invertibility - did not change at all.

Discomfort check. What about linear maps between infinite-dimensional spaces - the differentiation operator $D: f \mapsto f'$ on the space of smooth functions, or the Fourier transform? These are linear maps but they cannot be written as finite matrices. The matrix-linear-map correspondence is exact for finite-dimensional vector spaces $\mathbb{R}^n \to \mathbb{R}^m$. In infinite dimensions you need functional analysis, and the matrix is replaced by an abstract operator. The concepts - kernel, image, injectivity, surjectivity - still apply; the tools for computing with them change.

Summary

Concept	Definition	Why It Matters
Kernel $\ker(T)$	$\{v : T(v) = 0\}$	Information destroyed; $T$ injective iff $\ker(T) = \{0\}$
Image $\text{Im}(T)$	$\{T(v) : v \in V\}$	What the mapping actually reaches; $T$ surjective iff $\text{Im}(T)$ = codomain
Homomorphism	$T(a \star b) = T(a) \star T(b)$	Preserves the operations of a structure; isomorphism when also bijective
Linear map	$T(au+bv) = aT(u)+bT(v)$	Homomorphism of vector spaces; the central object of linear algebra
Matrix $A$	$m \times n$ array	Encodes a linear map $\mathbb{R}^n \to \mathbb{R}^m$; $j$-th column = $T(e_j)$

The questions worth having ready for any linear map: what is its kernel? Is the kernel trivial (injective)? Does the image fill the target space (surjective)? Can you write it as a matrix? Where does each basis vector land?
