Vector Spaces & Subspaces - The Geometry of Abstract Addition and Scaling
Helpful context:
The city problem from last time gave us two special sets. The column space: all destinations reachable by combining the available moves. The null space: all combinations of moves that return you to the start. Both were closed under addition and scaling, both contained the origin.
But here is a question we glossed over: closed under operations of what? The column space lives inside $\mathbb{R}^m$ (the destination space). The null space lives inside $\mathbb{R}^n$ (the move space). To say “the column space is a subspace of something”, we need to first understand what that something actually is.
So before we talk about subspaces, we need to understand what a vector space is.
Where the Axioms Came From
The eight axioms below can feel like they were handed down from somewhere - precise, clean, obviously correct. But they were not invented in one go. They took about fifty years to stabilize, and they emerged not by asking “what would be a nice set of rules?” but by looking at what mathematicians were already doing and asking “what are we actually relying on?”
The starting point: geometry and physics (pre-1840s). The earliest “vectors” were geometric - arrows in 2D or 3D space representing forces, velocities, displacements. Newton’s mechanics made it clear that forces add (two pushes give a net push) and scale (double the force, double the effect). But this was always tied to physical 3D space. Nobody thought to ask what made these operations work, because the geometric picture seemed self-evident.
The first crack: Fourier (1822). Fourier showed that any periodic function can be expressed as a sum of sines and cosines: $f(x) = a_0 + a_1 \cos x + b_1 \sin x + a_2 \cos 2x + \cdots$. This was more than a calculation trick. It meant that functions could be “added” and “scaled” in exactly the same way as arrows - and that a function could be described by specifying its “coordinates” $(a_0, a_1, b_1, a_2, \ldots)$ in the sine-cosine “basis.” Functions and arrows were obeying the same rules. Nobody had the language to say this precisely yet, but the observation was planted.
Grassmann’s abstraction (1844). Hermann Grassmann, a German schoolteacher, published a book that today looks remarkably modern: he defined an abstract $n$-dimensional linear space, freed from any geometric picture, and gave axioms for what it meant to add and scale elements. He saw that the same formal structure covered both geometric vectors and algebraic objects. His book was largely ignored for forty years - it was too abstract, too far ahead of what people thought was needed, and written in a philosophical style that made it hard to read. But the idea was there.
Peano’s clean formulation (1888). Giuseppe Peano - better known for the axioms of arithmetic - wrote down the first fully rigorous axiomatic definition of a vector space in essentially the form we use today. He listed the eight rules, recognized that many different mathematical objects satisfied them, and defined the abstract concept independent of any specific example. Peano was explicit about what Grassmann had intuited: the axioms are not a description of arrows in space, they are conditions on any set with two operations. Anything satisfying the conditions gets all the theorems for free.
Why exactly these eight rules? This is the right question to ask. The axioms were not chosen aesthetically. They were distilled from decades of mathematical practice by asking: what did we actually use? Every proof about vectors in $\mathbb{R}^2$ or solutions to differential equations or Fourier series implicitly used these eight properties and nothing else. When Peano wrote them down, he was not inventing rules - he was making explicit the rules that had always been in play. The claim “these axioms are exactly what you need” was validated retrospectively: everything in linear algebra - span, basis, dimension, linear maps, rank-nullity - follows from just these eight rules, and removing any one of them breaks something.
What changed over time. The first iterations included more axioms than necessary. Early treatments sometimes listed commutativity of vector addition ($\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$) as a separate assumption alongside the group axioms, not realizing it was independent of the others. Some included finite-dimensionality as an axiom rather than a property a space might or might not have. The modern definition also replaced “$\mathbb{R}$-scalars” with “scalars from an arbitrary field” - which allowed complex vector spaces, spaces over finite fields (useful in coding theory), and so on. This generalization happened gradually through the 1900s as abstract algebra and its applications expanded.
The shape of the definition today - eight rules, scalars from a field, no mention of dimension or geometry - is the result of mathematicians repeatedly asking “do we actually need this assumption?” and removing it whenever the answer was no.
What $\mathbb{R}^n$ Actually Is
When we solved the city problem, we used two operations on vectors: addition and scalar multiplication. We didn’t pause to ask what rules these operations follow. But they do follow rules, and we used every one of them without noticing.
Here is what we relied on:
About addition:
- $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$: the order you combine moves doesn’t change the result.
- $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$: how you group three moves doesn’t matter.
- There is a zero vector $\mathbf{0}$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$: the empty move exists.
- Every vector $\mathbf{v}$ has a negative $-\mathbf{v}$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$: every move can be reversed.
About scaling:
- $1 \cdot \mathbf{v} = \mathbf{v}$: scaling by 1 does nothing.
- $(\alpha\beta)\mathbf{v} = \alpha(\beta\mathbf{v})$: scaling twice is the same as scaling by the product.
- $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$: scaling distributes over vector addition.
- $(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$: scaling distributes over scalar addition.
These eight rules together are the definition of a vector space. $\mathbb{R}^n$ satisfies all eight. So does $\mathbb{R}^m$, for any $m$.
The reason to isolate these eight rules is that other, stranger things also satisfy them - and once you know they do, everything we built in the city problem applies to those things too.
The Same Rules, Different Objects
Forget cities. Consider all polynomials of degree at most 2:
$$p(x) = a + bx + cx^2, \quad a, b, c \in \mathbb{R}$$
Call this set $\mathcal{P}_2$. Does it satisfy the eight rules?
Addition: $(a + bx + cx^2) + (a' + b’x + c’x^2) = (a+a') + (b+b')x + (c+c')x^2$. Still degree at most 2. You can check: addition is commutative, associative, the zero polynomial $0 + 0x + 0x^2$ acts as identity, and $(-a) + (-b)x + (-c)x^2$ is the negative.
Scaling: $\lambda(a + bx + cx^2) = \lambda a + \lambda bx + \lambda cx^2$. Still degree at most 2. The scaling rules all follow from how multiplication distributes.
All eight rules hold. $\mathcal{P}_2$ is a vector space.
The elements look nothing like arrows or coordinate tuples. But the structure is identical. This is the point: a vector space is not a thing, it is a structure. Any set with addition and scalar multiplication satisfying the eight rules qualifies, regardless of what the “vectors” actually are.
Other examples, once you check the rules:
- All $m \times n$ matrices (add entrywise, scale entrywise): a vector space.
- All continuous functions on $[0, 1]$: a vector space.
- All solutions to the differential equation $f'' + f = 0$: a vector space. (The sum of two solutions is a solution; scaling a solution gives a solution.)
That last one is striking. A quick note on the terms: a differential equation is an equation that relates a function to its derivatives - $f'' + f = 0$ says “the second derivative of $f$ equals $-f$ at every point.” A smooth function is one that is infinitely differentiable: you can take its derivative as many times as you like and always get another well-defined function. The solutions to $f'' + f = 0$ are smooth: $f(x) = \sin x$ and $f(x) = \cos x$ (and all their linear combinations), all of which are differentiable any number of times. The solution set of a differential equation has vector space structure. This is not a coincidence: linear algebra is fundamental to differential equations precisely because linear structure is everywhere.
Which Subsets Inherit the Structure?
Now the real question: $\mathbb{R}^n$ is a vector space. The column space is a subset of $\mathbb{R}^m$. Is the column space itself a vector space?
For a subset $U \subseteq V$ to be a vector space under the same operations, it does not need to re-verify all eight rules from scratch. Five of the eight rules are equations about elements - commutativity, associativity, distributivity. Any element of $U$ is also an element of $V$, so those equations hold in $U$ for free. The ambient space $V$ already guarantees them.
What remains to check is whether $U$ is actually closed under the operations at all - that operating on elements of $U$ keeps you inside $U$. Specifically:
- $\mathbf{0} \in U$: without the zero vector, $U$ cannot be a vector space.
- $\mathbf{u}, \mathbf{v} \in U \Rightarrow \mathbf{u} + \mathbf{v} \in U$: closed under addition.
- $\mathbf{u} \in U, \alpha \in \mathbb{R} \Rightarrow \alpha\mathbf{u} \in U$: closed under scaling.
A subset satisfying these three conditions is called a subspace of $V$. A subspace is a subset that is itself a vector space.
Why only three conditions? The other five axioms (commutativity, associativity, etc.) are inherited from the ambient space. They cost nothing to check. The three conditions are exactly what is not automatic: does $U$ stay inside itself when you operate?
Which Sets Are Subspaces?
The three-condition test is fast. Start with the zero vector check, since failing it costs nothing to detect.
A line through the origin in $\mathbb{R}^2$. $U = \{(x, y) : y = 2x\}$. Zero: $(0,0)$ satisfies $0 = 2 \cdot 0$. In. Sum: $(a, 2a) + (b, 2b) = (a+b, 2(a+b))$. Still on the line. Scale: $\lambda(a, 2a) = (\lambda a, 2\lambda a)$. Still on the line. Subspace.
A line not through the origin. $U = \{(x,y) : y = 2x + 1\}$. Zero check: $0 = 2(0) + 1$? No. Not a subspace. Done in one step.
A plane through the origin in $\mathbb{R}^3$. $U = \{(x,y,z) : ax + by + cz = 0\}$. Zero: $0 = 0$. Yes. Sum: if $a x_1 + b y_1 + c z_1 = 0$ and $a x_2 + b y_2 + c z_2 = 0$, then $a(x_1+x_2) + b(y_1+y_2) + c(z_1+z_2) = 0$. Yes. Scale: $a(\lambda x) + b(\lambda y) + c(\lambda z) = \lambda(ax + by + cz) = 0$. Yes. Subspace.
The unit disk. $\{(x,y) : x^2 + y^2 \leq 1\}$. Zero: $(0,0)$ is inside. Scaling: take $(1, 0)$, scale by $3$: $(3,0)$, which has $9 + 0 = 9 > 1$. Outside. Not a subspace.
Just the origin. $\{(0,0)\}$. All three conditions hold trivially. Subspace. (The smallest one.)
All of $\mathbb{R}^2$. Everything is in there. All three conditions hold trivially. Subspace. (The largest one.)
Pattern for $\mathbb{R}^2$: the only subspaces are $\{\mathbf{0}\}$, lines through the origin, and $\mathbb{R}^2$ itself. For $\mathbb{R}^3$: $\{\mathbf{0}\}$, lines through the origin, planes through the origin, and $\mathbb{R}^3$. The origin condition is what distinguishes subspaces from arbitrary subsets - you cannot have a subspace that avoids the origin, because scaling by zero forces you back to it.
Building Subspaces: Span
Given some vectors, what is the smallest subspace containing all of them?
If $U$ is a subspace containing $\mathbf{v_1}$ and $\mathbf{v_2}$, then:
- By scaling, it must contain $c_1 \mathbf{v_1}$ and $c_2 \mathbf{v_2}$ for all $c_1, c_2 \in \mathbb{R}$.
- By addition, it must contain $c_1 \mathbf{v_1} + c_2 \mathbf{v_2}$ for all $c_1, c_2 \in \mathbb{R}$.
That set of all linear combinations is itself a subspace (check: contains zero with $c_1 = c_2 = 0$; closed under addition and scaling by combining coefficients). And it is the smallest subspace containing $\mathbf{v_1}$ and $\mathbf{v_2}$: any subspace containing them must contain all their combinations.
We call this the span:
$$\text{span}(\mathbf{v_1}, \ldots, \mathbf{v_k}) = \{c_1 \mathbf{v_1} + \cdots + c_k \mathbf{v_k} : c_1, \ldots, c_k \in \mathbb{R}\}$$
The span of one nonzero vector is a line through the origin. The span of two vectors that are not multiples of each other is a plane through the origin. The span of three vectors in $\mathbb{R}^3$ that are not all coplanar is all of $\mathbb{R}^3$.
Going back to the city problem: the column space of $A$ is exactly $\text{span}(\mathbf{a_1}, \ldots, \mathbf{a_n})$ - the span of $A$’s columns. The columns are your available moves; the column space is everywhere you can reach.
When Are Seeds Redundant? Linear Independence
You want to describe a subspace by giving a generating set - a set of vectors whose span is exactly the subspace. Some generating sets are wasteful: if one of your vectors is already a combination of the others, removing it does not change the span.
The question: when is a set of vectors free of redundancy?
The vectors $\mathbf{v_1}, \ldots, \mathbf{v_k}$ are linearly independent if the only solution to
$$c_1 \mathbf{v_1} + c_2 \mathbf{v_2} + \cdots + c_k \mathbf{v_k} = \mathbf{0}$$
is $c_1 = c_2 = \cdots = c_k = 0$.
Why does this capture redundancy? If some $\mathbf{v_j}$ is a combination of the others - say $\mathbf{v_j} = \sum_{i \neq j} \lambda_i \mathbf{v_i}$ - then rearranging gives a nontrivial combination equal to zero (with coefficient $1$ on $\mathbf{v_j}$). Conversely, any nontrivial combination equaling zero lets you solve for one vector in terms of the others. The zero-combination condition is exactly the algebraic signature of non-redundancy.
Example. Are $(1,0)$, $(0,1)$, $(2,3)$ linearly independent in $\mathbb{R}^2$?
$c_1(1,0) + c_2(0,1) + c_3(2,3) = (0,0)$ gives the system $c_1 + 2c_3 = 0$ and $c_2 + 3c_3 = 0$. Setting $c_3 = 1$: $c_1 = -2$, $c_2 = -3$. A nontrivial solution exists: $(2,3) = 2(1,0) + 3(0,1)$. Linearly dependent.
Example. Are $(1,0)$ and $(0,1)$ linearly independent?
$c_1(1,0) + c_2(0,1) = (0,0)$ forces $c_1 = 0$ and $c_2 = 0$. Only the trivial solution. Linearly independent.
Basis: The Exact Right Amount
A basis of a subspace $U$ is a list of vectors that is simultaneously:
- Spanning: every vector in $U$ is a linear combination of the list.
- Linearly independent: no vector in the list is a combination of the others.
Spanning says “enough”; independence says “not too much.” A basis is the exact right amount.
Examples of bases for $\mathbb{R}^2$: $\{(1,0), (0,1)\}$. Also $\{(1,1), (1,-1)\}$. Also $\{(3,1), (1,2)\}$. All are valid. There are infinitely many choices of basis for the same space.
Here is the key question: must every basis for a given space have the same number of elements?
It is not obvious that the answer is yes. One basis might look larger or smaller than another. But the answer is yes, and we should prove it - because dimension being a well-defined number depends entirely on this fact.
Why Every Basis Has the Same Size
The proof hinges on one lemma:
Lemma (Exchange). If $\mathbf{w_1}, \ldots, \mathbf{w_n}$ spans $V$ and $\mathbf{u_1}, \ldots, \mathbf{u_m}$ is linearly independent in $V$, then $m \leq n$.
In other words: a linearly independent set can never be larger than a spanning set.
Proof. Think of the spanning set as a team of $n$ workers $\{\mathbf{w_1}, \ldots, \mathbf{w_n}\}$ who together can produce any vector in $V$. We will replace them, one by one, with the independent vectors $\mathbf{u_1}, \ldots, \mathbf{u_m}$, showing that each exchange is possible and that we need at least $m$ workers to start with.
Round 1. Add $\mathbf{u_1}$ to the team: the set $\{\mathbf{u_1}, \mathbf{w_1}, \ldots, \mathbf{w_n}\}$ still spans (adding a vector to a spanning set keeps it spanning). But now the team has $n+1$ members and is linearly dependent. Since $\mathbf{w_1}, \ldots, \mathbf{w_n}$ already spans, $\mathbf{u_1}$ is a combination of them:
$$\mathbf{u_1} = d_1 \mathbf{w_1} + \cdots + d_n \mathbf{w_n}$$
At least one coefficient $d_j$ is nonzero - if all were zero, $\mathbf{u_1} = \mathbf{0}$, which cannot be in a linearly independent set. Say $d_1 \neq 0$ (relabeling if needed). Solve for $\mathbf{w_1}$:
$$\mathbf{w_1} = \frac{1}{d_1}(\mathbf{u_1} - d_2\mathbf{w_2} - \cdots - d_n\mathbf{w_n})$$
So $\mathbf{w_1}$ is now expressible using $\mathbf{u_1}$ and the remaining $\mathbf{w_j}$’s - it is redundant. Remove it. The new team $\{\mathbf{u_1}, \mathbf{w_2}, \ldots, \mathbf{w_n}\}$ still spans $V$, and has $n$ members.
Round 2. Add $\mathbf{u_2}$: $\{\mathbf{u_1}, \mathbf{u_2}, \mathbf{w_2}, \ldots, \mathbf{w_n}\}$ spans and is dependent. Write $\mathbf{u_2}$ in terms of this team:
$$\mathbf{u_2} = c_1 \mathbf{u_1} + d_2 \mathbf{w_2} + \cdots + d_n \mathbf{w_n}$$
Some $d_j$ must be nonzero. Why? If every $d_j = 0$, then $\mathbf{u_2} = c_1 \mathbf{u_1}$, meaning $\mathbf{u_1}$ and $\mathbf{u_2}$ are proportional - contradicting the assumption that $\mathbf{u_1}, \ldots, \mathbf{u_m}$ is linearly independent. So some $\mathbf{w_j}$ has nonzero coefficient and can be solved for and removed. The team still spans and still has $n$ members.
The pattern. At round $k$: we add $\mathbf{u_k}$ to the current team. Writing $\mathbf{u_k}$ in terms of the current team, some $\mathbf{w_j}$ must appear with nonzero coefficient - because if only the $\mathbf{u_1}, \ldots, \mathbf{u_{k-1}}$ terms appeared, $\mathbf{u_k}$ would be a combination of earlier $\mathbf{u}$’s, violating independence. So we can always remove a $\mathbf{w_j}$.
After $m$ rounds, we have placed all of $\mathbf{u_1}, \ldots, \mathbf{u_m}$ into the team and fired $m$ workers. We can only fire workers that exist, so we need $n \geq m$. $\square$
Corollary: Any two bases have the same size.
Let $B_1$ and $B_2$ both be bases. $B_1$ is linearly independent; $B_2$ spans. By the lemma: $|B_1| \leq |B_2|$. Flip the roles: $B_2$ is linearly independent; $B_1$ spans. So $|B_2| \leq |B_1|$. Therefore $|B_1| = |B_2|$.
Dimension
Since every basis has the same size, that size is a property of the space itself, not of any particular basis. We call it the dimension:
$$\dim(V) = \text{size of any basis for } V$$
- $\dim(\mathbb{R}^n) = n$. The standard basis $\{\mathbf{e_1}, \ldots, \mathbf{e_n}\}$ has $n$ elements.
- A line through the origin in $\mathbb{R}^3$ has dimension 1.
- A plane through the origin in $\mathbb{R}^3$ has dimension 2.
- $\dim(\mathcal{P}_n) = n+1$. A basis is $\{1, x, x^2, \ldots, x^n\}$.
- The solution space of $f'' + f = 0$ has dimension 2. A basis is $\{\sin x, \cos x\}$.
- $\dim(\{\mathbf{0}\}) = 0$. The empty list is the basis: vacuously independent, and spans $\{\mathbf{0}\}$ since the empty combination is $\mathbf{0}$.
Dimension measures the intrinsic “degrees of freedom” of a space - how many independent choices you need to specify a vector.
The dimension of the column space of $A$ is the rank of $A$: the number of truly independent columns. The dimension of the null space is the nullity. And from last time: $\text{rank}(A) + \text{nullity}(A) = n$ (the number of columns). That is rank-nullity, and now you can see it as a statement about dimensions of subspaces.
Basis as a Coordinate System
Once you choose a basis $\{\mathbf{b_1}, \ldots, \mathbf{b_k}\}$ for a $k$-dimensional subspace $U$, every vector $\mathbf{v} \in U$ has a unique representation:
$$\mathbf{v} = c_1 \mathbf{b_1} + c_2 \mathbf{b_2} + \cdots + c_k \mathbf{b_k}$$
Existence follows from the spanning property; uniqueness from independence (if two representations existed, their difference would give a nontrivial zero combination). The numbers $(c_1, \ldots, c_k)$ are the coordinates of $\mathbf{v}$ in this basis.
Different bases give different coordinate systems for the same vector. The vector itself does not change - only its description changes. This is the starting point for change of basis, which we will cover later.
In signal processing: the Fourier basis expresses any periodic signal as a combination of sine and cosine functions. The coordinates are the amplitudes of each frequency. Same signal, different coordinate system, completely different representation.
In data compression: find a basis adapted to your data so that most coordinates are small. Dropping small coordinates compresses the signal with minimal loss. This is the idea behind PCA and JPEG compression.
Abstract Vector Spaces, One More Time
We developed everything above for subspaces of $\mathbb{R}^n$. But vector spaces over $\mathbb{R}$ go further. Every finite-dimensional vector space $V$ with $\dim(V) = n$ is isomorphic to $\mathbb{R}^n$ - there is a bijection preserving addition and scaling. So in a deep sense, you already understand all finite-dimensional vector spaces: they are all secretly $\mathbb{R}^n$ in disguise, just with different labels on the vectors.
What the abstract framework buys you is two things. First: you can apply the same tools directly to polynomial spaces, matrix spaces, and function spaces without re-deriving everything. Second: the abstract viewpoint lets you see the same structure in places you would not otherwise connect.
The solution set of a homogeneous linear ODE $L[y] = 0$ is a subspace of the vector space of all smooth functions. (Homogeneous here means the right-hand side is zero - the equation has no “forcing term.” For example, $y'' + 3y' + 2y = 0$ is homogeneous; $y'' + 3y' + 2y = e^{-t}$ is not. The homogeneous case is where the vector space structure appears, because zero is a solution and linear combinations of solutions are solutions.) Its dimension equals the order of the ODE. A basis for the solution space is a fundamental system of solutions. This is not an analogy with linear algebra - it is linear algebra, applied to function spaces.
The Rigorous Underpinning
Everything above was proved or precisely argued. Here is where the pieces fit together formally.
Subspace test. $U \subseteq V$ is a subspace iff: $\mathbf{0} \in U$, $U$ is closed under addition, $U$ is closed under scalar multiplication. The other five vector space axioms are inherited from $V$ automatically.
Span is always a subspace. $\text{span}(\mathbf{v_1}, \ldots, \mathbf{v_k})$ satisfies all three conditions (zero comes from setting all $c_i = 0$; closure follows from combining coefficients). It is the smallest subspace containing the given vectors.
Basis existence. Every finite-dimensional vector space has a basis. One way to construct one: start from any spanning set and repeatedly remove redundant vectors. You stop when independent - and you always reach independence before emptying the set.
Dimension is well-defined. Proved above: the exchange lemma implies any two bases have the same size.
Every finite-dimensional vector space is isomorphic to $\mathbb{R}^n$. Given any basis $\{b_1, \ldots, b_n\}$ of $V$, the coordinate map $v \mapsto (c_1, \ldots, c_n)$ is a linear isomorphism from $V$ to $\mathbb{R}^n$. It preserves all linear-algebraic structure.
Summary
| Concept | What it asks | What it tells you |
|---|---|---|
| Vector space | Does this set with addition and scaling satisfy the 8 axioms? | Whether the structure supports all of linear algebra |
| Subspace | Is this subset itself a vector space (under the same ops)? | Whether you can stay inside the set freely |
| Span | What is the smallest subspace containing these vectors? | How much space the vectors collectively cover |
| Linear independence | Is any vector redundant? | Whether your description of a space is minimal |
| Basis | Spanning + independent simultaneously | The exact right number of generators |
| Dimension | How many vectors in any basis? | The intrinsic degrees of freedom of the space |
The payoff of the abstraction: these concepts - defined once for abstract vector spaces - apply directly to $\mathbb{R}^n$, polynomial spaces, matrix spaces, function spaces, and solution spaces of differential equations. The same tools, used everywhere.
Read next: