Vector Spaces & Subspaces
The Abstract Definition
A vector space over a field $\mathbb{F}$ (take $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$) is a set $V$ equipped with two operations - addition $+: V \times V \to V$ and scalar multiplication $\cdot: \mathbb{F} \times V \to V$ - satisfying the following eight axioms:
- Commutativity of addition: $u + v = v + u$ for all $u, v \in V$
- Associativity of addition: $(u + v) + w = u + (v + w)$
- Additive identity: there exists $\mathbf{0} \in V$ with $v + \mathbf{0} = v$ for all $v$
- Additive inverse: for every $v \in V$ there exists $-v \in V$ with $v + (-v) = \mathbf{0}$
- Multiplicative identity: $1 \cdot v = v$ for all $v$
- Associativity of scalar multiplication: $(\alpha \beta) v = \alpha (\beta v)$
- Distributivity over vector addition: $\alpha(u + v) = \alpha u + \alpha v$
- Distributivity over scalar addition: $(\alpha + \beta)v = \alpha v + \beta v$
The list is nearly minimal - curiously, commutativity of addition actually follows from the other axioms: expanding $(1+1)(u+v)$ with each distributive law gives $(u+v)+(u+v) = (u+u)+(v+v)$, and cancelling forces $u + v = v + u$. Note: the zero vector is unique (if $\mathbf{0}'$ also works, then $\mathbf{0} = \mathbf{0} + \mathbf{0}' = \mathbf{0}'$), and additive inverses are unique by a similar argument.
Examples.
- $\mathbb{R}^n$ with componentwise operations (the prototype).
- $\mathbb{F}^{m \times n}$: $m \times n$ matrices with entries in $\mathbb{F}$, under entrywise operations.
- $\mathcal{P}_n(\mathbb{R})$: polynomials of degree at most $n$, with pointwise addition and scalar multiplication.
- $C([a,b])$: continuous functions on $[a,b]$ with pointwise operations.
- $\{\mathbf{0}\}$: the trivial vector space.
Subspaces
Definition. A nonempty subset $U \subseteq V$ is a subspace of $V$ if $U$ is itself a vector space under the same operations inherited from $V$.
Theorem (Subspace Test). A nonempty $U \subseteq V$ is a subspace if and only if it satisfies all three conditions:
- $\mathbf{0} \in U$
- $u, v \in U \Rightarrow u + v \in U$ (closed under addition)
- $\alpha \in \mathbb{F},\ u \in U \Rightarrow \alpha u \in U$ (closed under scalar multiplication)
Why only three? Six of the eight axioms are equations that hold for all elements of $V$, hence automatically for all elements of $U \subseteq V$. The two existence axioms also come along: $\mathbf{0} \in U$ is condition 1, and additive inverses follow from condition 3, since $-u = (-1)u \in U$.
Why $\mathbf{0}$? Condition 3 with $\alpha = 0$ gives $0 \cdot u = \mathbf{0} \in U$, so condition 1 is actually implied by condition 3 (provided $U$ is nonempty). Requiring $\mathbf{0} \in U$ explicitly is simply the cleanest way to verify nonemptiness.
Subspace $U$ as a plane through the origin in $\mathbb{R}^3$:

```
        z
        |
        |   *   *   *
        |     *  U  *
        |       *   *
        |*          *
        +--------------> y
       /
      /
     x
```

U must pass through the origin (0, 0, 0). Lines and planes NOT through the origin are NOT subspaces.
Non-example. The set $\{(x,y) \in \mathbb{R}^2 : x + y = 1\}$ is not a subspace: it does not contain $\mathbf{0}$.
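To make the test concrete, here is a minimal randomized sketch (assuming NumPy; the samplers and membership predicates are illustrative, and a passing spot check is evidence rather than proof). It accepts the plane $x + y + z = 0$ through the origin and rejects the line above.

```python
import numpy as np

rng = np.random.default_rng(0)

def check_closure(sample, contains, trials=1000):
    # Spot-check the subspace test: `sample` draws points from the set,
    # `contains` decides membership.
    if not contains(np.zeros_like(sample())):       # condition 1: zero vector
        return False
    return all(
        contains(sample() + sample())               # condition 2: addition
        and contains(rng.normal() * sample())       # condition 3: scaling
        for _ in range(trials)
    )

# Subspace: the plane x + y + z = 0 through the origin in R^3.
def sample_plane():
    x, y = rng.normal(size=2)
    return np.array([x, y, -x - y])

# Non-subspace: the line x + y = 1, which misses the origin.
def sample_line():
    x = rng.normal()
    return np.array([x, 1.0 - x])

print(check_closure(sample_plane, lambda p: np.isclose(p.sum(), 0.0)))  # True
print(check_closure(sample_line, lambda p: np.isclose(p.sum(), 1.0)))   # False
```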
Linear Combinations and Span
A linear combination of vectors $v_1, \ldots, v_k \in V$ is any vector of the form $\sum_{i=1}^k \alpha_i v_i$ where $\alpha_i \in \mathbb{F}$.
Definition. The span of $S = \{v_1, \ldots, v_k\}$ is:
$$\text{span}(S) = \left\{ \sum_{i=1}^k \alpha_i v_i : \alpha_i \in \mathbb{F} \right\}$$
Theorem. $\text{span}(S)$ is a subspace of $V$ - in fact, the smallest subspace containing $S$.
Proof. It contains $\mathbf{0}$ (take all $\alpha_i = 0$). It is closed under addition and scalar multiplication by linearity of the sum. Any subspace containing $S$ must contain all linear combinations of $S$, hence contains $\text{span}(S)$.
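Concretely (a minimal sketch, assuming NumPy), deciding whether $w \in \text{span}(S)$ in $\mathbb{R}^n$ is a rank computation: appending $w$ as an extra column leaves the rank unchanged exactly when $w$ is a combination of the existing columns.

```python
import numpy as np

# Is w in span{v1, v2}? Rank test: appending w as a column does not
# increase the rank iff w is a linear combination of v1 and v2.
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
w = np.array([2.0, 3.0, 5.0])            # = 2*v1 + 3*v2 by construction

A = np.column_stack([v1, v2])
in_span = (np.linalg.matrix_rank(np.column_stack([A, w]))
           == np.linalg.matrix_rank(A))
print(in_span)                           # True

# Recover coefficients alpha with least squares: A @ alpha ≈ w.
alpha, *_ = np.linalg.lstsq(A, w, rcond=None)
print(alpha)                             # approximately [2., 3.]
```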
Linear Independence
Definition. A list $v_1, \ldots, v_k \in V$ is linearly independent if the only solution to $\sum_{i=1}^k \alpha_i v_i = \mathbf{0}$ is $\alpha_1 = \cdots = \alpha_k = 0$. Otherwise the list is linearly dependent.
Linear dependence has a concrete interpretation: a list is linearly dependent iff some vector in it lies in the span of the others. Formally: if $\sum \alpha_i v_i = \mathbf{0}$ with $\alpha_j \neq 0$, then $v_j = -\alpha_j^{-1} \sum_{i \neq j} \alpha_i v_i \in \text{span}(\{v_i\}_{i \neq j})$.
Test in $\mathbb{R}^n$. Arrange $v_1, \ldots, v_k$ as columns of a matrix $A \in \mathbb{R}^{n \times k}$. The list is linearly independent iff $\text{null}(A) = \{\mathbf{0}\}$, iff $\text{rank}(A) = k$, iff the system $A\boldsymbol{\alpha} = \mathbf{0}$ has only the trivial solution.
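A minimal sketch of this test, assuming NumPy (`is_independent` is an illustrative helper, not a library function):

```python
import numpy as np

def is_independent(*vectors):
    # Columns are independent iff rank(A) equals the number of columns,
    # i.e. iff A @ alpha = 0 has only the trivial solution.
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]

print(is_independent([1, 0, 0], [0, 1, 0], [1, 1, 0]))  # False: v3 = v1 + v2
print(is_independent([1, 0, 0], [0, 1, 0], [0, 0, 1]))  # True
```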
Basis and Dimension
Definition. A basis of $V$ is a list of vectors that is both linearly independent and spans $V$.
Theorem (Existence of Basis). Every spanning list in $V$ contains a basis. Every linearly independent list extends to a basis (in finite-dimensional spaces).
Proof sketch (spanning list contains a basis). If $v_1, \ldots, v_k$ spans $V$ and is dependent, some $v_j \in \text{span}({v_i}_{i \neq j})$; remove it. The remaining list still spans. Repeat until independent.
Theorem (Uniqueness of Representation). If $v_1, \ldots, v_n$ is a basis of $V$, then every $v \in V$ has a unique representation $v = \sum_{i=1}^n \alpha_i v_i$.
Proof. Existence: by spanning. Uniqueness: if $\sum \alpha_i v_i = \sum \beta_i v_i$, then $\sum (\alpha_i - \beta_i) v_i = \mathbf{0}$; independence forces $\alpha_i = \beta_i$ for all $i$.
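In coordinates, uniqueness is an invertible linear system: put the basis vectors as columns of a square matrix $B$; the coordinate vector of $v$ is then the unique solution of $B\boldsymbol{\alpha} = v$. A sketch assuming NumPy:

```python
import numpy as np

# Basis of R^2 as columns of B. For a basis, B is square and invertible,
# so the coordinates of v are the unique solution of B @ alpha = v.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
v = np.array([3.0, 2.0])

alpha = np.linalg.solve(B, v)
print(alpha)          # [1. 2.]: v = 1*b1 + 2*b2
print(B @ alpha)      # [3. 2.]: reconstructs v
```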
Theorem (Dimension is Well-Defined). Any two bases of a finite-dimensional $V$ have the same number of elements. This common number is $\dim(V)$.
Proof. Suppose basis $B_1$ has $m$ elements and basis $B_2$ has $n$. Any linearly independent list has length at most that of any spanning list, so $m \leq n$; by symmetry $n \leq m$, hence $m = n$.
Standard dimensions: $\dim(\mathbb{R}^n) = n$, $\dim(\mathcal{P}_n) = n+1$, $\dim(\mathbb{F}^{m \times n}) = mn$.
Infinite-Dimensional Spaces
Not all vector spaces are finite-dimensional. $C([0,1])$ (continuous functions) is infinite-dimensional: the list $1, x, x^2, x^3, \ldots$ is linearly independent (a nonzero polynomial has only finitely many roots, so no nontrivial combination vanishes on all of $[0,1]$), and hence no finite list can span $C([0,1])$ (a spanning list of length $n$ would cap every independent list at length $n$).
$\mathcal{P}(\mathbb{R})$, polynomials of arbitrary degree, is infinite-dimensional with countable basis $\{1, x, x^2, \ldots\}$. Function spaces like $L^2([0,1])$ admit orthonormal bases (Fourier series), but these require the machinery of Hilbert spaces - the infinite sums converge in norm rather than being finite linear combinations, so they are not bases in the purely algebraic sense.
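The independence of any finite prefix of the monomials can be checked numerically (a sketch assuming NumPy): evaluating $1, x, \ldots, x^k$ at $k+1$ distinct points yields a Vandermonde matrix of full rank, so no nontrivial combination vanishes identically.

```python
import numpy as np

# V[i, j] = points[i]**j: each column is a monomial sampled at k+1
# distinct points. Full rank means no nontrivial combination of
# 1, x, ..., x^k is the zero function.
k = 5
points = np.linspace(0.0, 1.0, k + 1)
V = np.vander(points, k + 1, increasing=True)
print(np.linalg.matrix_rank(V) == k + 1)   # True for any k
```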
Direct Sums
Definition. $V$ is the direct sum $U_1 \oplus U_2$ of subspaces $U_1, U_2$ if every $v \in V$ can be written uniquely as $v = u_1 + u_2$ with $u_i \in U_i$.
Theorem. $V = U_1 \oplus U_2$ if and only if $V = U_1 + U_2$ and $U_1 \cap U_2 = \{\mathbf{0}\}$.
Theorem (Dimension of a Sum). For subspaces $U_1, U_2 \subseteq V$:
$$\dim(U_1 + U_2) = \dim(U_1) + \dim(U_2) - \dim(U_1 \cap U_2)$$
This is the vector-space analogue of inclusion-exclusion.
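A numeric sanity check on random subspaces (a sketch assuming NumPy; random Gaussian columns are linearly independent with probability one). The intersection dimension is computed as the nullity of $[B_1\ \ {-B_2}]$: a null vector $(a, b)$ encodes $B_1 a = B_2 b$, a common element of $U_1$ and $U_2$.

```python
import numpy as np

rng = np.random.default_rng(1)
rank = np.linalg.matrix_rank

B1 = rng.normal(size=(6, 3))    # basis of U1 in R^6, dim 3
B2 = rng.normal(size=(6, 4))    # basis of U2 in R^6, dim 4

dim_sum = rank(np.column_stack([B1, B2]))    # dim(U1 + U2)

M = np.column_stack([B1, -B2])
dim_cap = M.shape[1] - rank(M)               # dim(U1 ∩ U2) via nullity

print(dim_sum == rank(B1) + rank(B2) - dim_cap)   # True (here: 6 = 3 + 4 - 1)
```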
Quotient Spaces
Given a subspace $U \subseteq V$, define the coset $v + U = \{v + u : u \in U\}$. The quotient space $V/U$ is the set of all cosets, with operations $(v + U) + (w + U) = (v+w) + U$ and $\alpha(v+U) = \alpha v + U$.
Theorem. $V/U$ is a vector space and $\dim(V/U) = \dim(V) - \dim(U)$ (for finite-dimensional $V$).
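For intuition in the finite-dimensional inner-product setting (a sketch assuming NumPy): each coset $v + U$ meets the orthogonal complement $U^\perp$ in exactly one point, so $U^\perp$ is a concrete model of $V/U$, with the expected dimension.

```python
import numpy as np

rng = np.random.default_rng(2)

B = rng.normal(size=(5, 2))          # basis of U, a 2-dim subspace of R^5
r = np.linalg.matrix_rank(B)         # = 2 with probability one

# The left singular vectors past the rank span the orthogonal complement.
U_perp = np.linalg.svd(B)[0][:, r:]
print(U_perp.shape[1])               # 3 = dim(V/U) = dim(V) - dim(U)
```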
Quotient spaces appear in functional analysis (e.g., $L^p$ spaces are built from a quotient identifying functions equal almost everywhere) and in the first isomorphism theorem for linear maps.
Examples in Machine Learning
Feature spaces. In supervised learning, a model class is often a subspace of a larger function space. Linear models are $\{\mathbf{x} \mapsto \mathbf{w}^T \mathbf{x} : \mathbf{w} \in \mathbb{R}^d\}$ - a vector space of functions (under pointwise addition and scalar multiplication). Kernel methods implicitly work in infinite-dimensional reproducing kernel Hilbert spaces (RKHS), where span and independence have analytic analogues.
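A tiny sketch of that "vector space of functions" claim (assuming NumPy; the names are illustrative): the pointwise combination $f + 2g$ of two linear models is again a linear model, namely the one with weights $\mathbf{w}_1 + 2\mathbf{w}_2$.

```python
import numpy as np

rng = np.random.default_rng(3)
w1, w2, x = rng.normal(size=(3, 4))     # two weight vectors and one input

f = lambda x: w1 @ x                    # two linear models...
g = lambda x: w2 @ x
h = lambda x: (w1 + 2.0 * w2) @ x       # ...and a combination of them

# Pointwise, f + 2g equals the linear model with weights w1 + 2*w2.
print(np.isclose(f(x) + 2.0 * g(x), h(x)))   # True
```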
Representation learning. A neural network maps inputs into representations that often concentrate near a low-dimensional subspace (or manifold) of a high-dimensional feature space, where semantically similar inputs cluster. Linear layers apply linear maps whose images are subspaces; nonlinearities fold and bend them. The dimension of a layer's image - the rank of its weight matrix - bounds what the layer can represent.
Dimensionality reduction. PCA finds the $k$-dimensional subspace of $\mathbb{R}^d$ that maximizes retained variance. The subspace is spanned by the top $k$ eigenvectors of the sample covariance matrix. Orthogonal projection onto a subspace $U$ is a linear map whose null space is the orthogonal complement $U^\perp$ - which realizes the quotient isomorphism $\mathbb{R}^d / U^\perp \cong U$.
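A minimal PCA sketch via the SVD (assuming NumPy, on toy data): the top-$k$ right singular vectors of the centered data matrix span the variance-maximizing subspace, and $WW^T$ is the orthogonal projection onto it.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # n x d toy data

k = 3
Xc = X - X.mean(axis=0)                 # center the data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:k].T                            # d x k orthonormal basis of the subspace

P = W @ W.T                             # orthogonal projection onto span(W)
Z = Xc @ W                              # k-dimensional coordinates of each sample
print(np.allclose(P @ P, P))            # projections are idempotent: True
```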