Prerequisite:


The Abstract Definition

A vector space over a field $\mathbb{F}$ (take $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$) is a set $V$ equipped with two operations - addition $+: V \times V \to V$ and scalar multiplication $\cdot: \mathbb{F} \times V \to V$ - satisfying all eight axioms:

  1. Commutativity of addition: $u + v = v + u$ for all $u, v \in V$
  2. Associativity of addition: $(u + v) + w = u + (v + w)$
  3. Additive identity: there exists $\mathbf{0} \in V$ with $v + \mathbf{0} = v$ for all $v$
  4. Additive inverse: for every $v \in V$ there exists $-v \in V$ with $v + (-v) = \mathbf{0}$
  5. Multiplicative identity: $1 \cdot v = v$ for all $v$
  6. Associativity of scalar multiplication: $(\alpha \beta) v = \alpha (\beta v)$
  7. Distributivity over vector addition: $\alpha(u + v) = \alpha u + \alpha v$
  8. Distributivity over scalar addition: $(\alpha + \beta)v = \alpha v + \beta v$

The list is in fact slightly redundant: commutativity of addition follows from the other axioms (expand $(1+1)(u+v)$ with both distributive laws and cancel). Note: the zero vector is unique (if $\mathbf{0}'$ also works, then $\mathbf{0} = \mathbf{0} + \mathbf{0}' = \mathbf{0}'$), and additive inverses are unique by a similar argument.
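For $\mathbb{R}^n$ the axioms can be sanity-checked numerically. A minimal sketch using numpy - the specific vectors and scalars are arbitrary choices, and a finite check is evidence, not a proof:

```python
import numpy as np

# Check the eight axioms in R^3 for a few fixed vectors and scalars.
# np.allclose absorbs floating-point rounding.
u = np.array([1.0, -2.0, 3.0])
v = np.array([0.5, 4.0, -1.0])
w = np.array([-3.0, 0.0, 2.0])
a, b = 2.5, -1.5
zero = np.zeros(3)

checks = [
    np.allclose(u + v, v + u),                # 1. commutativity of addition
    np.allclose((u + v) + w, u + (v + w)),    # 2. associativity of addition
    np.allclose(v + zero, v),                 # 3. additive identity
    np.allclose(v + (-v), zero),              # 4. additive inverse
    np.allclose(1.0 * v, v),                  # 5. multiplicative identity
    np.allclose((a * b) * v, a * (b * v)),    # 6. assoc. of scalar mult.
    np.allclose(a * (u + v), a * u + a * v),  # 7. distributivity over vectors
    np.allclose((a + b) * v, a * v + b * v),  # 8. distributivity over scalars
]
print(all(checks))  # True
```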

Examples.

  • $\mathbb{R}^n$ with componentwise operations (the prototype).
  • $\mathbb{F}^{m \times n}$: $m \times n$ matrices with entries in $\mathbb{F}$.
  • $\mathcal{P}_n(\mathbb{R})$: polynomials of degree at most $n$, with pointwise addition and scalar multiplication.
  • $C([a,b])$: continuous functions on $[a,b]$ with pointwise operations.
  • $\{\mathbf{0}\}$: the trivial vector space.

Subspaces

Definition. A nonempty subset $U \subseteq V$ is a subspace of $V$ if $U$ is itself a vector space under the same operations inherited from $V$.

Theorem (Subspace Test). A nonempty $U \subseteq V$ is a subspace if and only if it satisfies all three conditions:

  1. $\mathbf{0} \in U$
  2. $u, v \in U \Rightarrow u + v \in U$ (closed under addition)
  3. $\alpha \in \mathbb{F},\ u \in U \Rightarrow \alpha u \in U$ (closed under scalar multiplication)

Why only three? The six equational axioms (1, 2, 5, 6, 7, 8) are inherited from $V$ for free - they hold for all elements of $V$, hence for all elements of $U \subseteq V$. Additive inverses come from condition 3: $-u = (-1)u \in U$.

Why $\mathbf{0}$? Condition 3 with $\alpha = 0$ gives $0 \cdot u = \mathbf{0} \in U$, so condition 1 is actually implied by condition 3 (provided $U$ is nonempty). Requiring $\mathbf{0}$ explicitly is simply the cleanest way to verify nonemptiness.
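The three conditions translate directly into code. A sample-based sketch in numpy - the predicates and sample points are illustrative choices, and a finite check can only refute subspace-ness, never prove it:

```python
import numpy as np

# Sample-based subspace test for a subset of R^2 given by a membership
# predicate. Checks the three conditions on finitely many cases only.
def looks_like_subspace(contains, samples, scalars=(0.0, -1.0, 2.5)):
    if not contains(np.zeros(2)):          # condition 1: zero vector
        return False
    for u in samples:
        for v in samples:
            if not contains(u + v):        # condition 2: closed under +
                return False
        for a in scalars:
            if not contains(a * u):        # condition 3: closed under scaling
                return False
    return True

line_through_origin = lambda p: np.isclose(p[0] + p[1], 0.0)   # x + y = 0
affine_line = lambda p: np.isclose(p[0] + p[1], 1.0)           # x + y = 1

pts = [np.array([t, -t]) for t in (1.0, -2.0, 0.5)]            # points on x + y = 0
print(looks_like_subspace(line_through_origin, pts))  # True
print(looks_like_subspace(affine_line, pts))          # False: fails at 0
```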

Subspace U as a plane through the origin in R^3:

        z
        |
        |    * * *
        |  * U   *
        | *       *
        |*         *
--------+-----------> y
       /|
      / |
     x
  U must pass through the origin (0,0,0).
  Lines and planes NOT through origin are NOT subspaces.

Non-example. The set $\{(x,y) \in \mathbb{R}^2 : x + y = 1\}$ is not a subspace: it does not contain $\mathbf{0}$ (it also fails closure: $(1,0) + (0,1) = (1,1)$ has coordinate sum $2$).


Linear Combinations and Span

A linear combination of vectors $v_1, \ldots, v_k \in V$ is any vector of the form $\sum_{i=1}^k \alpha_i v_i$ where $\alpha_i \in \mathbb{F}$.

Definition. The span of $S = \{v_1, \ldots, v_k\}$ is:

$$\text{span}(S) = \left\{ \sum_{i=1}^k \alpha_i v_i : \alpha_i \in \mathbb{F} \right\}$$

Theorem. $\text{span}(S)$ is a subspace of $V$ - in fact, the smallest subspace containing $S$.

Proof. It contains $\mathbf{0}$ (take all $\alpha_i = 0$). It is closed under addition and scalar multiplication by linearity of the sum. Any subspace containing $S$ must contain all linear combinations of $S$, hence contains $\text{span}(S)$.
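In $\mathbb{R}^n$, membership in $\text{span}(S)$ is a linear solvability question: is $A\mathbf{x} = b$ solvable, where the columns of $A$ are the vectors of $S$? A small numpy sketch (the vectors are arbitrary choices); the least-squares residual is zero exactly when $b$ lies in the column span:

```python
import numpy as np

# Is b in span{v1, v2}? Solve A @ x = b in the least-squares sense and
# test whether the residual vanishes.
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
A = np.column_stack([v1, v2])

def in_span(A, b, tol=1e-10):
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.linalg.norm(A @ x - b) < tol

print(in_span(A, np.array([2.0, 3.0, 5.0])))  # True:  equals 2*v1 + 3*v2
print(in_span(A, np.array([0.0, 0.0, 1.0])))  # False: off the plane z = x + y
```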


Linear Independence

Definition. A list $v_1, \ldots, v_k \in V$ is linearly independent if the only solution to $\sum_{i=1}^k \alpha_i v_i = \mathbf{0}$ is $\alpha_1 = \cdots = \alpha_k = 0$. Otherwise the list is linearly dependent.

Linear dependence has a concrete interpretation: a list is linearly dependent iff some vector in it lies in the span of the others. Formally: if $\sum \alpha_i v_i = \mathbf{0}$ with $\alpha_j \neq 0$, then $v_j = -\alpha_j^{-1} \sum_{i \neq j} \alpha_i v_i \in \text{span}(\{v_i\}_{i \neq j})$.

Test in $\mathbb{R}^n$. Arrange $v_1, \ldots, v_k$ as columns of a matrix $A \in \mathbb{R}^{n \times k}$. The list is linearly independent iff $\text{null}(A) = \{\mathbf{0}\}$, iff $\text{rank}(A) = k$, iff the system $A\boldsymbol{\alpha} = \mathbf{0}$ has only the trivial solution.
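The rank test above, directly in numpy (the example vectors are arbitrary choices):

```python
import numpy as np

# Stack the vectors as columns; independent iff rank equals the count.
def independent(*vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
print(independent(e1, e2))            # True
print(independent(e1, e2, e1 + e2))   # False: third column is a combination
```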


Basis and Dimension

Definition. A basis of $V$ is a list of vectors that is both linearly independent and spans $V$.

Theorem (Existence of Basis). Every spanning list in $V$ contains a basis. Every linearly independent list extends to a basis (in finite-dimensional spaces).

Proof sketch (spanning list contains a basis). If $v_1, \ldots, v_k$ spans $V$ and is dependent, some $v_j \in \text{span}(\{v_i\}_{i \neq j})$; remove it. The remaining list still spans. Repeat until independent.

Theorem (Uniqueness of Representation). If $v_1, \ldots, v_n$ is a basis of $V$, then every $v \in V$ has a unique representation $v = \sum_{i=1}^n \alpha_i v_i$.

Proof. Existence: by spanning. Uniqueness: if $\sum \alpha_i v_i = \sum \beta_i v_i$, then $\sum (\alpha_i - \beta_i) v_i = \mathbf{0}$; independence forces $\alpha_i = \beta_i$ for all $i$.
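In $\mathbb{R}^n$, finding the unique coordinates means solving $B\boldsymbol{\alpha} = v$, where the basis vectors form the columns of $B$; $B$ is invertible precisely because the columns are a basis. A numpy sketch with an arbitrary basis of $\mathbb{R}^3$:

```python
import numpy as np

# Coordinates of v in the basis {b1, b2, b3}: solve B @ alpha = v.
b1 = np.array([1.0, 1.0, 0.0])
b2 = np.array([0.0, 1.0, 1.0])
b3 = np.array([1.0, 0.0, 1.0])
B = np.column_stack([b1, b2, b3])      # invertible: columns are a basis

v = np.array([2.0, 3.0, 1.0])
alpha = np.linalg.solve(B, v)          # the unique coordinate vector
print(np.allclose(B @ alpha, v))       # True: reconstruction is exact
```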

Theorem (Dimension is Well-Defined). Any two bases of a finite-dimensional $V$ have the same number of elements. This common number is $\dim(V)$.

Proof. If $B_1$ has $m$ elements and $B_2$ has $n$ elements and both are bases, one can show $m \leq n$ (because any linearly independent list has length $\leq$ any spanning list) and by symmetry $n \leq m$.

Standard dimensions: $\dim(\mathbb{R}^n) = n$, $\dim(\mathcal{P}_n) = n+1$, $\dim(\mathbb{F}^{m \times n}) = mn$.


Infinite-Dimensional Spaces

Not all vector spaces are finite-dimensional. $C([0,1])$ (continuous functions) is infinite-dimensional: the list $1, x, x^2, x^3, \ldots$ is linearly independent (a nonzero polynomial has only finitely many roots, so no nontrivial finite combination can vanish identically on $[0,1]$), and since a finite spanning list would bound the length of every independent list, no finite list spans $C([0,1])$.

$\mathcal{P}(\mathbb{R})$, polynomials of arbitrary degree, is infinite-dimensional with countable basis $\{1, x, x^2, \ldots\}$. Function spaces like $L^2([0,1])$ admit orthonormal bases (Fourier series), but these require the machinery of Hilbert spaces - limits of infinite sums converge only in an analytic sense.
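The independence of the monomials can be probed numerically: evaluating $1, x, \ldots, x^{k-1}$ at $k$ distinct points produces a Vandermonde matrix, which has full rank for every $k$. A numpy sketch (the sample points are arbitrary distinct choices):

```python
import numpy as np

# For each k, sample the k monomials at k distinct points in [-1, 1];
# full column rank means no nontrivial combination vanishes there.
ranks_full = all(
    np.linalg.matrix_rank(np.vander(np.linspace(-1.0, 1.0, k), k)) == k
    for k in (2, 5, 10)
)
print(ranks_full)  # True
```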


Direct Sums

Definition. $V$ is the direct sum $U_1 \oplus U_2$ of subspaces $U_1, U_2$ if every $v \in V$ can be written uniquely as $v = u_1 + u_2$ with $u_i \in U_i$.

Theorem. $V = U_1 \oplus U_2$ if and only if $V = U_1 + U_2$ and $U_1 \cap U_2 = \{\mathbf{0}\}$.

Theorem (Dimension of a Sum). For subspaces $U_1, U_2 \subseteq V$:

$$\dim(U_1 + U_2) = \dim(U_1) + \dim(U_2) - \dim(U_1 \cap U_2)$$

This is the vector-space analogue of inclusion-exclusion.
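The formula can be verified with ranks: the dimension of a spanned subspace is the rank of its spanning matrix, and when each subspace's spanning columns are independent, $\dim(U_1 \cap U_2)$ equals the nullity of $[\,U_1 \mid -U_2\,]$. A numpy sketch for the $xy$- and $xz$-planes in $\mathbb{R}^3$:

```python
import numpy as np

# Two planes through the origin in R^3, each spanned by independent columns.
U1 = np.column_stack([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # xy-plane
U2 = np.column_stack([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])   # xz-plane

dim_U1 = np.linalg.matrix_rank(U1)                          # 2
dim_U2 = np.linalg.matrix_rank(U2)                          # 2
dim_sum = np.linalg.matrix_rank(np.hstack([U1, U2]))        # dim(U1 + U2) = 3
# nullity of [U1 | -U2] = dim(U1 ∩ U2), here the shared x-axis (dim 1)
dim_cap = U1.shape[1] + U2.shape[1] - np.linalg.matrix_rank(np.hstack([U1, -U2]))
print(dim_sum == dim_U1 + dim_U2 - dim_cap)  # True: 3 = 2 + 2 - 1
```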


Quotient Spaces

Given a subspace $U \subseteq V$, define the coset $v + U = \{v + u : u \in U\}$. The quotient space $V/U$ is the set of all cosets, with operations $(v + U) + (w + U) = (v+w) + U$ and $\alpha(v+U) = \alpha v + U$.

Theorem. $V/U$ is a vector space and $\dim(V/U) = \dim(V) - \dim(U)$ (for finite-dimensional $V$).

Quotient spaces appear in functional analysis (e.g., $L^p$ spaces are built from a quotient identifying functions equal almost everywhere) and in the first isomorphism theorem for linear maps.
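The dimension count $\dim(V/U) = \dim(V) - \dim(U)$ reduces to a rank computation. A tiny numpy sketch, with $V = \mathbb{R}^4$ and $U$ spanned by two arbitrary illustrative vectors:

```python
import numpy as np

# dim(U) is the rank of a spanning matrix; dim(V/U) = dim(V) - dim(U).
span_U = np.column_stack([[1.0, 0.0, 2.0, 0.0],
                          [0.0, 1.0, 0.0, 3.0]])   # 4 x 2, columns independent
dim_V = span_U.shape[0]                  # 4
dim_U = np.linalg.matrix_rank(span_U)    # 2
print(dim_V - dim_U)  # 2 = dim(V/U)
```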


Examples

Feature spaces. In supervised learning, a model class is often a subspace. Linear models are $\{\mathbf{x} \mapsto \mathbf{w}^T \mathbf{x} : \mathbf{w} \in \mathbb{R}^d\}$ - a vector space of functions (under pointwise addition and scalar multiplication). Kernel methods implicitly work in infinite-dimensional reproducing kernel Hilbert spaces (RKHS), where span and independence have analytic analogues.

Representation learning. A neural network learns a subspace (or low-dimensional manifold) of the high-dimensional input space where semantically similar inputs cluster. Linear layers apply affine maps, whose linear parts have well-defined images and null spaces; nonlinearities fold and bend. Understanding the image (column space) of a layer's weight matrix tells you which directions the layer can represent, and hence about the model's capacity.

Dimensionality reduction. PCA finds the $k$-dimensional subspace of $\mathbb{R}^d$ that maximizes retained variance. The subspace is spanned by the top $k$ eigenvectors of the sample covariance matrix. Every projection onto a subspace is a linear map whose null space is the orthogonal complement - exactly the quotient space $\mathbb{R}^d / U^\perp \cong U$.
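A from-scratch PCA sketch via the eigendecomposition of the sample covariance - the data, dimensions, and choice of $k$ below are arbitrary illustrative choices:

```python
import numpy as np

# The k-dimensional subspace maximizing retained variance is spanned by
# the top-k eigenvectors of the sample covariance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 0.5, 0.1, 0.1])

Xc = X - X.mean(axis=0)                   # center the data
C = Xc.T @ Xc / (len(Xc) - 1)             # sample covariance, 5 x 5
eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order

k = 2
U = eigvecs[:, -k:]                       # orthonormal basis of the subspace
Z = Xc @ U                                # coordinates of each point in U
print(U.shape, Z.shape)  # (5, 2) (200, 2)
```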


Read Next: