Vectors: Two Views, One Object

A vector in $\mathbb{R}^n$ is an ordered $n$-tuple of real numbers. Write it as a column:

$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \in \mathbb{R}^n$$

Geometric view. In $\mathbb{R}^2$ and $\mathbb{R}^3$, a vector is an arrow from the origin to the point $(v_1, v_2)$ or $(v_1, v_2, v_3)$. Direction and magnitude matter; the base point does not.

Algebraic view. A vector is just an element of a set closed under two operations: addition and scalar multiplication. For $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$:

$$(\mathbf{u} + \mathbf{v})_i = u_i + v_i, \qquad (\alpha \mathbf{v})_i = \alpha v_i$$

These operations satisfy commutativity, associativity, existence of a zero vector $\mathbf{0}$, and additive inverses. This algebraic skeleton - not the arrows - is what generalizes to function spaces and polynomial spaces.
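
A minimal NumPy sketch of these componentwise operations (the particular vectors and scalar are arbitrary):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 0.5])
alpha = 2.5

# Componentwise addition and scalar multiplication, exactly as defined above.
print(u + v)      # [5.   1.   3.5 ]
print(alpha * v)  # [10.  -2.5  1.25]

# The zero vector and additive inverses behave as the axioms require.
zero = np.zeros(3)
assert np.allclose(u + zero, u)
assert np.allclose(u + (-u), zero)
```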

The Dot Product

The dot product (standard inner product on $\mathbb{R}^n$) of $\mathbf{u}$ and $\mathbf{v}$ is:

$$\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^n u_i v_i = \mathbf{u}^T \mathbf{v}$$

It encodes geometry: $\mathbf{u} \cdot \mathbf{v} = |\mathbf{u}||\mathbf{v}|\cos\theta$, where $\theta$ is the angle between them. Two vectors are orthogonal iff $\mathbf{u} \cdot \mathbf{v} = 0$. The norm is $|\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$.
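
To make the geometry concrete, a short NumPy check (the two vectors here happen to be orthogonal):

```python
import numpy as np

u = np.array([1.0, 0.0, 2.0])
v = np.array([2.0, 3.0, -1.0])

dot = u @ v                 # sum_i u_i v_i
norm_u = np.sqrt(u @ u)     # |u| = sqrt(u . u)
norm_v = np.linalg.norm(v)  # same thing via the library

# Recover the angle from u . v = |u||v| cos(theta).
cos_theta = dot / (norm_u * norm_v)
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

print(dot, theta)  # u . v = 0 here, so theta = pi/2: orthogonal
```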


Matrices: Linear Maps in Disguise

An $m \times n$ matrix $A$ has $m$ rows and $n$ columns. But the right way to think about it: $A$ is a function $\mathbb{R}^n \to \mathbb{R}^m$ defined by $\mathbf{x} \mapsto A\mathbf{x}$.

Matrix-Vector Multiplication: Two Views

Row picture. The $i$-th component of $A\mathbf{x}$ is the dot product of the $i$-th row of $A$ with $\mathbf{x}$:

$$(A\mathbf{x})_i = \sum_{j=1}^n A_{ij} x_j = \mathbf{a}_i^T \mathbf{x}$$

Column picture. $A\mathbf{x}$ is a linear combination of the columns of $A$, with coefficients from $\mathbf{x}$:

$$A\mathbf{x} = x_1 \mathbf{a}^{(1)} + x_2 \mathbf{a}^{(2)} + \cdots + x_n \mathbf{a}^{(n)}$$

where $\mathbf{a}^{(j)}$ denotes the $j$-th column. This column picture is the key to understanding when $A\mathbf{x} = \mathbf{b}$ has a solution.
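
Both pictures computed explicitly in NumPy (the particular 3×2 matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])  # 3x2: maps R^2 -> R^3
x = np.array([10.0, -1.0])

# Row picture: component i is the dot product of row i with x.
row_view = np.array([A[i] @ x for i in range(A.shape[0])])

# Column picture: a linear combination of the columns of A.
col_view = x[0] * A[:, 0] + x[1] * A[:, 1]

assert np.allclose(A @ x, row_view)
assert np.allclose(A @ x, col_view)
```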

Matrix Multiplication

For $A \in \mathbb{R}^{m \times k}$ and $B \in \mathbb{R}^{k \times n}$, the product $C = AB \in \mathbb{R}^{m \times n}$ has:

$$C_{ij} = \sum_{p=1}^k A_{ip} B_{pj}$$

Equivalently: (a) each column of $C$ is $A$ times the corresponding column of $B$; (b) $C = \sum_{p=1}^k \mathbf{a}^{(p)} \mathbf{b}_p^T$, a sum of rank-1 outer products of the $p$-th column of $A$ with the $p$-th row of $B$. The outer-product view is fundamental in low-rank approximation and ML.

Matrix multiplication is associative but not commutative: $AB \neq BA$ in general.
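
A sketch verifying all three views of the product, plus non-commutativity, on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # m x k
B = rng.standard_normal((4, 2))  # k x n

# (a) column at a time: column j of C is A @ (column j of B).
C_cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# (b) sum of rank-1 outer products: column p of A times row p of B.
C_outer = sum(np.outer(A[:, p], B[p, :]) for p in range(A.shape[1]))

assert np.allclose(A @ B, C_cols)
assert np.allclose(A @ B, C_outer)

# Non-commutativity: even for square matrices, AB != BA in general.
M, N = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
print(np.allclose(M @ N, N @ M))  # almost surely False
```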

Transpose and Inverse

The transpose $A^T$ satisfies $(A^T)_{ij} = A_{ji}$ and $(AB)^T = B^T A^T$.

The identity matrix $I_n$ has $(I_n)_{ij} = \delta_{ij}$ (Kronecker delta). For a square matrix $A \in \mathbb{R}^{n \times n}$, the inverse $A^{-1}$ (if it exists) satisfies $A A^{-1} = A^{-1} A = I_n$. A matrix is invertible (nonsingular) iff its columns are linearly independent, iff its determinant is nonzero, iff $0$ is not an eigenvalue.
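
A quick numerical check of these identities (a random Gaussian matrix is invertible with probability 1; a singular input would make `np.linalg.inv` raise `LinAlgError`):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)

# A A^{-1} = A^{-1} A = I
A_inv = np.linalg.inv(A)
assert np.allclose(A @ A_inv, np.eye(3), atol=1e-10)
assert np.allclose(A_inv @ A, np.eye(3), atol=1e-10)
```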


Linear Systems: $A\mathbf{x} = \mathbf{b}$

A system of $m$ equations in $n$ unknowns is $A\mathbf{x} = \mathbf{b}$ where $A \in \mathbb{R}^{m \times n}$, $\mathbf{b} \in \mathbb{R}^m$.

Row Picture vs. Column Picture

Row picture. Each equation $\mathbf{a}_i^T \mathbf{x} = b_i$ is a hyperplane in $\mathbb{R}^n$. A solution is a point lying in the intersection of all $m$ hyperplanes.

Column picture. We ask: can $\mathbf{b}$ be written as a linear combination of the columns of $A$? The system is solvable iff $\mathbf{b} \in \text{col}(A)$.

Column picture for $A\mathbf{x} = \mathbf{b}$ (2x2 case):

        ^
        |    a2         b = x1*a1 + x2*a2
        |   /
   b    |  /
    \   | /
     \  |/
  ------+----------->
        |\
        | \
        |  \
        |   a1

Existence and Uniqueness

Theorem (Existence). The system $A\mathbf{x} = \mathbf{b}$ has a solution if and only if $\mathbf{b} \in \text{col}(A)$, equivalently $\text{rank}(A) = \text{rank}([A \mid \mathbf{b}])$.

Theorem (Uniqueness). If a solution exists, it is unique if and only if the null space of $A$ is trivial: $\text{null}(A) = \{\mathbf{0}\}$, equivalently $\text{rank}(A) = n$.

General solution structure. If $\mathbf{x}_p$ is any particular solution and $\mathbf{x}_h \in \text{null}(A)$, then $\mathbf{x}_p + \mathbf{x}_h$ is also a solution. The complete solution set is $\mathbf{x}_p + \text{null}(A)$ - a coset (affine subspace) of the null space.
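
A worked example in NumPy, assuming a consistent 2×3 system so that the null space is nontrivial: `lstsq` supplies a particular solution, and the SVD supplies a null-space direction.

```python
import numpy as np

# A wide system: 2 equations, 3 unknowns (rank 2, nullity 1).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([6.0, 15.0])  # b = A @ [1, 1, 1], so the system is consistent

# Existence: rank(A) == rank([A | b])?
rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
print(rank_A == rank_Ab)  # True: b is in col(A)

# A particular (minimum-norm) solution from least squares.
x_p, *_ = np.linalg.lstsq(A, b, rcond=None)

# Right singular vectors with zero singular value span null(A).
_, s, Vt = np.linalg.svd(A)
x_h = Vt[-1]  # nullity = 3 - 2 = 1
assert np.allclose(A @ x_h, 0, atol=1e-10)

# Every x_p + t * x_h solves the system: the solution set is a coset.
t = 7.0
assert np.allclose(A @ (x_p + t * x_h), b)
```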


Rank, Nullity, and the Fundamental Theorem

Definition. The column space (range) of $A$ is $\text{col}(A) = \{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n\} \subseteq \mathbb{R}^m$. The null space (kernel) is $\text{null}(A) = \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = \mathbf{0}\}$. The rank is $r = \dim(\text{col}(A))$ and the nullity is $\dim(\text{null}(A))$.

Theorem (Rank-Nullity). For $A \in \mathbb{R}^{m \times n}$:

$$\text{rank}(A) + \text{nullity}(A) = n$$

Proof sketch. Let $\nu = \text{nullity}(A)$ and extend a basis $\{\mathbf{w}_1, \ldots, \mathbf{w}_\nu\}$ of $\text{null}(A)$ to a basis of $\mathbb{R}^n$. The images under $A$ of the $n - \nu$ added basis vectors are linearly independent and span $\text{col}(A)$, so $\text{rank}(A) = n - \nu$, which is the claimed identity.
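
A numerical check of the theorem on a deliberately rank-deficient matrix, counting the rank via `matrix_rank` and the nullity via the (numerically) zero singular values:

```python
import numpy as np

# Rows 3 and 4 are combinations of rows 1 and 2, so rank(A) = 2.
A = np.array([[1., 0., 1., 2., 0.],
              [0., 1., 1., 0., 3.],
              [1., 1., 2., 2., 3.],
              [2., 0., 2., 4., 0.]])
m, n = A.shape

r = np.linalg.matrix_rank(A)

# Nullity = n minus the number of nonzero singular values.
s = np.linalg.svd(A, compute_uv=False)
nullity = n - np.sum(s > 1e-10)

print(r, nullity, r + nullity == n)  # 2 3 True
```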

The Fundamental Theorem of Linear Algebra (Strang’s formulation) identifies four fundamental subspaces:

| Subspace | Lives in | Dimension |
|---|---|---|
| Column space $\text{col}(A)$ | $\mathbb{R}^m$ | $r$ |
| Null space $\text{null}(A)$ | $\mathbb{R}^n$ | $n - r$ |
| Row space $\text{col}(A^T)$ | $\mathbb{R}^n$ | $r$ |
| Left null space $\text{null}(A^T)$ | $\mathbb{R}^m$ | $m - r$ |

Moreover, $\text{col}(A) \perp \text{null}(A^T)$ and $\text{col}(A^T) \perp \text{null}(A)$ - the column space and left null space are orthogonal complements in $\mathbb{R}^m$, and the row space and null space are orthogonal complements in $\mathbb{R}^n$.
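
The SVD hands you orthonormal bases for all four subspaces at once; a sketch on a random rank-2 matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
# Build a 4x3 matrix of rank 2 so every subspace is nontrivial.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
m, n = A.shape

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)

col_A     = U[:, :r]   # basis for col(A)     in R^m
left_null = U[:, r:]   # basis for null(A^T)  in R^m
row_A     = Vt[:r].T   # basis for col(A^T)   in R^n
null_A    = Vt[r:].T   # basis for null(A)    in R^n

# Orthogonal complements: every cross inner product vanishes.
assert np.allclose(col_A.T @ left_null, 0, atol=1e-10)
assert np.allclose(row_A.T @ null_A, 0, atol=1e-10)
print(col_A.shape[1], null_A.shape[1], row_A.shape[1], left_null.shape[1])
# dimensions: r, n - r, r, m - r
```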


Examples

Matrices as data transformations. In machine learning, data is a matrix $X \in \mathbb{R}^{n \times d}$ ($n$ samples, $d$ features). A weight matrix $W \in \mathbb{R}^{d \times k}$ maps each sample $\mathbf{x}_i \in \mathbb{R}^d$ to $W^T \mathbf{x}_i \in \mathbb{R}^k$ - a linear layer. Understanding which inputs get mapped to zero (the null space) or which outputs are reachable (the column space) tells you about expressive power and degeneracy.
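
A toy illustration, assuming a hypothetical 3×2 weight matrix whose third feature row duplicates the first two, so the layer has a nontrivial null space:

```python
import numpy as np

# d = 3 input features, k = 2 outputs; row 3 of W is row 1 + row 2,
# so W^T collapses one direction of input space to zero.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

x = np.array([0.5, -2.0, 1.0])
v = np.array([1.0, 1.0, -1.0])  # W.T @ v = 0: a null direction of the layer

# Two different inputs, identical layer outputs: information is lost.
assert np.allclose(W.T @ v, 0)
assert np.allclose(W.T @ x, W.T @ (x + 3.0 * v))
```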

Image compression via rank. A grayscale image is a matrix $A \in \mathbb{R}^{m \times n}$. Its outer-product decomposition $A = \sum_i \sigma_i \mathbf{u}_i \mathbf{v}_i^T$ (the SVD) gives the best rank-$k$ approximation $A_k$ by keeping only the top $k$ terms; the discarded error $A - A_k = \sum_{i > k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T$ has rank at most $r - k$, where $r = \text{rank}(A)$.
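
A sketch using a synthetic low-rank matrix as a stand-in for an image; by the Eckart-Young theorem, the Frobenius error of the rank-$k$ truncation equals the tail of the singular values:

```python
import numpy as np

rng = np.random.default_rng(3)
# "Image": a 64x64 matrix with rank-4 structure plus a little noise.
A = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))
A += 0.01 * rng.standard_normal((64, 64))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 4
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # best rank-k approximation

# Frobenius error = sqrt of the sum of squared discarded singular values.
err = np.linalg.norm(A - A_k)
tail = np.sqrt(np.sum(s[k:] ** 2))
print(np.isclose(err, tail))  # True
```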

Solving $Ax = b$ is ubiquitous. Linear regression minimizes $|A\mathbf{x} - \mathbf{b}|^2$. Its solution satisfies the normal equations $A^T A \mathbf{x} = A^T \mathbf{b}$. The matrix $A^T A$ is always square and symmetric; it is invertible iff $A$ has full column rank (no redundant features).
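
Comparing the normal-equations solution against `lstsq` on synthetic data (full column rank assumed; in practice `lstsq` or a QR factorization is numerically preferable to forming $A^T A$ explicitly):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((100, 3))  # tall: 100 samples, 3 features
b = rng.standard_normal(100)

# Normal equations: solve (A^T A) x = A^T b directly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solver: same minimizer.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_lstsq)
```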

