Spectral Graph Theory
Spectral graph theory studies graphs through the eigenvalues and eigenvectors of matrices associated with them. The central object is the graph Laplacian, whose spectrum encodes connectivity, expansion, mixing times of random walks, and the quality of graph partitions. This algebraic window into graph structure is the mathematical foundation of spectral clustering, community detection, and graph neural networks.
The Graph Laplacian
Let $G = (V, E)$ be an undirected, unweighted graph on $n$ vertices. Define:
- Adjacency matrix $A \in \{0,1\}^{n \times n}$: $A_{ij} = 1$ if $(i,j) \in E$, else $0$.
- Degree matrix $D = \text{diag}(d_1,\dots,d_n)$: $d_i = \sum_j A_{ij}$ is the degree of vertex $i$.
- Graph Laplacian $L = D - A$.
Graph G:        A:              D:              L = D - A:

1---2        [0 1 1 1]      [3 0 0 0]      [ 3 -1 -1 -1]
|\ |         [1 0 0 1]      [0 2 0 0]      [-1  2  0 -1]
| \|         [1 0 0 1]      [0 0 2 0]      [-1  0  2 -1]
3---4        [1 1 1 0]      [0 0 0 3]      [-1 -1 -1  3]
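The construction is easy to sanity-check numerically. The following minimal numpy sketch builds $A$, $D$, and $L$ from the figure's edge list (0-indexed here) and verifies the basic identities:

```python
import numpy as np

# Edges of the example graph above (1-indexed in the figure, 0-indexed here).
edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # graph Laplacian

assert np.allclose(L, L.T)             # L is symmetric
assert np.allclose(L @ np.ones(4), 0)  # L @ 1 = 0: rows sum to zero
```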
Fundamental Properties
Theorem. The graph Laplacian $L$ satisfies:
- Symmetry and PSD. $L$ is symmetric and positive semidefinite. For any $x \in \mathbb{R}^n$, $$x^T L x = \sum_{(i,j)\in E}(x_i - x_j)^2 \geq 0.$$
- Null space. The all-ones vector $\mathbf{1}$ satisfies $L\mathbf{1} = 0$. More generally, the multiplicity of eigenvalue $0$ equals the number of connected components of $G$.
- Eigenvalue ordering. The eigenvalues of $L$ satisfy $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$.
The quadratic form $x^T L x = \sum_{(i,j)\in E}(x_i - x_j)^2$ is the key identity. It measures the total variation of a signal $x$ on the graph, making $L$ the graph analogue of the continuous Laplacian operator $-\nabla^2$.
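The identity is worth checking numerically. A minimal sketch on the example graph, with an arbitrary random signal $x$:

```python
import numpy as np

edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]   # example graph, 0-indexed
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                          # arbitrary signal on V

total_variation = sum((x[i] - x[j]) ** 2 for i, j in edges)
assert np.isclose(x @ L @ x, total_variation)       # the key identity
```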
Algebraic Connectivity and the Fiedler Value
Definition. The second-smallest eigenvalue $\lambda_2(L)$ is called the algebraic connectivity or Fiedler value of $G$.
Theorem. $\lambda_2(L) = 0$ if and only if $G$ is disconnected.
By the Courant-Fischer minimax theorem,
$$\lambda_2 = \min_{\substack{x \perp \mathbf{1} \\ x \neq 0}} \frac{x^T L x}{\|x\|^2} = \min_{\substack{x \perp \mathbf{1} \\ x \neq 0}} \frac{\sum_{(i,j)\in E}(x_i - x_j)^2}{\sum_i x_i^2}.$$
This variational characterisation shows that $\lambda_2$ captures how well-connected the graph is: a small $\lambda_2$ means there exists a smooth (slowly varying) non-constant function on the graph, i.e., a bottleneck. The corresponding eigenvector - the Fiedler vector - indicates the partition.
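To see the Fiedler vector locate a bottleneck, here is a small sketch on an illustrative graph chosen for the purpose (two triangles joined by one bridge edge, not the example above):

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by a single bridge edge (2,3):
# a deliberate bottleneck.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

w, V = np.linalg.eigh(L)          # eigenvalues in ascending order
print(w[1])                       # Fiedler value: small, due to the bridge
print(V[:, 1] > 0)                # sign pattern separates the two triangles
```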
The Normalised Laplacian
When vertex degrees are heterogeneous, the normalised Laplacian is preferable:
$$\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}.$$
Its eigenvalues lie in $[0,2]$, with $0$ achieved by the eigenvector $D^{1/2}\mathbf{1}$ and $2$ attained if and only if some connected component of $G$ is bipartite. This makes it better suited for comparing graphs with different degree sequences. The normalised Laplacian is the natural object in spectral clustering via normalised cuts.
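A short numerical check of these spectral facts, again on the example graph from the figure:

```python
import numpy as np

edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]   # example graph again
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
d = A.sum(axis=1)

L_norm = np.eye(4) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)

w = np.linalg.eigvalsh(L_norm)
assert w.min() >= -1e-10 and w.max() <= 2 + 1e-10  # spectrum lies in [0, 2]
assert np.allclose(L_norm @ np.sqrt(d), 0)         # null vector is D^{1/2} 1
```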
Cheeger’s Inequality
The conductance (or edge expansion) of a set $S \subseteq V$ is
$$h(S) = \frac{|E(S, V\setminus S)|}{\min(\text{vol}(S), \text{vol}(V\setminus S))},$$
where $\text{vol}(S) = \sum_{i \in S} d_i$ and $E(S, V\setminus S)$ is the set of edges with exactly one endpoint in $S$. The graph conductance is $h(G) = \min_{\emptyset \neq S \subsetneq V} h(S)$.
Theorem (Cheeger’s Inequality). For the normalised Laplacian $\mathcal{L}$ with second eigenvalue $\lambda_2(\mathcal{L})$,
$$\frac{h(G)^2}{2} \leq \lambda_2(\mathcal{L}) \leq 2h(G).$$
The lower bound (harder direction) says that if $\lambda_2$ is small then there is a sparse cut. The upper bound (easier direction) says that a sparse cut implies a small $\lambda_2$. Together, the Fiedler value is a tight algebraic surrogate for graph conductance.
Small lambda_2                    Large lambda_2

[A] --- [B]                       fully connected
few edges across                  no sparse cut
h(G) small                        h(G) large
Cheeger’s inequality is the graph-theoretic analogue of the PoincarĂ© inequality and is the theoretical justification for using $\lambda_2$ in graph partitioning algorithms.
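Both sides of the bound can be checked directly. The sketch below uses the same illustrative two-triangle bottleneck graph, where the bridge cut $S = \{0,1,2\}$ happens to be the sparsest:

```python
import numpy as np

# Bottleneck graph: two triangles joined by a bridge (illustrative choice).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
d = A.sum(axis=1)
L_norm = np.eye(n) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)
lam2 = np.linalg.eigvalsh(L_norm)[1]

def conductance(S):
    """h(S) = |E(S, V \ S)| / min(vol(S), vol(V \ S))."""
    mask = np.zeros(n, dtype=bool)
    mask[list(S)] = True
    cut = A[mask][:, ~mask].sum()
    return cut / min(d[mask].sum(), d[~mask].sum())

h = conductance({0, 1, 2})            # the sparsest cut for this graph
assert h ** 2 / 2 <= lam2 <= 2 * h    # both sides of Cheeger's inequality
```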
Spectral Clustering
Given a graph $G$, spectral clustering embeds vertices into $\mathbb{R}^k$ using the eigenvectors of $L$ (or $\mathcal{L}$), then applies $k$-means to the embedded points.
Algorithm (Spectral Clustering).
- Compute the $k$ eigenvectors $v_1, \dots, v_k$ of $L$ corresponding to eigenvalues $\lambda_1 \leq \cdots \leq \lambda_k$.
- Form the matrix $U \in \mathbb{R}^{n \times k}$ with columns $v_1,\dots,v_k$.
- Treat each row $u_i \in \mathbb{R}^k$ as the embedding of vertex $i$.
- Run $k$-means on $\{u_1,\dots,u_n\}$ to obtain cluster assignments.
n vertices k-dimensional embedding k clusters
G=(V,E) --[eig(L)]--> U in R^{n x k} --[k-means]--> {C_1,...,C_k}
Theoretical guarantee. If $G$ is a disjoint union of $k$ cliques, then $\lambda_1 = \cdots = \lambda_k = 0$ with multiplicity $k$, and the eigenvectors are exactly the indicator vectors of the cliques. Perturbation theory (Davis-Kahan theorem) shows that for graphs close to this ideal, spectral clustering recovers the clusters up to small error.
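A compact sketch of the algorithm, assuming scipy is available for the k-means step (`spectral_clustering` is an illustrative helper name, not a library function):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(A, k):
    """Embed vertices via the bottom-k Laplacian eigenvectors, then k-means."""
    L = np.diag(A.sum(axis=1)) - A
    _, V = np.linalg.eigh(L)          # eigenvectors, eigenvalues ascending
    U = V[:, :k]                      # row i of U is the embedding of vertex i
    _, labels = kmeans2(U, k, minit='++')
    return labels

# Two triangles joined by a bridge: the clusters should be the triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
print(spectral_clustering(A, 2))      # e.g. [0 0 0 1 1 1], up to label order
```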
Normalised Spectral Clustering
Using $\mathcal{L}$ instead of $L$ gives the Shi-Malik normalised cuts algorithm, which minimises the ratio of cut edges to the total degree within clusters. In practice, the rows of $U$ are often renormalised to unit length before $k$-means.
Random Walks and Mixing Times
The random walk transition matrix on $G$ is
$$P = D^{-1}A,$$
where $P_{ij} = A_{ij}/d_i$ is the probability of moving from $i$ to $j$ in one step. Its relationship to the normalised Laplacian is
$$P = D^{-1/2}(I - \mathcal{L})D^{1/2} = I - D^{-1/2} \mathcal{L} D^{1/2},$$
so $P$ is similar to $I - \mathcal{L}$, and its eigenvalues are $\mu_i = 1 - \lambda_i(\mathcal{L})$, all in $[-1,1]$.
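The similarity relation can be confirmed numerically on the example graph:

```python
import numpy as np

edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]   # example graph again
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
d = A.sum(axis=1)

P = A / d[:, None]                                  # P = D^{-1} A
L_norm = np.eye(4) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)

mu = np.sort(np.linalg.eigvals(P).real)             # eigenvalues of P
lam = np.linalg.eigvalsh(L_norm)                    # ascending
assert np.allclose(mu, np.sort(1 - lam))            # mu_i = 1 - lambda_i(L_norm)
```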
Mixing Time and the Spectral Gap
For a connected non-bipartite graph, the random walk converges to the stationary distribution $\pi_i = d_i / \text{vol}(G)$. The spectral gap is $\lambda_2(\mathcal{L})$, and the mixing time satisfies
$$t_{\text{mix}} = O\!\left(\frac{\log n}{\lambda_2(\mathcal{L})}\right).$$
A large spectral gap means rapid mixing; a small spectral gap (near-bottleneck) means slow mixing. By Cheeger’s inequality, rapid mixing is equivalent to high conductance.
Spectrum of P:

  -1                 0                  mu_2    1
   |-----------------|--------------------|----|
bipartite                        1 - lambda_2

Mixing rate governed by the gap: 1 - mu_2 = lambda_2
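A quick simulation of mixing on the bottleneck graph used earlier (non-bipartite, since it contains triangles, so the walk converges to $\pi$):

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
d = A.sum(axis=1)
P = A / d[:, None]
pi = d / d.sum()               # stationary distribution pi_i = d_i / vol(G)

p = np.zeros(6)
p[0] = 1.0                     # walk starts at vertex 0
for _ in range(100):
    p = p @ P                  # one step of the walk (row-vector convention)
print(np.abs(p - pi).sum())    # near 0: the walk has mixed
```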
The Heat Equation on Graphs
The continuous heat equation $\partial_t u = \Delta u$ generalises to graphs as
$$\frac{du}{dt} = -Lu, \quad u(0) = u_0 \in \mathbb{R}^n.$$
The solution is $u(t) = e^{-Lt} u_0$, where the matrix exponential is
$$e^{-Lt} = \sum_{i=1}^n e^{-\lambda_i t} v_i v_i^T,$$
with $(\lambda_i, v_i)$ the eigenpairs of $L$. High-frequency components (large $\lambda_i$) decay rapidly; the zero mode (constant function) persists. This is exactly the graph analogue of heat diffusion: rough signals on the graph are smoothed by the operator $e^{-Lt}$.
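A sketch of the heat kernel assembled from the eigendecomposition, showing a spike diffusing toward the constant mean on the example graph:

```python
import numpy as np

edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]   # example graph again
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A
w, V = np.linalg.eigh(L)

def heat_kernel(t):
    """e^{-Lt} assembled from the spectral decomposition of L."""
    return V @ np.diag(np.exp(-w * t)) @ V.T

u0 = np.array([1.0, 0.0, 0.0, 0.0])   # a spike: the roughest kind of signal
for t in (0.1, 1.0, 10.0):
    print(t, heat_kernel(t) @ u0)     # diffuses toward the mean, 0.25 per node
```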
Interlacing Theorem
Theorem (Cauchy Interlacing). Let $B$ be the $m \times m$ principal submatrix of $L$ obtained by deleting the rows and columns of $n - m$ vertices, with eigenvalues $\mu_1 \leq \cdots \leq \mu_m$. Then
$$\lambda_i \leq \mu_i \leq \lambda_{n-m+i}, \quad i = 1,\dots,m.$$
(For the adjacency matrix, the principal submatrix is exactly the adjacency matrix of the induced subgraph; for the Laplacian, deleting vertices also lowers the remaining degrees, so the statement applies to the principal submatrix rather than to the induced subgraph's own Laplacian.) Interlacing controls how eigenvalues change under vertex deletion and is a key tool for bounding graph invariants.
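The interlacing bounds are easy to verify numerically for a concrete principal submatrix:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

keep = [0, 1, 2, 3]                     # delete vertices 4 and 5
B = L[np.ix_(keep, keep)]               # principal submatrix of L
lam = np.linalg.eigvalsh(L)             # lambda_1 <= ... <= lambda_n
mu = np.linalg.eigvalsh(B)              # mu_1 <= ... <= mu_m
m = len(keep)
for i in range(m):
    assert lam[i] <= mu[i] + 1e-10 and mu[i] <= lam[n - m + i] + 1e-10
```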
Graph Neural Networks
The spectral view of graphs directly motivates graph convolutional networks (GCNs). Define the symmetrically normalised adjacency
$$\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}, \quad \tilde{A} = A + I, \quad \tilde{D}_{ii} = 1 + d_i.$$
A GCN layer propagates node features $H^{(l)} \in \mathbb{R}^{n \times f_l}$ as
$$H^{(l+1)} = \sigma\!\left(\hat{A}H^{(l)}W^{(l)}\right),$$
where $W^{(l)} \in \mathbb{R}^{f_l \times f_{l+1}}$ is a learnable weight matrix and $\sigma$ is an activation function.
Layer l:                                              Layer l+1:
H^(l) in R^{n x f_l} --> sigma(A_hat H^(l) W^(l)) --> H^(l+1) in R^{n x f_{l+1}}
Each node aggregates features from its neighbours (one step of diffusion).
Multiplying by $\hat{A}$ is a single step of a lazy random walk - a low-pass filter on the graph. Stacking $k$ layers aggregates information from $k$-hop neighbourhoods. The spectral interpretation: the GCN applies a polynomial filter in the eigenvalue domain, suppressing high-frequency (rough) features while preserving smooth, community-level structure.
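A minimal numpy sketch of one such layer, with random, hypothetical weights (real implementations use sparse matrices and learned parameters):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step: ReLU(A_hat @ H @ W), a sketch of a GCN layer."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    d = A_tilde.sum(axis=1)
    A_hat = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)
    return np.maximum(0.0, A_hat @ H @ W)     # ReLU as the activation sigma

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 0]], dtype=float)     # example graph again
H = rng.standard_normal((4, 8))               # 8 input features per vertex
W = rng.standard_normal((8, 3))               # hypothetical weights: 8 -> 3
print(gcn_layer(A, H, W).shape)               # (4, 3)
```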
Applications
Community detection. In the stochastic block model with $k$ communities, the Fiedler vector of $L$ (and its generalisations) recovers community membership with high probability when inter-community edge probability is sufficiently smaller than intra-community probability.
Graph signal processing. Treating vertex signals as vectors in $\mathbb{R}^n$, the graph Fourier transform decomposes a signal in the basis of Laplacian eigenvectors. Low-frequency components (small $\lambda$) vary slowly across the graph; high-frequency components vary rapidly. This generalises classical Fourier analysis to irregular domains.
Semi-supervised learning. Given labels on a few vertices, smoothness with respect to $L$ (small $f^T L f$) is a natural regulariser: points connected by edges should have similar labels. Minimising the supervised loss plus $\alpha \cdot f^T L f$ over the label function $f$ is a classic graph-based semi-supervised learning method whose minimiser solves the linear system $(I + \alpha L)f = y$.
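A sketch of this regulariser in action, on the illustrative two-triangle graph with one positive and one negative seed label:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

y = np.zeros(n)
y[0], y[5] = 1.0, -1.0                  # observed labels on two vertices
alpha = 1.0                             # regularisation strength (a choice)
f = np.linalg.solve(np.eye(n) + alpha * L, y)
print(np.sign(f))                       # smooth labels: one triangle +, one -
```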
Spectral graph theory reveals that the algebraic structure of a matrix associated with a graph encodes its combinatorial properties with remarkable fidelity. The eigenvalues are not mere numerical curiosities - they are the fundamental constants of graph geometry.