Minimum Spanning Trees & Union-Find - Spanning a World on a Budget
Helpful context:
- Graph Algorithms & Traversal - Maps, Networks, and Hidden Order
- Greedy Algorithms - The Art of Never Looking Back
In the 1920s, Czech mathematician Otakar Borůvka was working for an electrical utility company in Moravia. His job: design an electrical network to connect a set of cities using the minimum amount of cable. He invented an algorithm to do it.
Three decades later, in 1956, Joseph Kruskal independently published the same algorithm. A year after that, Robert Prim published yet another version. The same fundamental structure - reinvented three times across 30 years, in two different countries, for two different applications - because it is the natural answer to a natural question.
The question: how do you connect all nodes in a network with minimum total edge cost?
Spanning Trees and Why Minimum Ones Matter
A spanning tree of a connected undirected graph is a subgraph that:
- Includes all $V$ vertices
- Is connected
- Has no cycles
Any spanning tree of a $V$-vertex graph has exactly $V - 1$ edges. (One edge fewer would disconnect it; one edge more would create a cycle.)
A minimum spanning tree (MST) is a spanning tree with the smallest possible total edge weight.
Applications: designing road networks, laying fiber-optic cable, building efficient electrical grids, clustering in machine learning (remove the longest edges), approximating TSP.
Why Greedy Works: The Cut Property
Both Kruskal’s and Prim’s algorithms are greedy. What makes greedy correct for MST when it fails for so many other problems?
The answer is the cut property, which is the theoretical foundation for both algorithms.
Definition. A cut of a graph is a partition of $V$ into two non-empty sets $(S, V \setminus S)$. An edge crosses the cut if one endpoint is in $S$ and the other is in $V \setminus S$.
Cut property. For any cut of the graph, if edge $e$ is the unique minimum-weight edge crossing the cut, then $e$ is in every MST.
Proof sketch. Suppose $e$ is not in some MST $T$. Adding $e$ to $T$ creates a cycle. That cycle must include at least one other edge $f$ that crosses the same cut (since $e$ crosses the cut, and a cycle must cross any cut an even number of times). Since $e$ is the unique minimum across the cut, $w(e) < w(f)$. Swap $f$ for $e$: the new tree is connected, acyclic, and has smaller total weight - contradicting that $T$ was an MST.
This theorem is why greedy is correct here: at each step, both algorithms add a minimum-weight edge across some cut, and the cut property guarantees it belongs to some MST.
Kruskal’s Algorithm
Sort all edges by weight. Process them in increasing order: add an edge to the MST if and only if it doesn’t create a cycle.
Kruskal(G, w):
sort edges by weight
T = empty set
for each edge (u, v) in sorted order:
if u and v are in different components:
T = T union {(u, v)}
return T
Correctness. Each edge added is the minimum-weight edge crossing the cut defined by the two components it bridges. By the cut property, it belongs to some MST.
Time complexity. Sorting $E$ edges: $O(E \log E)$. Each cycle-detection check: this is where Union-Find comes in (see below). Total: $O(E \log E)$, dominated by sorting.
Prim’s Algorithm
Grow a tree greedily from a starting vertex. At each step, add the minimum-weight edge connecting the current tree to a vertex not yet in the tree.
Prim(G, w, r):
for each v: key[v] = infinity, in_tree[v] = false
key[r] = 0
Q = min-priority queue of all vertices keyed by key[v]
while Q not empty:
u = Q.extract_min()
in_tree[u] = true
for each neighbor v of u with weight w(u,v):
if not in_tree[v] and w(u,v) < key[v]:
key[v] = w(u,v)
parent[v] = u
Q.decrease_key(v, key[v])
Time complexity. The bottleneck is the priority queue. A binary heap is the standard implementation - it stores elements in a tree-shaped array where the smallest element is always at the top. Extracting the minimum costs $O(\log V)$ and updating a key (decrease_key) costs $O(\log V)$, giving $O((V + E) \log V) = O(E \log V)$ overall for connected graphs. A Fibonacci heap is a more sophisticated structure that makes decrease_key cost only $O(1)$ amortised (by lazily deferring reorganisation), reducing the total to $O(E + V \log V)$ - a meaningful win on dense graphs where decrease_key is called many times. In practice, binary heaps are almost always used because Fibonacci heaps have large constant factors and complex implementation.
Prim’s is structurally identical to Dijkstra’s algorithm for shortest paths - the only difference is what we minimize (edge weight to the tree, not cumulative path distance). Understanding one gives you the other for free.
Union-Find: The Workhorse of Kruskal’s
Kruskal’s needs to answer cycle-detection queries efficiently: are vertices $u$ and $v$ already in the same connected component?
The Union-Find (or Disjoint-Set Union, DSU) data structure maintains a collection of disjoint sets and supports two operations:
- Find($x$): Return the representative (“root”) of the set containing $x$.
- Union($x$, $y$): Merge the sets containing $x$ and $y$.
For Kruskal’s: before adding edge $(u, v)$, check if $\text{Find}(u) == \text{Find}(v)$. If yes, skip (cycle). If no, add the edge and call $\text{Union}(u, v)$.
Implementation with two optimizations:
Union by rank. When merging two trees, attach the shorter tree under the root of the taller one. This keeps trees short.
Path compression. When computing $\text{Find}(x)$, make every node on the path to the root point directly to the root. Future queries are faster.
parent = [i for i in range(n)]
rank = [0] * n
def find(x):
if parent[x] != x:
parent[x] = find(parent[x]) // path compression
return parent[x]
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry: return
if rank[rx] < rank[ry]: rx, ry = ry, rx // union by rank
parent[ry] = rx
if rank[rx] == rank[ry]: rank[rx] += 1
The amortized cost per operation with both optimisations is $O(\alpha(n))$, where $\alpha$ is the inverse Ackermann function - a value that never exceeds 5 for any input size you will ever encounter in practice. Treat it as effectively constant.
Prim’s vs. Kruskal’s: When to Use Which
Both find the MST. In practice:
- Dense graphs ($E \approx V^2$): Prim’s with a simple array (no heap, just a linear scan for minimum) runs in $O(V^2)$, which beats Kruskal’s $O(E \log E) = O(V^2 \log V)$.
- Sparse graphs ($E \approx V$): Kruskal’s $O(E \log E)$ is fast, and sorting is cache-friendly.
- Parallel/distributed settings: Borůvka’s algorithm (the original) is naturally parallelizable: every component simultaneously picks its cheapest outgoing edge.
Applications
Network design. Design a road network, fiber-optic cable layout, or power grid to connect all nodes with minimum total cost.
Cluster analysis. Run Kruskal’s; stop before adding the $k$ most expensive edges. The resulting $k$ components are a natural clustering. This is single-linkage hierarchical clustering.
Image segmentation. Model each pixel as a vertex; edge weights encode color similarity between adjacent pixels. An MST groups similar pixels and can be thresholded to segment the image.
Approximation algorithms. A 2-approximation for metric TSP: compute the MST, then do a DFS traversal. The MST cost is a lower bound on the optimal tour (remove any edge from the tour to get a spanning tree), so the DFS traversal gives at most $2 \times$ optimal.
Summary
| Algorithm | Complexity | Strategy | Best for |
|---|---|---|---|
| Kruskal’s | $O(E \log E)$ | Sort edges, add if no cycle | Sparse graphs |
| Prim’s (binary heap) | $O(E \log V)$ | Grow tree from seed, greedy | General |
| Prim’s (Fibonacci heap) | $O(E + V \log V)$ | Same | Dense graphs |
| Union-Find (with both optimizations) | $O(\alpha(n))$ amortized per op | Path compression + union by rank | Cycle detection in Kruskal’s |
The cut property is the one theorem you need to understand why MST algorithms are correct. Every greedy step - whether Kruskal’s picks the globally cheapest edge, or Prim’s picks the cheapest edge connecting the current tree to a new vertex - is adding a minimum-weight edge across some cut. The cut property guarantees that edge belongs to some MST. After $V - 1$ such steps, you have the MST.
The same structure, reinvented by Borůvka in the 1920s and rediscovered twice in the 1950s. It’s the natural answer to the minimum connectivity question - once you see it, you can’t imagine doing it any other way.
Read next: