Graph Algorithms & Traversal
Prerequisite: Trees & Heaps
Overview
Graphs model pairwise relationships between objects: road networks, dependency graphs, social connections, compiler call graphs. Two representations dominate in practice. An adjacency matrix uses a $V \times V$ boolean (or weight) matrix; entry $(i,j)$ answers “is there an edge?” in $O(1)$ but consumes $O(V^2)$ space regardless of how sparse the graph is. An adjacency list stores, for each vertex, only the neighbours that actually exist, giving $O(V + E)$ space - much better for the sparse graphs that arise in the real world.
Traversal algorithms visit every reachable vertex exactly once. Two strategies give fundamentally different orderings and expose different structural properties.
Breadth-First Search (BFS)
BFS explores vertices in non-decreasing order of their distance (number of hops) from the source. A FIFO queue holds the frontier.
BFS(G, s):
for each v in V: dist[v] = ∞, colour[v] = WHITE
dist[s] = 0, colour[s] = GRAY
Q = new Queue; Q.enqueue(s)
while Q not empty:
u = Q.dequeue()
for each neighbour v of u:
if colour[v] == WHITE:
colour[v] = GRAY
dist[v] = dist[u] + 1
parent[v] = u
Q.enqueue(v)
colour[u] = BLACK
Complexity. Each vertex is enqueued and dequeued once: $O(V)$ queue operations. Each edge $(u,v)$ is examined once from $u$’s adjacency list: $O(E)$ work. Total: $O(V + E)$ time, $O(V)$ auxiliary space for the queue and dist/colour arrays.
Correctness invariant. When a vertex $u$ is dequeued, $\text{dist}[u]$ equals the true shortest-path distance $\delta(s, u)$ in the unweighted graph. Proof sketch: by induction on distance layers. Layer 0 contains only $s$ with $\text{dist}[s]=0=\delta(s,s)$. If all vertices at distance $\leq k$ are correctly labelled when dequeued, then any vertex $v$ at distance $k+1$ has at least one neighbour $u$ at distance $k$; when $u$ is dequeued we set $\text{dist}[v] = k+1$ and enqueue $v$. No shorter path can exist because that would require a vertex at distance $< k$ whose neighbour $v$ has smaller distance, contradicting the inductive hypothesis.
Applications. BFS gives shortest (fewest-hop) paths in unweighted graphs. It also tests bipartiteness: a graph is bipartite iff BFS finds no odd-length cycle, which occurs iff no edge connects two vertices at the same BFS layer.
Depth-First Search (DFS)
DFS dives as deep as possible before backtracking. A global clock stamps discovery and finish times.
DFS(G):
for each v in V: colour[v] = WHITE, parent[v] = NIL
time = 0
for each v in V:
if colour[v] == WHITE: DFS_VISIT(G, v)
DFS_VISIT(G, u):
time = time + 1; d[u] = time; colour[u] = GRAY
for each neighbour v of u:
if colour[v] == WHITE:
parent[v] = u
DFS_VISIT(G, v)
colour[u] = BLACK; time = time + 1; f[u] = time
Complexity. Every vertex is visited once ($O(V)$) and every edge is inspected once from each endpoint’s adjacency list ($O(E)$). Total: $O(V + E)$.
Edge classification. Each edge $(u,v)$ falls into exactly one category:
- Tree edge: $v$ is WHITE when discovered from $u$.
- Back edge: $v$ is GRAY (an ancestor of $u$ in the DFS tree). A back edge indicates a cycle; directed graphs are acyclic iff DFS finds no back edge.
- Forward edge: $v$ is BLACK and $d[u] < d[v]$ (directed only).
- Cross edge: $v$ is BLACK and $d[u] > d[v]$ (directed only).
The parenthesis theorem captures nesting: for any two vertices $u$ and $v$, the intervals $[d[u], f[u]]$ and $[d[v], f[v]]$ are either disjoint or one contains the other.
Topological Sort
A topological ordering of a DAG is a linear order of vertices such that for every directed edge $(u \to v)$, $u$ appears before $v$. Two algorithms:
DFS-based: run DFS and collect vertices in decreasing order of finish time. Correctness follows from the fact that if $(u,v)$ is an edge in a DAG, then $f[v] < f[u]$ (since $v$ finishes before $u$ returns). Time: $O(V + E)$.
Kahn’s algorithm (in-degree queue):
Kahn(G):
compute in-degree deg[v] for all v
Q = queue of all v with deg[v] == 0
order = []
while Q not empty:
u = Q.dequeue(); order.append(u)
for each neighbour v of u:
deg[v] -= 1
if deg[v] == 0: Q.enqueue(v)
if len(order) != V: report cycle
return order
Time $O(V + E)$; if the final order has fewer than $V$ vertices, a cycle exists.
Strongly Connected Components (SCCs)
A strongly connected component of a directed graph is a maximal set of vertices $C$ such that every vertex in $C$ is reachable from every other vertex in $C$.
Kosaraju’s algorithm uses two DFS passes:
- Run DFS on $G$; record vertices in order of decreasing finish time.
- Compute $G^T$ (transpose: reverse all edges).
- Run DFS on $G^T$ in the order from step 1; each DFS tree in step 3 is one SCC.
Time $O(V + E)$. Correctness: if $u$ and $v$ are in the same SCC, they are mutually reachable in both $G$ and $G^T$; the finishing-time order ensures we never “escape” an SCC in the second pass.
Tarjan’s algorithm finds SCCs in a single DFS using low-link values. Each vertex $v$ maintains $\text{low}[v]$, the smallest discovery time reachable from the subtree rooted at $v$ via at most one back or cross edge. When $\text{low}[v] = d[v]$, $v$ is the root of an SCC; pop the stack down to $v$. Time $O(V+E)$, one pass.
Articulation Points and Bridges
In an undirected graph, a vertex is an articulation point (cut vertex) if removing it disconnects the graph. An edge is a bridge if removing it disconnects the graph. Both can be found in $O(V+E)$ by a single DFS: vertex $u$ is an articulation point iff it has a child $v$ in the DFS tree with $\text{low}[v] \geq d[u]$ (meaning $v$’s subtree has no back edge to a proper ancestor of $u$).
Examples
Web crawling as BFS. A web crawler starts from a seed URL and discovers pages layer by layer. BFS ensures that pages at distance $k$ (hops) from the seed are processed before any page at distance $k+1$, giving a breadth-first coverage of the web graph. The $\text{dist}$ array doubles as the “already-visited” set to avoid re-fetching.
Dependency resolution as topological sort. A package manager (npm, apt, Cargo) builds a DAG of dependencies. Kahn’s algorithm emits a valid install order: packages with no uninstalled dependencies come first. If the algorithm terminates with fewer than $|V|$ packages scheduled, a circular dependency has been detected and reported to the user.
SCCs in compiler optimization. A compiler constructs the call graph of a program. SCCs of the call graph correspond to mutually recursive function clusters. The condensation (DAG of SCCs) guides interprocedural analysis: the compiler processes SCCs bottom-up so callee summaries are available before callees are inlined into callers. Tarjan’s algorithm is used here because a single DFS is cache-friendlier than two passes.
Read Next: