Shortest Paths (Dijkstra, Bellman-Ford) // Megha Bose

Prerequisite: Graph Algorithms & Traversal, Trees & Heaps

Overview

The shortest-path problem asks: given a weighted directed graph $G = (V, E, w)$ and a source vertex $s$, find the minimum-weight path from $s$ to every other vertex. The right algorithm depends on two key factors: whether edge weights can be negative, and whether you need one-to-all or all-pairs distances.

Formally, the shortest-path distance is

$$\delta(s, v) = \min_{p : s \leadsto v} \sum_{e \in p} w(e)$$

and $\delta(s,v) = \infty$ if no path exists, $-\infty$ if a negative-weight cycle lies on some path to $v$.

Dijkstra’s Algorithm

Dijkstra’s is a greedy algorithm for graphs with non-negative edge weights. It maintains a priority queue of tentative distances and “finalises” vertices in order of increasing distance.

Dijkstra(G, s):
    for each v in V: dist[v] = ∞, prev[v] = NIL
    dist[s] = 0
    Q = min-priority queue of all v, keyed by dist[v]
    while Q not empty:
        u = Q.extract_min()
        for each edge (u, v) with weight w:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w   // relaxation
                prev[v] = u
                Q.decrease_key(v, dist[v])

Complexity. With a binary heap: extract_min costs $O(\log V)$ and is called $V$ times; decrease_key costs $O(\log V)$ and is called at most $E$ times. Total: $O((V + E) \log V)$. With a Fibonacci heap, decrease_key becomes $O(1)$ amortised, giving $O(E + V \log V)$.

If the graph is dense ($E = \Theta(V^2)$) and represented by an adjacency matrix, a simple array for dist with linear-scan extract_min gives $O(V^2)$ - better than the heap version in that regime.

Proof of correctness. The key invariant is: when a vertex $u$ is extracted from $Q$, $\text{dist}[u] = \delta(s, u)$.

Proof by induction. Base: $s$ is extracted first with $\text{dist}[s] = 0 = \delta(s,s)$. Inductive step: suppose all previously extracted vertices have correct distances. Let $u$ be the next extracted vertex with $\text{dist}[u] = d$. Assume for contradiction that $\delta(s, u) < d$. Then some path $p : s \leadsto u$ has weight $< d$. Let $(x, y)$ be the first edge in $p$ where $y$ has not yet been extracted. Since $\text{dist}[x] = \delta(s,x)$ (by induction, $x$ was already extracted) and $w \geq 0$, we have:

$$\text{dist}[y] \leq \delta(s, x) + w(x,y) \leq \delta(s,u) < d = \text{dist}[u]$$

But then $y$ would have been extracted before $u$ - contradiction. Hence $\text{dist}[u] = \delta(s,u)$. $\square$

Bellman-Ford Algorithm

When edge weights may be negative, Dijkstra’s fails. Bellman-Ford runs $V - 1$ rounds of edge relaxation over all edges.

BellmanFord(G, s):
    for each v in V: dist[v] = ∞
    dist[s] = 0
    repeat V-1 times:
        for each edge (u, v, w) in E:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    // Negative-cycle detection
    for each edge (u, v, w) in E:
        if dist[u] + w < dist[v]:
            report negative cycle

Complexity. $V - 1$ rounds each scanning $E$ edges: $O(VE)$.

Correctness. Any shortest path in a graph with no negative cycles uses at most $V - 1$ edges (since a simple path visits at most $V$ vertices). After $k$ rounds of relaxation, $\text{dist}[v]$ is the weight of the shortest path from $s$ to $v$ using at most $k$ edges. By induction on $k$, after $V-1$ rounds all distances are final. If a further relaxation is still possible, a negative-weight cycle is reachable.

SPFA (Shortest Path Faster Algorithm). A practical optimisation: maintain a queue of vertices whose distance was recently updated, and relax only their outgoing edges. Average case is much faster than $O(VE)$, but worst case remains $O(VE)$ - SPFA does not improve the worst-case bound.

Floyd-Warshall (All-Pairs)

Floyd-Warshall computes shortest paths between all pairs of vertices in $O(V^3)$ time and $O(V^2)$ space.

Define $\text{dp}[i][j][k]$ as the shortest path from $i$ to $j$ using only vertices ${1, \ldots, k}$ as intermediaries. The recurrence is:

$$\text{dp}[i][j][k] = \min!\left(\text{dp}[i][j][k-1],; \text{dp}[i][k][k-1] + \text{dp}[k][j][k-1]\right)$$

Base case: $\text{dp}[i][j][0] = w(i,j)$ if $(i,j) \in E$, else $\infty$ (0 for $i=j$). The third dimension can be eliminated (in-place update) since each $k$-update only reads row/column $k$ of the $(k-1)$ table, which has not changed yet.

Negative cycles are detected by checking whether any diagonal entry $\text{dp}[i][i]$ becomes negative after running the algorithm.

Johnson’s Algorithm (Sparse All-Pairs)

For sparse graphs where $E \ll V^2$, Johnson’s algorithm is faster than Floyd-Warshall:

Add a new vertex $q$; add zero-weight edges $(q, v)$ for all $v$.
Run Bellman-Ford from $q$ to compute potentials $h[v] = \delta(q, v)$. If a negative cycle exists, report it.
Reweight: define $\hat{w}(u,v) = w(u,v) + h[u] - h[v] \geq 0$.
Run Dijkstra from each vertex with the reweighted graph.
Recover true distances: $\delta(u,v) = \hat{\delta}(u,v) - h[u] + h[v]$.

The reweighting preserves shortest-path structure and makes all weights non-negative (by the triangle inequality for $h$). Total complexity: $O(VE + V^2 \log V)$ - better than Floyd-Warshall when $E = o(V^2)$.

A* Search

A* generalises Dijkstra for single-target queries by guiding the search with a heuristic $h(v)$ estimating the distance from $v$ to the target $t$. Each vertex is prioritised by $f(v) = g(v) + h(v)$, where $g(v) = \text{dist}[s,v]$ is the known distance from the source.

An admissible heuristic satisfies $h(v) \leq \delta(v, t)$ for all $v$ (never overestimates). Under admissibility, A* is guaranteed to find the optimal path. A consistent (monotone) heuristic additionally satisfies $h(u) \leq w(u,v) + h(v)$; this ensures vertices are extracted in non-decreasing $g$ order (no re-expansion needed), making A* as efficient as Dijkstra while exploring far fewer vertices in practice.

Examples

GPS routing. Navigation apps use bidirectional Dijkstra (or A* with Euclidean-distance heuristic) on road networks. The graph has tens of millions of nodes but is sparse ($E \approx 2V$ for planar road networks). Contraction hierarchies pre-process the graph so that queries answer in milliseconds.

Network packet routing. The OSPF protocol runs Dijkstra on each router to compute its forwarding table. All link weights are non-negative (they represent administrative costs), so Dijkstra applies directly. BGP, by contrast, must reason about policy and uses a Bellman-Ford-style distributed algorithm.

Currency arbitrage via negative-cycle detection. Model currencies as vertices and exchange rates as edges with weight $-\log(\text{rate})$. A negative cycle in this graph corresponds to a sequence of conversions that yields a profit. Bellman-Ford detects such cycles in $O(VE)$: if after $V-1$ rounds a further relaxation is still possible, an arbitrage opportunity exists.

Read Next:

Network Flow & Matching