Shortest Paths - When Greed Works and When It Doesn't
Helpful context:
- Graph Algorithms & Traversal - Maps, Networks, and Hidden Order
- Dynamic Programming - Never Solve the Same Problem Twice
In 1956, Edsger Dijkstra was on a coffee date with his fiancée in Amsterdam. He had no pen or paper. Sitting at a café table, he worked out an algorithm in his head for finding the shortest path between two cities in a road network.
He published it three years later. The paper is half a page long - one of the shortest papers in computer science history. It describes one of the most influential algorithms ever written.
Today, a version of that algorithm runs on every phone that uses GPS navigation. It finds the shortest route between two points on a road network with tens of millions of nodes in milliseconds.
The Problem
Given a weighted directed graph $G = (V, E, w)$ and a source vertex $s$, find the shortest (minimum-weight) path from $s$ to every other reachable vertex. Formally:
$$\delta(s, v) = \min_{p : s \leadsto v} \sum_{e \in p} w(e)$$
If no path exists, $\delta(s, v) = \infty$. If a negative-weight cycle lies on some path to $v$, the “shortest path” is $-\infty$ (you can loop forever to reduce the cost).
The right algorithm depends on two properties of the graph:
- Do edges have negative weights?
- Do you need one source to all others, or all pairs?
Dijkstra’s Algorithm: Greedy, Non-Negative Weights
Dijkstra’s algorithm is a greedy approach that works when all edge weights are non-negative. It maintains a priority queue of tentative distances and finalizes vertices in order of increasing distance.
import heapq
def dijkstra(graph, s):
# graph[u] = list of (v, weight) pairs
dist = {v: float('inf') for v in graph}
parent = {v: None for v in graph}
dist[s] = 0
pq = [(0, s)] # (distance, vertex)
while pq:
d, u = heapq.heappop(pq)
if d > dist[u]:
continue # stale entry; skip
for v, w in graph[u]:
if dist[u] + w < dist[v]:
dist[v] = dist[u] + w
parent[v] = u
heapq.heappush(pq, (dist[v], v))
return dist, parent
The key operation is relaxation: if we find a shorter path to $v$ through $u$, update dist[v].
Correctness invariant: When vertex $u$ is extracted from the priority queue, dist[u] equals the true shortest-path distance $\delta(s, u)$.
Proof. By induction on the order of extraction.
Base case: $s$ is extracted first with dist[s] = 0 = $\delta(s, s)$.
Inductive step: Assume every vertex extracted before $u$ has a correct dist value. Let dist[u] = d. Suppose for contradiction that $\delta(s, u) < d$. Then there exists a shortest path $P: s \leadsto u$ with total weight $\delta(s, u) < d$.
Since $s$ is extracted and $u$ is not yet correctly finalized, somewhere along $P$ there must be a transition from an extracted vertex to an unexplored one. Define:
- $x$ = the last extracted vertex on path $P$
- $y$ = the immediate successor of $x$ on $P$ (so $y$ has not yet been extracted)
Since $x$ was extracted before $u$, the inductive hypothesis gives $\text{dist}[x] = \delta(s, x)$. When $x$ was extracted, we relaxed all edges out of $x$, so:
$$\text{dist}[y] \leq \text{dist}[x] + w(x, y) = \delta(s, x) + w(x, y)$$
Since $P$ is a shortest path, optimal substructure holds: the sub-path $s \leadsto x \leadsto y$ along $P$ is itself a shortest path from $s$ to $y$, so:
$$\delta(s, x) + w(x, y) = \delta(s, y)$$
Since $y$ lies on $P$ and all remaining edges to $u$ have non-negative weight:
$$\delta(s, y) \leq \delta(s, u)$$
Chaining all three:
$$\text{dist}[y] \leq \delta(s, x) + w(x, y) = \delta(s, y) \leq \delta(s, u) < d = \text{dist}[u]$$
So $\text{dist}[y] < \text{dist}[u]$, which means $y$ would have been extracted from the priority queue before $u$ - contradicting our assumption that $y$ is unexplored when $u$ is extracted. Therefore $\delta(s, u) = d$. $\blacksquare$
Time complexity. With a binary heap: $V$ extract_min operations at $O(\log V)$ each, plus at most $E$ decrease_key operations at $O(\log V)$ each. Total: $O((V + E) \log V)$.
With a Fibonacci heap: decrease_key becomes $O(1)$ amortized. Total: $O(E + V \log V)$.
For dense graphs ($E = \Theta(V^2)$) with an adjacency matrix: scan all vertices to find the minimum, giving $O(V^2)$ - better than the heap version in this regime.
Why Dijkstra Fails on Negative Edges
Discomfort check. Why can’t Dijkstra handle negative edge weights? The invariant depends on a crucial property: once you extract a vertex $u$ and finalize its distance, no future path can be shorter. This holds when all weights are non-negative - any path through a vertex not yet extracted is at least as long as the direct path. But with negative edges, a future path might shortcut back through an already-finalized vertex via a negative edge, improving its distance. By the time you’d discover this, you’ve already removed $u$ from the queue and assumed its distance was final. Bellman-Ford handles this by simply not finalizing anything prematurely - it repeatedly relaxes all edges.
Bellman-Ford: Dynamic Programming, Negative Weights
Bellman-Ford handles negative edge weights (but not negative cycles). It’s a DP algorithm: define the subproblem by the number of edges allowed.
$$\text{dp}[v][k] = \text{shortest path from } s \text{ to } v \text{ using at most } k \text{ edges}$$
Recurrence:
$$\text{dp}[v][k] = \min\left(\text{dp}[v][k-1],\ \min_{(u,v) \in E} \text{dp}[u][k-1] + w(u,v)\right)$$
In practice, we run $V - 1$ rounds of edge relaxation over all edges:
def bellman_ford(vertices, edges, s):
# edges: list of (u, v, weight)
dist = {v: float('inf') for v in vertices}
dist[s] = 0
for _ in range(len(vertices) - 1):
for u, v, w in edges:
if dist[u] + w < dist[v]:
dist[v] = dist[u] + w
# Detect negative cycles
for u, v, w in edges:
if dist[u] + w < dist[v]:
raise ValueError("Negative cycle detected")
return dist
Why does this work? Here is the core intuition, then the proof.
Think of Bellman-Ford as sending a wave of information outward from $s$, one hop at a time. After round 1, every vertex knows the best way to reach $s$ in a single hop. After round 2, it knows the best two-hop route. After round $k$, it knows the best $k$-hop route. Since any simple path in a $V$-vertex graph uses at most $V - 1$ edges, running $V - 1$ rounds is enough to settle every reachable vertex.
Formal claim. Let $\delta_k(s, v)$ denote the minimum weight of any path from $s$ to $v$ using at most $k$ edges ($\infty$ if none exists). After round $k$, for every vertex $v$:
$$\text{dist}[v] \leq \delta_k(s, v)$$
Proof by induction on $k$.
Base case ($k = 0$): After initialisation, $\text{dist}[s] = 0 = \delta_0(s, s)$, and $\text{dist}[v] = \infty = \delta_0(s, v)$ for all $v \neq s$, since no 0-edge path to any other vertex exists.
Inductive step: Assume after round $k - 1$, for all $v$:
$$\text{dist}[v] \leq \delta_{k-1}(s, v)$$
Fix any vertex $v$ and let $P^\star$ be an optimal path from $s$ to $v$ using at most $k$ edges. Two cases:
-
Case 1: $P^\star$ uses at most $k - 1$ edges. The shortest path using at most $k$ edges equals the shortest path using at most $k - 1$ edges, so the inductive hypothesis directly gives $\text{dist}[v] \leq \delta_k(s, v)$ after round $k$.
-
Case 2: $P^\star$ uses exactly $k$ edges. Let $u$ be the second-to-last vertex on $P^\star$, so $P^\star = s \leadsto u \to v$. The prefix $s \leadsto u$ uses $k - 1$ edges and, by optimal substructure, is itself a shortest $(k-1)$-edge path to $u$. By the inductive hypothesis, after round $k - 1$:
$$\text{dist}[u] \leq \delta_{k-1}(s, u)$$
In round $k$ we relax edge $(u, v, w)$:
$$\text{dist}[v] \leq \text{dist}[u] + w \leq \delta_{k-1}(s, u) + w = \delta_k(s, v)$$
Why $V - 1$ rounds suffice. In a graph with no negative cycles, every shortest path is simple - it never repeats a vertex, because repeating a vertex would mean traversing a cycle, and any non-negative cycle only adds weight. A simple path on at most $V$ vertices uses at most $V - 1$ edges, so:
$$\delta_{V-1}(s, v) = \delta(s, v)$$
After $V - 1$ rounds the claim gives $\text{dist}[v] \leq \delta(s, v)$. Since dist[v] is always the weight of some actual path, $\text{dist}[v] \geq \delta(s, v)$. Together: $\text{dist}[v] = \delta(s, v)$. $\blacksquare$
Negative cycle detection. After $V - 1$ rounds we have the best possible simple-path distances. If any edge $(u, v, w)$ can still be relaxed - i.e. $\text{dist}[u] + w < \text{dist}[v]$ - then the only way to improve beyond a simple path is to traverse a cycle. Since non-negative cycles never help, this cycle must be negative. Hence a negative-weight cycle is reachable from $s$.
Time complexity. $V - 1$ rounds, each scanning all $E$ edges: $O(VE)$.
Floyd-Warshall: All-Pairs Shortest Paths
Need shortest paths between all pairs of vertices? Floyd-Warshall does it in $O(V^3)$.
Intuition: grow the set of allowed intermediaries.
Label the vertices $1, 2, \ldots, V$. Run through them one by one and ask at each step:
For each pair $(i, j)$: what is the shortest path from $i$ to $j$ if only vertices $\{1, 2, \ldots, k\}$ may be used as intermediate stops?
When $k = 0$, no intermediaries are allowed - only direct edges. When $k = V$, all vertices are available: these are the true shortest paths.
At each step, adding vertex $k$ to the allowed set introduces exactly one new option for every pair $(i, j)$: route through $k$. The shortest path either does not use $k$ (answer unchanged from step $k - 1$) or uses $k$ exactly once, splitting into two sub-paths - $i \leadsto k$ and $k \leadsto j$ - both of which use only intermediaries $\{1, \ldots, k-1\}$.
A concrete example:
With no intermediaries ($k = 0$): the only known path from A to C is the direct edge, cost 10. When B enters the allowed set: check $A \to B \to C = 3 + 2 = 5 < 10$. Update. The direct edge is beaten by the two-hop path.
The recurrence. Define:
$$\text{dp}[i][j][k] = \text{shortest distance from } i \text{ to } j \text{ using only vertices } \{1, \ldots, k\} \text{ as intermediaries}$$
Base case: $\text{dp}[i][j][0] = w(i,j)$ if edge $(i, j)$ exists, else $\infty$; and $\text{dp}[i][i][0] = 0$.
For $k \geq 1$:
$$\text{dp}[i][j][k] = \min\Bigl(\underbrace{\text{dp}[i][j][k-1]}{\text{skip } k},\quad \underbrace{\text{dp}[i][k][k-1] + \text{dp}[k][j][k-1]}{\text{route through } k}\Bigr)$$
Why does this decomposition work? If the shortest path from $i$ to $j$ routes through $k$, it splits as $i \leadsto k \leadsto j$. Could either sub-path loop back through $k$? In a graph without negative cycles, shortest paths are simple - they visit each vertex at most once. So both sub-paths use only intermediaries from $\{1, \ldots, k-1\}$, and $\text{dp}[\cdot][\cdot][k-1]$ gives their lengths correctly by the inductive hypothesis.
Correctness. By induction on $k$. The base case holds by definition. Assuming $\text{dp}[\cdot][\cdot][k-1]$ is correct, the recurrence considers both cases (use $k$ or not) and takes the minimum - which is correct by hypothesis. After $V$ iterations, $\text{dp}[i][j][V] = \delta(i, j)$ for all pairs. $\blacksquare$
In practice, the $k$ dimension is eliminated by updating in-place. This is safe because $\text{dp}[k][k] = 0$ throughout: routing through $k$ to reach $k$ itself adds nothing, so the values $\text{dp}[i][k]$ and $\text{dp}[k][j]$ used on the right-hand side are already their correct $k - 1$ values.
def floyd_warshall(vertices, edges):
# edges: list of (u, v, weight)
dist = {u: {v: float('inf') for v in vertices} for u in vertices}
for v in vertices:
dist[v][v] = 0
for u, v, w in edges:
dist[u][v] = w
for k in vertices:
for i in vertices:
for j in vertices:
if dist[i][k] + dist[k][j] < dist[i][j]:
dist[i][j] = dist[i][k] + dist[k][j]
return dist
Negative cycles: if dist[i][i] becomes negative, vertex $i$ lies on a negative-weight cycle.
Floyd-Warshall works with negative edges as long as there are no negative cycles. Time: $O(V^3)$. Space: $O(V^2)$.
A*: Dijkstra with Direction
Dijkstra finds the shortest path to all vertices. When you only care about one destination $t$, you can stop early and guide the search using a heuristic - a function $h(v)$ estimating the remaining cost from $v$ to $t$.
A* assigns each vertex a priority $f(v) = g(v) + h(v)$, where $g(v)$ is the best known distance from $s$ to $v$. Vertices with low $f$-value get explored first: cheap to reach from $s$ and estimated to be close to $t$.
Admissibility. A heuristic is admissible if it never overestimates: $h(v) \leq \delta(v, t)$ for all $v$. An admissible heuristic is optimistic - it may underestimate, but never inflates the true remaining cost. This is what guarantees A* finds the optimal path.
Intuition. Dijkstra expands outward from $s$ uniformly in all directions - it has no information about where $t$ is. A* biases the expansion toward $t$: a vertex that is cheap to reach but far from $t$ (high $h$) gets deprioritized; a vertex pointing toward $t$ (low $h$) gets promoted to the front of the queue.
Example
h = 4"]) -->|2| A(["A
h = 5"]) S -->|3| B(["B
h = 2"]) A -->|6| T(["T
h = 0"]) B -->|2| T
The $h$ values are admissible: $h(S) = 4 \leq 5$, $h(A) = 5 \leq 6$, $h(B) = 2 \leq 2$, $h(T) = 0$. The optimal path is $S \to B \to T$ with cost 5.
Dijkstra (priority by $g$):
| Extracted | $g$ | Open set after |
|---|---|---|
| S | 0 | A $(g=2)$, B $(g=3)$ |
| A | 2 | B $(g=3)$, T $(g=8)$ |
| B | 3 | T $(g=5)$ |
| T | 5 | done |
Dijkstra extracts A even though it leads to a suboptimal path - it has no way to know A is a dead end.
A* (priority by $f = g + h$):
| Extracted | $f = g + h$ | Open set after |
|---|---|---|
| S | $0 + 4 = 4$ | A $(f=7)$, B $(f=5)$ |
| B | $3 + 2 = 5$ | A $(f=7)$, T $(f=5)$ |
| T | $5 + 0 = 5$ | done |
A* never extracts A. The heuristic recognized that A, despite being cheaply reachable ($g = 2$), is estimated to be far from T ($h = 5$), giving total estimated cost 7. B costs a bit more to reach ($g = 3$) but sits close to T ($h = 2$), giving estimated total 5 - matching the true optimal. A* finds the answer with one fewer node explored.
Consistency
A heuristic is consistent (or monotone) if for every edge $(u, v, w)$:
$$h(u) \leq w + h(v)$$
This is the triangle inequality on heuristic estimates: the estimated cost to $t$ should not improve by taking a detour through $v$. Consistency implies admissibility. With a consistent heuristic, $f$-values are non-decreasing along any explored path, so A* never needs to re-expand a vertex - it behaves exactly like Dijkstra with modified priorities.
Euclidean distance satisfies both properties: it never overestimates road distance, and it obeys the geometric triangle inequality.
Proof of Correctness
Claim. If $h$ is admissible and $h(t) = 0$, then when A* first extracts $t$, $g(t) = \delta(s, t)$.
Proof. Let $P^\star$ be a shortest $s$-$t$ path of cost $\delta(s, t)$.
At the moment A* is about to extract $t$, consider the vertices of $P^\star$ from $s$ toward $t$. Either all predecessors of $t$ on $P^\star$ have been extracted - in which case the progressive relaxations along $P^\star$ gave $g(t) \leq \delta(s, t)$ already - or there is a first vertex $v$ on $P^\star$ still in the open set.
In the second case, let $u$ be $v$’s predecessor on $P^\star$ (already extracted). By the same inductive argument as Dijkstra’s correctness proof, restricted to the extracted prefix of $P^\star$: $g(u) = \delta(s, u)$ when $u$ is extracted. The edge $u \to v$ is then relaxed, giving:
$$g(v) \leq g(u) + w(u, v) = \delta(s, u) + w(u, v) = \delta(s, v)$$
where the last equality uses the fact that $u \to v$ is a segment of the optimal path $P^\star$. Applying admissibility:
$$f(v) = g(v) + h(v) \leq \delta(s, v) + \delta(v, t) = \delta(s, t)$$
Since A* extracted $t$ before $v$, we have $f(t) \leq f(v) \leq \delta(s, t)$. Since $h(t) = 0$, this gives $g(t) \leq \delta(s, t)$.
In both cases, $g(t) \leq \delta(s, t)$. Since $g(t)$ is always the weight of some actual path, $g(t) \geq \delta(s, t)$. Therefore $g(t) = \delta(s, t)$. $\blacksquare$
Connection to Dijkstra
With $h(v) = 0$ for all $v$, A* reduces to Dijkstra exactly. With the perfect heuristic $h(v) = \delta(v, t)$, the $f$-value of every vertex on an optimal path equals $\delta(s, t)$ - they all share the same priority - and A* reaches $t$ without expanding a single off-path vertex. In practice, a good heuristic can reduce exploration by orders of magnitude on large graphs like road networks.
Applications
GPS navigation. Google Maps and similar apps use bidirectional Dijkstra or A* on road networks with hundreds of millions of nodes. Contraction hierarchies - a preprocessing technique that “shortcutts” the graph - allow queries to answer in milliseconds.
Internet routing. OSPF (Open Shortest Path First) uses Dijkstra on each router to compute its forwarding table. BGP (Border Gateway Protocol) uses a distributed Bellman-Ford variant.
Currency arbitrage. Model currencies as vertices; exchange rates as edge weights $-\log(\text{rate})$. A negative cycle in this graph corresponds to a sequence of conversions that yields a profit. Bellman-Ford detects it: if after $V - 1$ rounds a relaxation is still possible, an arbitrage opportunity exists. Financial systems scan for this constantly.
Summary
| Algorithm | Handles negative weights? | Complexity | Use case |
|---|---|---|---|
| Dijkstra (binary heap) | No | $O((V + E) \log V)$ | Single-source, non-negative weights |
| Dijkstra (dense) | No | $O(V^2)$ | Dense graphs |
| Dijkstra (Fibonacci heap) | No | $O(E + V \log V)$ | Single-source, theoretical best |
| Bellman-Ford | Yes (detects negative cycles) | $O(VE)$ | Single-source, negative weights |
| Floyd-Warshall | Yes (no negative cycles) | $O(V^3)$ | All-pairs |
| A* | No (same as Dijkstra) | $O((V + E) \log V)$ worst case | Single-target, with heuristic |
The shortest path problem is one of the most thoroughly studied in algorithms. Dijkstra’s 1959 half-page paper launched a literature that spans decades. What makes it remarkable is not just the algorithm - it’s the insight that greedy finalization works when costs are non-negative, that DP on path length works when they’re not, and that the two approaches are complementary faces of the same underlying idea.
Read next: