Greedy Algorithms - The Art of Never Looking Back // Megha Bose

Helpful context:

Divide & Conquer - Small Savings That Compound

In 1956, a computer scientist named Joseph Kruskal was thinking about how to wire up a network of cities with the minimum total length of cable. He described his algorithm in one sentence: always add the cheapest edge that doesn’t create a cycle.

That’s it. No lookahead. No backtracking. No global view of the emerging solution. Just a local rule, applied repeatedly.

And it gives the globally optimal answer - provably.

This is the essence of a greedy algorithm. It’s not about being reckless. It’s about finding the class of problems where local optimality implies global optimality - and then proving it.

The Greedy Template

A greedy algorithm builds a solution step by step. At each step, it makes the choice that looks best right now and never revisits that choice.

solution = empty
while problem not solved:
    make the locally best available choice
    add it to the solution

This is incredibly simple. The hard part isn’t the algorithm - it’s knowing when the algorithm is correct. The naive nearest-neighbor heuristic for the traveling salesman problem looks like a greedy algorithm and is usually wrong. Kruskal’s algorithm for minimum spanning trees looks like a greedy algorithm and is always right.

The difference is structural, and we’ll spend most of this post understanding it.

Why Greedy Can Fail

Every choice you make in a greedy algorithm is not just a choice for right now - it reshapes the entire landscape of future choices. When you commit to the locally best option, you close doors. Some of those closed doors might have led somewhere better overall.

The intuition: imagine you’re hiking and want to reach the highest point. Greedy says always step uphill. But if you’re in a hilly landscape with many peaks, always stepping uphill traps you at whichever peak you started climbing - not necessarily the tallest one. To reach the global maximum you might need to step down first, which greedy will never do.

This is the core failure mode. The current best choice might require you to make a compromise later that costs more than the gain from being greedy now. You cannot be myopic - just because your current step is locally optimal doesn’t mean the overall path is globally optimal. In fact, by choosing optimally at every step, you can end up worse than if you had accepted a suboptimal step somewhere in the middle.

The 0/1 knapsack is the clearest example: the highest value-per-weight item looks like the obvious first choice. But taking it might fill the knapsack in a way that leaves you stuck with only small items, while a combination of slightly lower-ratio items would have been worth far more total. The greedy choice constrained the future too severely.

When Greedy Works: Two Properties

Greedy works in exactly two situations.

The first: making the greedy choice does not meaningfully constrain your future choices. Whatever you pick now, the remaining options are just as good as if you’d picked something else. You’re not closing any important doors. Activity selection is like this: picking the earliest-finishing activity leaves the maximum future time open - you haven’t sacrificed any future opportunity by being greedy.

The second: the greedy choice does constrain future choices, but the optimal solution to whatever remains still combines with your choice to give the global optimum. You’ve narrowed the problem, but you haven’t broken it. This is a structural property of the problem, not something you can assume - you have to prove it.

Formally, these are called the greedy choice property (the locally optimal choice is always compatible with some globally optimal solution - you never need to undo it) and optimal substructure (once you make the greedy choice, the remaining subproblem’s optimal solution combines with it to give global optimality).

Together they mean: pick the locally best thing, solve the rest optimally, and you’ve solved the whole problem optimally. The hard part is proving your problem actually has both properties - and when it doesn’t, you need dynamic programming instead.

How do you prove the greedy choice property? Usually with an exchange argument: assume there’s an optimal solution that doesn’t include the greedy choice. Show that you can swap the greedy choice in (and swap something else out) without making the solution worse. This contradiction shows the greedy choice must be in some optimal solution.

Activity Selection: A Proof of Correctness

You have a set of activities, each with a start time and finish time. You can do at most one activity at a time. Which activities should you schedule to maximize the number you complete?

Greedy strategy: always pick the activity with the earliest finish time that doesn’t conflict with what you’ve already scheduled.

Complexity: O(n log n) - sort activities by finish time once, then do a single linear scan. The sort dominates.

gantt title Activity Selection - Six Activities, Three Compatible dateFormat X axisFormat %s section Selected A : active, 1, 3 C : active, 4, 7 F : active, 8, 11 section Rejected B (conflicts with A) : crit, 2, 5 D (conflicts with A and C) : crit, 3, 8 E (conflicts with C) : crit, 6, 10

Greedy picks A (earliest finish at 3), then C (next earliest at 7 that doesn’t conflict), then F (next at 11). Three activities - the maximum possible.

Why does this work? Here is the exchange argument.

Suppose the greedy algorithm picks activity $g$ first (earliest finish time). Consider any other optimal solution $OPT$ that doesn’t include $g$. It must start with some other activity $a$ with a later finish time (since $g$ has the earliest).

Now construct a new solution: take $OPT$, remove $a$, and add $g$. Since $g$ finishes no later than $a$, and everything else in $OPT$ starts after $a$ finishes, they all still start after $g$ finishes. The new solution is still valid - no conflicts - and has the same number of activities. So there’s an optimal solution that includes $g$.

By the same argument applied to the remaining problem after $g$ is scheduled, the greedy choice is always safe.

This proof pattern - assume the greedy choice isn’t in OPT, swap it in, verify the solution doesn’t get worse - is the standard exchange argument. It’s the tool you reach for whenever you need to prove a greedy algorithm is correct.

Fractional Knapsack: Greedy Works

You’re packing a knapsack. You have items with weights and values. The knapsack has a weight limit. You want to maximize total value.

Fractional knapsack: you can take a fraction of any item (pour the liquid, cut the gold). The greedy strategy: sort items by value-to-weight ratio, take the most valuable per kilogram first, until the knapsack is full or you’ve taken all of something.

Complexity: O(n log n) - sort by ratio once, then fill greedily in a single pass.

graph LR subgraph items["Items sorted by value/weight ratio"] direction LR I1["Gold dust\n10 $/kg\n4 kg"] I2["Silver\n6 $/kg\n3 kg"] I3["Bronze\n4 $/kg\n5 kg"] end subgraph knapsack["Knapsack (capacity 6 kg)"] direction LR T1["Gold: 4 kg\n(all of it)"] T2["Silver: 2 kg\n(2/3 of it)"] end I1 -->|"take all"| T1 I2 -->|"take fraction"| T2 I3 -->|"no room"| X["skipped"]

This is correct. Here’s why: if you could replace any portion of your current selection with something having a higher value-per-weight ratio, you should. Taking items in decreasing value-per-weight order ensures you can’t improve by swapping.

0/1 knapsack: you must take an item completely or not at all. Greedy fails here. Consider: weight limit 10, items (weight 6, value 8) and (weight 5, value 7) and (weight 5, value 6). Greedy by value/weight picks the weight-6 item first (ratio 8/6 ≈ 1.33), then can’t fit both remaining items. Total value: 8. But taking both weight-5 items gives value 13. Greedy was wrong.

graph TD subgraph greedy["Greedy choice (wrong)"] direction LR G1["Item A\nw=6, v=8\nratio≈1.33\n✓ take it"] G2["Item B\nw=5, v=7\nratio=1.40\n✗ no room"] G3["Item C\nw=5, v=6\nratio=1.20\n✗ no room"] G1 --> GR["Total value: 8\n(4 kg wasted)"] end subgraph optimal["Optimal (DP finds it)"] direction LR O1["Item A\nw=6, v=8\n✗ skip"] O2["Item B\nw=5, v=7\n✓ take it"] O3["Item C\nw=5, v=6\n✓ take it"] O2 --> OR["Total value: 13\n(weight limit met)"] O3 --> OR end

The 0/1 constraint breaks the greedy choice property. Taking one item can rule out a better combination. You need dynamic programming, which we’ll cover in the next post. The DP solution runs in O(nW) (pseudopolynomial in the weight capacity W).

Huffman Coding: Greedy Compression

You’re compressing a file. The file contains characters with known frequencies. You want to assign binary codes to each character so that the total encoded length is minimized.

The key insight: more frequent characters should get shorter codes. A character that appears 40% of the time should get a 1-bit code. One that appears 1% of the time can tolerate a longer code.

Huffman’s algorithm builds an optimal variable-length code greedily.

Create a leaf node for each character with its frequency.
Repeatedly: take the two nodes with the lowest frequencies, create a new internal node with their combined frequency as its parent, with those two as children.
Repeat until one node remains - this is the root of the Huffman tree.
Assign 0 to left branches and 1 to right branches. A character’s code is its path from root to leaf.

Complexity: O(n log n) - each of the n-1 merge steps extracts two minimums and inserts one node into a min-heap, each costing O(log n).

The greedy choice: always merge the two lowest-frequency nodes first. Here is a worked example with four characters:

graph TB subgraph step1["Step 1: merge C(1) and B(2)"] S1R["3"] --> S1C["C: 1"] S1R --> S1B["B: 2"] end subgraph step2["Step 2: merge node(3) and D(3)"] S2R["6"] --> S2D["D: 3"] S2R --> S2N["3"] S2N --> S2C["C: 1"] S2N --> S2B["B: 2"] end subgraph step3["Final tree: merge A(5) and node(6)"] ROOT["root: 11"] --> A["A: 5\ncode: 0"] ROOT --> N6["6"] N6 --> D["D: 3\ncode: 10"] N6 --> N3["3"] N3 --> C["C: 1\ncode: 110"] N3 --> B["B: 2\ncode: 111"] end

A gets the shortest code (most frequent). C and B get the longest codes (least frequent). The two characters with the lowest frequencies must be siblings in any optimal tree - they both get the longest codes, and merging them first is always safe.

Huffman coding is used in the DEFLATE compression format (which is inside ZIP, PNG, and HTTP gzip), JPEG, and MP3.

Kruskal’s MST: Greedy on Edges

A minimum spanning tree (MST) of a graph is a subset of edges that connects all vertices with minimum total weight, using no cycles.

Kruskal’s algorithm: sort all edges by weight. Add the next cheapest edge if and only if it doesn’t create a cycle in the current set of edges. Stop when the spanning tree is complete.

Complexity: O(E log E) - dominated by sorting edges. The cycle-detection via Union-Find adds nearly O(E) after that.

graph LR A ---|"1 ✓"| B C ---|"2 ✓"| D A ---|"3 ✓"| C B -.-|"4 ✗ cycle"| D B -.-|"5 ✗ cycle"| C D ---|"6 ✓"| E

Edges are processed in weight order: (A-B, 1) added, (C-D, 2) added, (A-C, 3) added - connecting the two components, (B-D, 4) skipped - would create a cycle through A-B-D-C-A, (B-C, 5) skipped - same reason, (D-E, 6) added. The MST spans all 5 vertices using 4 edges with total weight 12.

The correctness relies on the cut property: for any cut of the graph (a partition of vertices into two sets), the minimum-weight edge crossing the cut is in some MST. This is the greedy choice property: the cheapest edge that doesn’t create a cycle is always “crossing some cut,” and therefore safe to add.

The cycle detection uses a Union-Find data structure. We maintain connected components - initially each vertex is its own component. Adding an edge $u$-$v$ is safe if $u$ and $v$ are in different components. If they’re in the same component, the edge would create a cycle.

Dijkstra as a Greedy Algorithm

Dijkstra’s shortest-path algorithm is secretly greedy. It maintains a set of “finalized” vertices whose shortest path from the source is known. At each step, it greedily picks the unfinalized vertex with the currently smallest estimated distance, finalizes it, and updates its neighbors.

Complexity: O((V+E) log V) with a binary heap - each vertex is finalized once (V extractions at O(log V) each) and each edge triggers at most one heap update (E updates at O(log V) each).

The greedy choice property here is: when we finalize a vertex $v$ with distance $d$, no future path can produce a shorter route to $v$ (assuming all edge weights are non-negative). This is why Dijkstra fails with negative edge weights - the greedy choice property breaks down.

We’ll cover Dijkstra in full in the shortest paths post.

The Exchange Argument: Your Main Proof Tool

Here’s the general template for proving a greedy algorithm correct.

Suppose the greedy algorithm’s first choice is $g$, and consider any optimal solution $OPT$ that doesn’t include $g$.
$OPT$ must make some different first choice $a$ at the same step.
Construct a modified solution: take $OPT$, swap out $a$, swap in $g$.
Show the modified solution is still feasible (no constraints violated).
Show the modified solution is at least as good as $OPT$ (value doesn’t decrease).
Conclude: some optimal solution includes $g$. Apply the same argument to the remaining subproblem.

The exchange argument is sometimes called the “greedy stays ahead” proof. The key step is always step 5: showing that the greedy choice doesn’t hurt the solution quality. This is what fails for the 0/1 knapsack and for the TSP nearest-neighbor heuristic.

Discomfort check. “How do I know when greedy works?” The honest answer is: you need to prove it. Greedy algorithms that appear correct often aren’t - the TSP nearest-neighbor heuristic is intuitive and wrong. The discipline is to first design the greedy strategy, then attempt the exchange argument. If the exchange argument goes through, you have a proof. If it breaks down somewhere - if swapping in the greedy choice sometimes makes things worse - then you’ve found a counterexample, and greedy doesn’t work. When greedy fails, you typically need dynamic programming or more sophisticated techniques.

Summary

Problem	Greedy strategy	Correct?	Complexity	Why
Activity selection	Earliest finish time	Yes	O(n log n)	Exchange argument: earliest-finish always safe
Fractional knapsack	Highest value/weight first	Yes	O(n log n)	Fractions allow perfect adjustment
0/1 knapsack	Highest value/weight first	No	O(nW) via DP	Integer constraint breaks exchange
Huffman coding	Merge lowest-frequency pair	Yes	O(n log n)	Optimal-substructure + exchange argument
Kruskal MST	Cheapest edge without cycle	Yes	O(E log E)	Cut property guarantees correctness
Dijkstra shortest path	Minimum distance unfinalized	Yes (non-negative weights only)	O((V+E) log V)	Greedy choice property holds without negatives
Traveling salesman	Nearest unvisited city	No	-	Early choices lock in suboptimal later ones

Read next:

Dynamic Programming - Never Solve the Same Problem Twice