Trees, Heaps, and Hierarchical Data
Prerequisite: Data Structures: Arrays to Queues
Once you have a handle on linear data structures, the next step is trees - data structures that model hierarchy. File systems, HTML documents, organisation charts, and decision trees all share this shape: one root, branching into children, branching further into their children. Trees unlock a new class of efficient algorithms.
Trees Are Everywhere
- File systems: a directory contains files and subdirectories, each of which may contain more.
- HTML DOM:
<html>is the root;<body>and<head>are children; paragraphs and divs branch further. - Expression trees:
(3 + 4) * 2becomes a tree with*at the root,2as one child, and+(with3and4) as the other. - Machine learning decision trees: each internal node is a feature test; leaves are predictions.
Terminology: the topmost node is the root; nodes with no children are leaves; the depth of a node is how many edges separate it from the root.
Binary Trees
A binary tree restricts each node to at most two children, called left and right. Here is a minimal Python node:
class TreeNode:
def __init__(self, val: int):
self.val = val
self.left: "TreeNode | None" = None
self.right: "TreeNode | None" = None
Binary Search Trees
A binary search tree (BST) adds a constraint: for every node, all values in its left subtree are smaller, and all values in its right subtree are larger. This makes search, insert, and delete $O(\log n)$ when the tree is balanced.
class BST:
def __init__(self):
self.root: TreeNode | None = None
def insert(self, val: int) -> None:
if not self.root:
self.root = TreeNode(val)
else:
self._insert(self.root, val)
def _insert(self, node: TreeNode, val: int) -> None:
if val < node.val:
if node.left is None:
node.left = TreeNode(val)
else:
self._insert(node.left, val)
else:
if node.right is None:
node.right = TreeNode(val)
else:
self._insert(node.right, val)
def search(self, val: int) -> bool:
node = self.root
while node:
if val == node.val:
return True
node = node.left if val < node.val else node.right
return False
The $O(\log n)$ guarantee only holds when the tree stays balanced. An unbalanced BST (imagine inserting already-sorted values) degrades to $O(n)$ - essentially a linked list. Self-balancing trees like AVL and Red-Black trees fix this automatically; Python’s standard library does not include them, but sortedcontainers.SortedList is a popular third-party option.
Tree Traversal
There are three classic recursive traversal orders. Each visits every node exactly once in $O(n)$:
def inorder(node: TreeNode | None) -> list[int]:
if node is None:
return []
return inorder(node.left) + [node.val] + inorder(node.right)
def preorder(node: TreeNode | None) -> list[int]:
if node is None:
return []
return [node.val] + preorder(node.left) + preorder(node.right)
def postorder(node: TreeNode | None) -> list[int]:
if node is None:
return []
return postorder(node.left) + postorder(node.right) + [node.val]
- Inorder (left → node → right): produces sorted output for a BST.
- Preorder (node → left → right): useful for copying a tree or serialising it.
- Postorder (left → right → node): useful for deleting a tree (children before parent).
Heaps
A heap is a special binary tree with two properties:
- Complete: every level is fully filled except possibly the last, which fills left to right.
- Heap property: in a min-heap, every parent is smaller than or equal to its children. The root is always the minimum.
Because of the complete property, heaps are stored efficiently as plain arrays - no pointers needed. Python’s heapq module gives you a min-heap on top of a list:
import heapq
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)
print(heapq.heappop(heap)) # 1 - always the minimum
print(heapq.heappop(heap)) # 3
Both push and pop are $O(\log n)$. Heaps power priority queues, heapsort, and graph algorithms like Dijkstra’s.
Tries
A trie (prefix tree) stores strings by sharing common prefixes. Each node represents one character; paths from root to leaf spell out words.
class TrieNode:
def __init__(self):
self.children: dict[str, "TrieNode"] = {}
self.is_end: bool = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_end = True
def starts_with(self, prefix: str) -> bool:
node = self.root
for ch in prefix:
if ch not in node.children:
return False
node = node.children[ch]
return True
Tries are ideal for autocomplete, spell checking, and IP routing tables.
When to Use Trees vs Other Structures
| Scenario | Best choice |
|---|---|
| Ordered data, fast search | BST or sortedcontainers.SortedList |
| Always need the min/max quickly | heapq (heap) |
| Prefix matching, autocomplete | Trie |
| Hierarchical relationships | Tree |
| Unordered, fast lookup by key | dict (hash map) |
| Sequence with index access | list |
Examples
Task Scheduler with a Heap
Schedule tasks by priority. Lower number means higher priority.
import heapq
class TaskScheduler:
def __init__(self):
self._queue: list[tuple[int, str]] = []
def add_task(self, priority: int, name: str) -> None:
heapq.heappush(self._queue, (priority, name))
def run_next(self) -> str | None:
if not self._queue:
return None
_, name = heapq.heappop(self._queue)
return name
scheduler = TaskScheduler()
scheduler.add_task(3, "write report")
scheduler.add_task(1, "fix production bug")
scheduler.add_task(2, "review PR")
print(scheduler.run_next()) # "fix production bug"
print(scheduler.run_next()) # "review PR"
Word Autocomplete with Trie
def autocomplete(trie: Trie, prefix: str, node=None, results=None) -> list[str]:
if results is None:
results = []
node = trie.root
for ch in prefix:
if ch not in node.children:
return []
node = node.children[ch]
if node.is_end:
results.append(prefix)
for ch, child in node.children.items():
autocomplete(trie, prefix + ch, child, results)
return results
t = Trie()
for word in ["apple", "app", "application", "apply", "banana"]:
t.insert(word)
print(autocomplete(t, "app")) # ['app', 'apple', 'application', 'apply']
Trees reward you whenever your data has natural hierarchy or ordering. BSTs give you fast ordered search, heaps give you instant access to the minimum, and tries make prefix queries cheap. Knowing which variant to reach for - and when a hash map or list is actually simpler - is the practical skill.
Read Next: Graph Algorithms: Traversal