Prerequisite: Data Structures: Arrays to Queues


Once you have a handle on linear data structures, the next step is trees - data structures that model hierarchy. File systems, HTML documents, organisation charts, and decision trees all share this shape: one root, branching into children, branching further into their children. Trees unlock a new class of efficient algorithms.

Trees Are Everywhere

  • File systems: a directory contains files and subdirectories, each of which may contain more.
  • HTML DOM: <html> is the root; <body> and <head> are children; paragraphs and divs branch further.
  • Expression trees: (3 + 4) * 2 becomes a tree with * at the root, 2 as one child, and + (with 3 and 4) as the other.
  • Machine learning decision trees: each internal node is a feature test; leaves are predictions.

Terminology: the topmost node is the root; nodes with no children are leaves; the depth of a node is how many edges separate it from the root.

Binary Trees

A binary tree restricts each node to at most two children, called left and right. Here is a minimal Python node:

class TreeNode:
    def __init__(self, val: int):
        self.val = val
        self.left: "TreeNode | None" = None
        self.right: "TreeNode | None" = None

Binary Search Trees

A binary search tree (BST) adds a constraint: for every node, all values in its left subtree are smaller, and all values in its right subtree are larger. This makes search, insert, and delete $O(\log n)$ when the tree is balanced.

class BST:
    def __init__(self):
        self.root: TreeNode | None = None

    def insert(self, val: int) -> None:
        if not self.root:
            self.root = TreeNode(val)
        else:
            self._insert(self.root, val)

    def _insert(self, node: TreeNode, val: int) -> None:
        if val < node.val:
            if node.left is None:
                node.left = TreeNode(val)
            else:
                self._insert(node.left, val)
        else:
            if node.right is None:
                node.right = TreeNode(val)
            else:
                self._insert(node.right, val)

    def search(self, val: int) -> bool:
        node = self.root
        while node:
            if val == node.val:
                return True
            node = node.left if val < node.val else node.right
        return False

The $O(\log n)$ guarantee only holds when the tree stays balanced. An unbalanced BST (imagine inserting already-sorted values) degrades to $O(n)$ - essentially a linked list. Self-balancing trees like AVL and Red-Black trees fix this automatically; Python’s standard library does not include them, but sortedcontainers.SortedList is a popular third-party option.

Tree Traversal

There are three classic recursive traversal orders. Each visits every node exactly once in $O(n)$:

def inorder(node: TreeNode | None) -> list[int]:
    if node is None:
        return []
    return inorder(node.left) + [node.val] + inorder(node.right)

def preorder(node: TreeNode | None) -> list[int]:
    if node is None:
        return []
    return [node.val] + preorder(node.left) + preorder(node.right)

def postorder(node: TreeNode | None) -> list[int]:
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.val]
  • Inorder (left → node → right): produces sorted output for a BST.
  • Preorder (node → left → right): useful for copying a tree or serialising it.
  • Postorder (left → right → node): useful for deleting a tree (children before parent).

Heaps

A heap is a special binary tree with two properties:

  1. Complete: every level is fully filled except possibly the last, which fills left to right.
  2. Heap property: in a min-heap, every parent is smaller than or equal to its children. The root is always the minimum.

Because of the complete property, heaps are stored efficiently as plain arrays - no pointers needed. Python’s heapq module gives you a min-heap on top of a list:

import heapq

heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)

print(heapq.heappop(heap))   # 1 - always the minimum
print(heapq.heappop(heap))   # 3

Both push and pop are $O(\log n)$. Heaps power priority queues, heapsort, and graph algorithms like Dijkstra’s.

Tries

A trie (prefix tree) stores strings by sharing common prefixes. Each node represents one character; paths from root to leaf spell out words.

class TrieNode:
    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}
        self.is_end: bool = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def starts_with(self, prefix: str) -> bool:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True

Tries are ideal for autocomplete, spell checking, and IP routing tables.

When to Use Trees vs Other Structures

Scenario Best choice
Ordered data, fast search BST or sortedcontainers.SortedList
Always need the min/max quickly heapq (heap)
Prefix matching, autocomplete Trie
Hierarchical relationships Tree
Unordered, fast lookup by key dict (hash map)
Sequence with index access list

Examples

Task Scheduler with a Heap

Schedule tasks by priority. Lower number means higher priority.

import heapq

class TaskScheduler:
    def __init__(self):
        self._queue: list[tuple[int, str]] = []

    def add_task(self, priority: int, name: str) -> None:
        heapq.heappush(self._queue, (priority, name))

    def run_next(self) -> str | None:
        if not self._queue:
            return None
        _, name = heapq.heappop(self._queue)
        return name

scheduler = TaskScheduler()
scheduler.add_task(3, "write report")
scheduler.add_task(1, "fix production bug")
scheduler.add_task(2, "review PR")

print(scheduler.run_next())   # "fix production bug"
print(scheduler.run_next())   # "review PR"

Word Autocomplete with Trie

def autocomplete(trie: Trie, prefix: str, node=None, results=None) -> list[str]:
    if results is None:
        results = []
        node = trie.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]

    if node.is_end:
        results.append(prefix)

    for ch, child in node.children.items():
        autocomplete(trie, prefix + ch, child, results)

    return results

t = Trie()
for word in ["apple", "app", "application", "apply", "banana"]:
    t.insert(word)

print(autocomplete(t, "app"))   # ['app', 'apple', 'application', 'apply']

Trees reward you whenever your data has natural hierarchy or ordering. BSTs give you fast ordered search, heaps give you instant access to the minimum, and tries make prefix queries cheap. Knowing which variant to reach for - and when a hash map or list is actually simpler - is the practical skill.


Read Next: Graph Algorithms: Traversal