Helpful context:


Every file system on your computer is a tree. Every DOM in a web browser is a tree. The decision trees inside chess engines are trees. Phylogenetic trees in biology show evolutionary relationships. Organizational hierarchies are trees. When you encounter (3 + 4) * 2 in code, the compiler builds a tree before evaluating it.

Trees are not an abstract data structure invented by computer scientists. They are the shape that hierarchical data naturally takes. When things have parents and children, a tree is what you get.


What Is a Tree?

A tree is a connected graph with no cycles and a designated root node. Every node except the root has exactly one parent. Every node can have zero or more children. Nodes with no children are called leaves.

class TreeNode:
    def __init__(self, val: int):
        self.val = val
        self.left: "TreeNode | None" = None
        self.right: "TreeNode | None" = None

A binary tree restricts each node to at most two children, called left and right.

A few vocabulary items you’ll encounter constantly:

  • Height of a node: the length of the longest path from that node down to a leaf. The height of a leaf is 0. The height of the tree is the height of the root.
  • Depth of a node: the length of the path from the root down to that node. The root has depth 0.
  • Complete binary tree: all levels are fully filled except possibly the last, which is filled left to right.
  • Full binary tree: every node has either 0 or 2 children.
  • Perfect binary tree: all interior nodes have 2 children and all leaves are at the same depth. Has exactly $2^{h+1} - 1$ nodes at height $h$.
flowchart TD A["root\n(depth 0)"] --> B["(depth 1)"] & C["(depth 1)"] B --> D["leaf\n(depth 2)"] & E["leaf\n(depth 2)"] C --> F["leaf\n(depth 2)"]

A binary tree with $n$ nodes has height at least $\log_2 n$. This logarithmic relationship is the source of $O(\log n)$ performance for tree operations - the height is the number of decisions you need to make to traverse from root to any leaf.


Binary Search Trees: O(log n) with a Condition

A binary search tree (BST) adds one invariant: for every node, all keys in the left subtree are smaller, and all keys in the right subtree are larger. This simple rule transforms search from $O(n)$ to $O(\log n)$.

flowchart TD A["8"] --> B["3"] & C["12"] B --> D["1"] & E["6"] C --> F["10"] & G["15"]
class BST:
    def __init__(self):
        self.root: TreeNode | None = None

    def insert(self, val: int) -> None:
        if not self.root:
            self.root = TreeNode(val)
        else:
            self._insert(self.root, val)

    def _insert(self, node: TreeNode, val: int) -> None:
        if val < node.val:
            if node.left is None:
                node.left = TreeNode(val)
            else:
                self._insert(node.left, val)
        else:
            if node.right is None:
                node.right = TreeNode(val)
            else:
                self._insert(node.right, val)

    def search(self, val: int) -> bool:
        node = self.root
        while node:
            if val == node.val:
                return True
            node = node.left if val < node.val else node.right
        return False

The search is elegant: at each node, compare the target against the current value. Go left if smaller, right if larger. Each comparison eliminates the entire other subtree. This is binary search, except the structure enforces the ordering rather than requiring a pre-sorted array.

Deletion has three cases:

  • Leaf node: remove it directly.
  • One child: splice it out by linking its parent to its child.
  • Two children: find the in-order successor (the leftmost node in the right subtree). Copy that key into the current node and delete the in-order successor, which has at most one child.

The dark secret of BSTs. The $O(\log n)$ guarantee only holds when the tree is balanced. Feed a BST elements in sorted order and it degenerates into a linked list:

bst = BST()
for val in [1, 2, 3, 4, 5, 6, 7]:   # sorted insertion
    bst.insert(val)
# Result: a chain 1 → 2 → 3 → 4 → 5 → 6 → 7
# Search is now O(n), not O(log n)
flowchart TD A["1"] --> B["2"] B --> C["3"] C --> D["4"] D --> E["5"]

This is not a rare edge case - any nearly-sorted input triggers significant degradation.


Tree Traversal: Three Ways to Visit Every Node

Three recursive orderings dominate:

  • In-order (left → node → right): visits BST nodes in sorted order. If you read off the keys during an in-order traversal of a BST, you get them smallest to largest.
  • Pre-order (node → left → right): visits the node before its subtrees. Used to copy or serialize a tree.
  • Post-order (left → right → node): visits children before parents. Used to evaluate expression trees and safe deletion (free children before the parent).
def inorder(node: TreeNode | None) -> list[int]:
    if node is None:
        return []
    return inorder(node.left) + [node.val] + inorder(node.right)

def preorder(node: TreeNode | None) -> list[int]:
    if node is None:
        return []
    return [node.val] + preorder(node.left) + preorder(node.right)

def postorder(node: TreeNode | None) -> list[int]:
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.val]

Compilers use postorder traversal of expression trees to generate code - evaluate operands first, then apply the operator. Build systems use a form of topological sort (essentially postorder on a dependency graph) to determine build order.

Recursion depth in practice. Recursive traversal hits Python’s recursion limit (default 1000) on deep trees. For production code on potentially deep trees, use iterative traversal with an explicit stack:

def inorder_iterative(root: TreeNode | None) -> list[int]:
    result, stack, current = [], [], root
    while current or stack:
        while current:
            stack.append(current)
            current = current.left
        current = stack.pop()
        result.append(current.val)
        current = current.right
    return result

AVL and Red-Black Trees: Keeping the Promise

The fix for BST degeneration is self-balancing: after each insert or delete, the tree restructures itself through rotations to maintain $O(\log n)$ height.

AVL trees maintain the invariant that for every node, the heights of its left and right subtrees differ by at most 1. After any insertion or deletion, walk back up the tree checking this invariant. Each node stores a balance factor - the height of its right subtree minus the height of its left subtree. A value outside ${-1, 0, +1}$ triggers a rotation.

There are four rotation cases (left-left, right-right, left-right, right-left). Here is a right rotation fixing a left-left imbalance:

flowchart LR subgraph after["After rotation"] B2["B\nbf = 0"] --> A2["A\nbf = 0"] & C2["C\nbf = 0"] end subgraph before["Before (violation at C)"] C1["C\nbf = -2"] --> B1["B\nbf = -1"] B1 --> A1["A\nbf = 0"] end

A single or double rotation is always enough to fix a violation, and at most $O(\log n)$ nodes need checking after an insertion.

Seeing rotations interactively makes them far clearer. Try inserting a few values into this AVL tree visualizer and watch the rotations fire.

Red-black trees use a weaker invariant: each node is colored red or black, no two consecutive red nodes on any path, and every path from root to null has the same number of black nodes. This limits the tree to height at most $2\log_2(n+1)$.

flowchart TD A["7"]:::black --> B["3"]:::red & C["18"]:::red B --> D["1"]:::black & E["5"]:::black C --> F["10"]:::black & G["22"]:::black classDef black fill:#222,color:#fff,stroke:#444 classDef red fill:#c0392b,color:#fff,stroke:#c0392b

Red-black trees require fewer rotations per insertion than AVL trees, making them faster in write-heavy workloads. They are used in C++’s std::map, Java’s TreeMap, and the Linux kernel’s process scheduler.

You do not need to memorize the rotation cases. What matters is the guarantee: balanced BSTs give $O(\log n)$ for search, insert, and delete, always, regardless of insertion order.

In Python: the standard library has no balanced BST. sortedcontainers.SortedList is the practical answer - it gives $O(\log n)$ operations and preserves sorted order. For most applications, you reach for that rather than implementing rotations from scratch.


Heaps: Trees Stored in Arrays

A binary heap is a complete binary tree satisfying the heap property: every parent is greater than or equal to both its children (max-heap), or less than or equal (min-heap). The root is always the maximum (or minimum) element.

The array representation. Because heaps are complete binary trees, they can be stored in a plain array with no pointers. For a 1-indexed array:

  • Node $i$ has its parent at $\lfloor i/2 \rfloor$
  • Node $i$ has its left child at $2i$
  • Node $i$ has its right child at $2i + 1$
flowchart TD A["[1] 16"] --> B["[2] 14"] & C["[3] 10"] B --> D["[4] 8"] & E["[5] 7"] C --> F["[6] 9"] & G["[7] 3"]

No pointer traversal - just arithmetic. This is cache-friendly, memory-efficient, and requires no dynamic allocation. An array heap can be 3-5x faster than a pointer-based equivalent because all nodes are contiguous in memory.

Python’s heapq module implements a min-heap on a plain list:

import heapq

heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)

print(heapq.heappop(heap))   # 1 - always the minimum
print(heapq.heappop(heap))   # 3

# Priority queue example
class TaskScheduler:
    def __init__(self):
        self._queue: list[tuple[int, str]] = []

    def add_task(self, priority: int, name: str) -> None:
        heapq.heappush(self._queue, (priority, name))

    def run_next(self) -> str | None:
        if not self._queue:
            return None
        _, name = heapq.heappop(self._queue)
        return name

Heap Operations

Insert: place the new element at the end of the array, then sift it upward until the heap property is restored. Time: $O(\log n)$.

Extract-max/min: the extremum is always at the root (index 1). Remove it, move the last element to the root, then sift it down. Time: $O(\log n)$.

Build-heap: given an arbitrary array, convert it to a valid heap. The naive approach is to insert each element: $O(n \log n)$. The smarter way: start at the last internal node (index $\lfloor n/2 \rfloor$) and sift down toward the root. This is $O(n)$.

Why $O(n)$? A node at height $h$ needs at most $h$ swaps when sifted down. There are roughly $n/2^{h+1}$ nodes at height $h$. Total cost:

$$\sum_{h=0}^{\lfloor \log n \rfloor} \frac{n}{2^{h+1}} \cdot h = \frac{n}{2} \sum_{h=0}^{\infty} \frac{h}{2^h} = \frac{n}{2} \cdot 2 = O(n)$$

using the identity $\sum_{h=0}^{\infty} h x^h = x/(1-x)^2$ at $x = 1/2$. Intuitively: most nodes are near the leaves where sifting is cheap. Only a few nodes near the root need to travel far, and those are scarce.

data = [5, 3, 8, 1, 9, 2, 7]
heapq.heapify(data)   # O(n) in-place
print(data[0])        # 1 - the minimum

Heaps in algorithms. Dijkstra’s shortest-path algorithm uses a min-heap as its priority queue - always extract the unvisited node with the smallest tentative distance. Without a heap, Dijkstra’s is $O(V^2)$; with a binary heap it is $O((V + E) \log V)$.


Heapsort

Build-heap converts the array to a max-heap in $O(n)$. Then repeatedly swap the root (current maximum) with the last element of the heap, shrink the heap by 1, and sift the new root down. After $n-1$ such extractions, the array is sorted.

Total: $O(n)$ for build-heap plus $n \cdot O(\log n)$ for extractions = $O(n \log n)$, in-place, no auxiliary memory. The sort is not stable, and it is slower than quicksort in practice due to poor cache behavior - the heap’s jumpy access pattern does not benefit from cache prefetching.


Expression Trees

When a compiler encounters (3 + 4) * 2, it builds an expression tree:

    *
   / \
  +   2
 / \
3   4

Postorder traversal evaluates this correctly: visit 3, visit 4, apply +, visit 2, apply *. The tree structure naturally encodes operator precedence - the shape of the tree dictates which operations apply first, eliminating the need for explicit parentheses once the tree is built. This intermediate representation is easy to evaluate, transform, and optimize, which is why compilers use it universally.


B-Trees

Binary search trees work well when data fits in RAM. But a billion records on disk changes the problem entirely. Disk reads are roughly 100,000 times slower than RAM reads. A balanced BST over a billion records has height around 30 - meaning 30 disk reads per lookup. Unacceptable.

The fix: make each node fat. A B-tree node holds up to $2t - 1$ keys and up to $2t$ children, where $t$ is the minimum degree. Size the node to fit exactly one disk block (typically 4KB or 16KB). With hundreds of keys per node, a B-tree over a billion records has height 3 or 4 - meaning 3-4 disk reads per lookup.

Every internal node has between $t-1$ and $2t-1$ keys, sorted. The $i$-th child subtree contains all keys between the $(i-1)$-th and $i$-th keys of the node - the BST property generalized to many children.

flowchart TD A["10 | 20 | 30"] --> B["3 | 7"] & C["13 | 17"] & D["23 | 27"] & E["35 | 40"]

Insertion always happens at a leaf. If the leaf is full ($2t-1$ keys), split it: push the median key up to the parent, divide the node into two. Splits can cascade up. When the root splits, the tree grows taller - the only way a B-tree gains height.

B+ trees are the variant used by almost every database. Internal nodes store only keys; all actual data lives in the leaf nodes, which are also linked in a sorted chain. This makes range queries fast: find the start key, then scan the leaf chain without going back up the tree.

class BTreeNode:
    def __init__(self, t, leaf=False):
        self.t = t
        self.leaf = leaf
        self.keys = []
        self.children = []

class BTree:
    def __init__(self, t):
        self.t = t
        self.root = BTreeNode(t, leaf=True)

    def search(self, k, node=None):
        if node is None:
            node = self.root
        i = 0
        while i < len(node.keys) and k > node.keys[i]:
            i += 1
        if i < len(node.keys) and k == node.keys[i]:
            return True
        if node.leaf:
            return False
        return self.search(k, node.children[i])

    def insert(self, k):
        root = self.root
        if len(root.keys) == 2 * self.t - 1:
            new_root = BTreeNode(self.t)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_non_full(self.root, k)

    def _insert_non_full(self, node, k):
        i = len(node.keys) - 1
        if node.leaf:
            node.keys.append(None)
            while i >= 0 and k < node.keys[i]:
                node.keys[i + 1] = node.keys[i]
                i -= 1
            node.keys[i + 1] = k
        else:
            while i >= 0 and k < node.keys[i]:
                i -= 1
            i += 1
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if k > node.keys[i]:
                    i += 1
            self._insert_non_full(node.children[i], k)

    def _split_child(self, parent, i):
        t = self.t
        full = parent.children[i]
        new_node = BTreeNode(t, leaf=full.leaf)
        mid = t - 1
        parent.keys.insert(i, full.keys[mid])
        parent.children.insert(i + 1, new_node)
        new_node.keys = full.keys[mid + 1:]
        full.keys = full.keys[:mid]
        if not full.leaf:
            new_node.children = full.children[t:]
            full.children = full.children[:t]

PostgreSQL, MySQL, and SQLite all use B+ trees for their primary indexes.


Tries

A trie (from “retrieval”) is a tree built for string lookups. Instead of storing whole strings at nodes, you store one character per edge. Every path from the root to a marked node spells out a stored string.

This gives $O(L)$ lookup for a string of length $L$, independent of how many strings are stored. A hash table also gives $O(L)$ average lookup (hashing takes $O(L)$), but a trie additionally supports: find all strings with a given prefix, find the longest prefix match, enumerate strings in lexicographic order.

flowchart TD R["root"] --> C["c"] & D["d"] C --> CA["a"] CA --> CAT["t ✓"] & CAR["r ✓"] & CAN["n ✓"] D --> DO["o"] DO --> DOG["g ✓"]

“cat”, “car”, and “can” all share the prefix “ca” - they follow the same path for the first two characters and only diverge at the third. This shared prefix storage makes tries memory-efficient for large collections with common prefixes (all words in a dictionary, all URLs on a site).

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def search(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end

    def all_with_prefix(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        self._collect(node, prefix, results)
        return results

    def _collect(self, node, current, results):
        if node.is_end:
            results.append(current)
        for ch, child in node.children.items():
            self._collect(child, current + ch, results)

all_with_prefix powers autocomplete: given what the user has typed, return every word in the dictionary that starts with it. This is $O(L + k)$ where $L$ is the prefix length and $k$ is the number of results.


When Not to Use These Structures

BSTs in Python: for most applications, a balanced BST is overkill. If you need frequent lookups, a dict is faster. If you need a sorted sequence you rarely modify, sort once and use bisect for $O(\log n)$ binary search on a plain list. If you need sorted order with frequent insertion, sortedcontainers.SortedList is the right call.

Heaps vs sorted(): if you need the top $k$ elements from a list, heapq.nlargest(k, items) is simpler. If you need frequent “give me the minimum while also inserting” (the online/streaming case), a heap is exactly right. If you need arbitrary removal or full sorted iteration, a different structure is cleaner - heaps only efficiently support extracting the extremum.


Summary

Structure Search Insert Delete Notes
BST (unbalanced) $O(h)$ $O(h)$ $O(h)$ Degenerates on sorted input
AVL tree $O(\log n)$ $O(\log n)$ $O(\log n)$ Strict balance; more rotations
Red-black tree $O(\log n)$ $O(\log n)$ $O(\log n)$ Fewer rotations; used in practice
Max-heap $O(1)$ max, $O(n)$ arbitrary $O(\log n)$ $O(\log n)$ extract-max Build from array in $O(n)$
B-tree (order $t$) $O(\log_t n)$ disk reads $O(\log_t n)$ $O(\log_t n)$ Minimizes disk seeks; used in databases
Trie $O(L)$ $O(L)$ $O(L)$ $L$ = key length; prefix queries in $O(L+k)$

Read next: