Data Structures - Storage Is Strategy // Megha Bose

Helpful context:

Big-O - Why the Right Algorithm Beats Faster Hardware

In 1969, the Apollo 11 guidance computer had 4 kilobytes of RAM. Every decision about where to store data was consequential - engineers argued over individual bytes. A wrong choice could mean a program too slow to navigate a lunar approach.

Today your phone has roughly 6 billion times more memory. And yet the question hasn’t gone away. Choose the wrong data structure and an algorithm that should run in seconds will take minutes. A program that should handle millions of users will grind to a halt at ten thousand. The hardware got faster. The tradeoffs stayed the same.

An algorithm is only a sequence of operations - comparisons, insertions, lookups, deletions. How long each of those operations takes depends entirely on how the data is stored and accessed, which is what the data structure determines. The same algorithm can be fast or slow depending purely on the structure underneath it.

Arrays: Memory in a Line

An array stores elements in a contiguous block of memory. Every element sits right next to the previous one, with no gaps.

This one fact gives you $O(1)$ random access. If you want the element at index $i$, the computer computes its address directly:

$$\text{address}(i) = \text{base} + i \times \text{element size}$$

No searching. No traversal. One arithmetic operation and you’re there.

flowchart LR A["12"] --- B["5"] --- C["8"] --- D["3"] --- E["9"]

Contiguous memory also means excellent cache behavior. Modern CPUs don’t load one byte at a time from memory - they load a whole “cache line” (typically 64 bytes). When you read element $i$, you automatically get elements $i+1$, $i+2$, and a few more for free, because they’re all in the same cache line. Sequential array traversal is one of the most cache-friendly things you can do on modern hardware.

The cost is rigidity. Inserting an element in the middle requires shifting every subsequent element one position right: $O(n)$ in the worst case. Same for deletion. If you need to insert and delete frequently, you’re paying for that rigidity every time.

nums = [12, 5, 8, 3, 9]

print(nums[2])        # O(1): 8 - direct address computation
nums.insert(2, 99)    # O(n): shifts elements 2..end right
nums.pop(0)           # O(n): shifts everything left
nums.append(7)        # O(1) amortized: add to end

Under the hood: Arrays are the primitive - they map directly to a contiguous region of raw memory. There is no simpler structure beneath them. Everything else in this post is built on top of arrays, either directly or indirectly.

Dynamic Arrays: Getting Amortized O(1) Appends

A fixed-size array has to know its size upfront. Real programs often don’t. The solution is a dynamic array - Python’s list, Java’s ArrayList, C++’s std::vector.

The idea is simple: wrap a fixed array, and when it fills up, allocate a new one twice as large and copy everything over.

The doubling strategy is the key. You might think: “occasionally this costs $O(n)$ to copy, so appends are $O(n)$ in the worst case.” That’s true per individual append - but it misses the picture.

Ask instead: how much total copying happens across $n$ appends? The copies happen at sizes $1, 2, 4, 8, \cdots, n/2$. The total copying is:

$$1 + 2 + 4 + \cdots + \frac{n}{2} = n - 1 < n$$

So $n$ appends causes less than $n$ total copy operations, on top of $n$ regular insertions. Total work: $O(n)$ for $n$ appends. Average work per append: $O(1)$. We call this $O(1)$ amortized - the occasional expensive operation gets spread across the many cheap ones. Think of it like a bank account: one large deposit doesn’t mean every transaction is expensive. (We’ll formalize this in Amortized Analysis - The True Cost of a Sequence .)

Under the hood: A dynamic array wraps a fixed-size array. Internally it stores three things: a pointer to that underlying array, the current number of elements, and the allocated capacity. When capacity is exceeded, a new array is allocated, everything is copied over, and the old array is freed.

Linked Lists: Flexibility at a Cost

A linked list stores each element in a node containing two things: the value, and a pointer to the next node. The nodes can be anywhere in memory - they’re connected by pointers, not by physical proximity.

This gives you something arrays can’t easily do: $O(1)$ insertion and deletion at a known position. To insert between nodes A and B, you create a new node and rewire two pointers. No shifting required.

But you pay for this flexibility in two ways.

Pointer chasing. To find element at index $i$, you start at the head and follow pointers one by one. That’s $O(n)$ for random access. There’s no shortcut - you can’t compute the address of an element from its index.

Cache misses. This is the hidden cost that Big-O doesn’t show you. In an array, adjacent elements are adjacent in memory. In a linked list, the node you want could be anywhere on the heap - the pointer takes you to a completely different part of memory. Every such jump likely causes a cache miss: the CPU has to wait for data to arrive from RAM, which can cost 100-300 clock cycles instead of 1-4. For traversal-heavy workloads, linked lists can be 10-50 times slower than arrays in practice, even when both are $O(n)$ in theory.

There is a subtle limitation with a singly linked list: even if you have a pointer directly to a node you want to delete, you still need to update the previous node’s next pointer - and to find that previous node, you have to traverse from the head. That’s $O(n)$. A doubly linked list adds a prev pointer to each node, so given a pointer to any node you can reach its predecessor instantly and delete in $O(1)$.

flowchart LR A["12"] -->|next| B["5"] -->|next| C["8"] -->|next| D["3"] -->|next| N["∅"]

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, val):       # O(1): just update head pointer
        node = Node(val)
        node.next = self.head
        self.head = node

    def find(self, val):          # O(n): must walk the chain
        cur = self.head
        while cur:
            if cur.val == val:
                return cur
            cur = cur.next
        return None

In Python, linked lists are rarely the right answer. Python nodes carry object overhead (reference counts, type pointer, GC fields) - a single integer node takes ~60 bytes, not 8. For the use case where you need $O(1)$ front insertion and deletion, reach for collections.deque instead. It is a doubly-linked list of fixed-size memory blocks, giving you the flexibility without the per-node allocation overhead.

Under the hood: Each node is a small struct with two fields - a value and a pointer (a memory address). The nodes are allocated individually anywhere on the heap. The “list” is nothing more than these structs chained together by following pointers. There is no array involved.

Cache Locality: Why Arrays Win in Practice

Despite linked lists' theoretical advantages in insertion and deletion, arrays dominate in practice for almost all workloads on modern hardware. The reason is cache locality.

When you access array[i], the CPU loads not just that element but an entire cache line - typically 64 bytes, or 8 to 16 adjacent elements. If you then access array[i+1], it is already in cache. The CPU’s hardware prefetcher also detects sequential access patterns and loads ahead. Array traversal is extraordinarily cache-friendly.

Linked list traversal follows a pointer to a node that could be anywhere in the virtual address space. Each node access is likely a cache miss. Even though both traversals are $O(n)$ in theory, on modern hardware iterating a million-element array is routinely 5-10x faster than iterating a million-element linked list. The constant factor hidden inside Big-O is not hiding a small difference - it is hiding an order-of-magnitude difference.

This is why production systems in Rust, C++, and Go push toward array-backed structures by default, and why you should reach for list or deque in Python before ever building a custom linked list.

Stacks and Queues: Disciplined Access

Some problems only need to access data in a specific order. Restricting access to enforce discipline is often a feature, not a limitation.

A stack gives you LIFO: last in, first out. You can only push to the top and pop from the top. Python lists work perfectly as stacks because .append() and .pop() are both $O(1)$:

stack = []
stack.append("first")
stack.append("second")
stack.append("third")

top = stack.pop()   # "third" - O(1)
print(stack)        # ["first", "second"]

The most important stack in your computer is the call stack. Every function call pushes a frame (local variables, return address); every return pops one. This is why infinite recursion raises RecursionError - it overflows the stack. Other places stacks appear: undo/redo, expression parsing, DFS traversal, browser history.

A queue gives you FIFO: first in, first out. Elements are added at the back and removed from the front. The critical mistake is using a plain Python list as a queue - list.pop(0) is $O(n)$ because Python shifts every element left. Use collections.deque:

from collections import deque

queue = deque()
queue.append("first")
queue.append("second")
queue.append("third")

front = queue.popleft()   # "first" - O(1)

Queues appear everywhere in real systems: event loops (Node.js and Python asyncio process events from a queue), message brokers (Kafka, RabbitMQ, SQS), print spoolers, and BFS traversal.

def bfs(graph: dict, start: str) -> list:
    visited = set()
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        for neighbor in graph.get(node, []):
            queue.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(graph, "A"))   # ['A', 'B', 'C', 'D']

The queue is what makes BFS traverse level by level rather than depth-first. The structure determines the algorithm’s behavior.

collections.deque supports $O(1)$ operations on both ends. This makes it the right tool for any problem where you need efficient access to both the front and back - not just queues. The sliding window maximum is a classic example: given an array and a window size $k$, find the maximum in each window as it slides across. Naively $O(nk)$; with a deque, $O(n)$:

def sliding_window_max(nums: list[int], k: int) -> list[int]:
    dq = deque()   # stores indices, front always holds index of current max
    result = []
    for i, num in enumerate(nums):
        while dq and dq[0] < i - k + 1:   # remove indices outside window
            dq.popleft()
        while dq and nums[dq[-1]] < num:   # remove smaller elements
            dq.pop()
        dq.append(i)
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result

The balanced parentheses problem is the canonical stack application - it appears in almost every compiler and interview:

def is_balanced(s: str) -> bool:
    stack = []
    pairs = {")": "(", "]": "[", "}": "{"}
    for char in s:
        if char in "([{":
            stack.append(char)
        elif char in ")]}":
            if not stack or stack[-1] != pairs[char]:
                return False
            stack.pop()
    return len(stack) == 0

print(is_balanced("({[]})"))   # True
print(is_balanced("({[)"))     # False

Under the hood: A stack is an array with a single integer tracking the top index. A queue is typically a circular array with two integers (front and back) that wrap around when they reach the end, avoiding the need to shift elements. collections.deque is implemented in C as a doubly-linked list of fixed-size memory blocks - it avoids per-node allocation overhead while still getting $O(1)$ on both ends.

Hash Tables: O(1) by Design

A hash table solves the problem of finding things by key in expected $O(1)$ time - for both lookup and insertion. This is one of the most used guarantees in practical programming.

The idea: instead of storing elements at sequential indices, use a hash function to compute which bucket to put each key in.

$$\text{bucket}(k) = h(k) \mod m$$

where $m$ is the number of buckets. If the hash function distributes keys uniformly, each bucket will have roughly $n/m$ elements (the load factor $\alpha = n/m$). If $\alpha$ stays small (say, below 0.7), each lookup scans only a constant number of elements - hence $O(1)$ expected.

Collisions are inevitable: two different keys might hash to the same bucket. There are two main ways to handle them.

Chaining: each bucket holds a linked list of all keys that hashed there. Lookup just scans the list. When $\alpha$ is small, the lists are short and lookup is fast.

Open addressing: all elements live in the table itself. On a collision, probe to the next slot (linearly, or using a second hash function). This is more cache-friendly than chaining because everything is in one array, but it degrades faster as the table fills up.

Discomfort check. “Hash tables are $O(1)$ but I’ve seen them take $O(n)$ in benchmarks.” Both are true. The $O(1)$ is expected, assuming a good hash function and bounded load factor. Two things can blow it up: a bad hash function that puts most keys in the same bucket (making every lookup scan a long list), or a high load factor (too many keys for the number of buckets). A well-implemented hash table keeps load factor below a threshold and rehashes when that threshold is crossed. Also worth knowing: hash table iteration order is undefined in most implementations. Do not rely on it.

When the load factor exceeds the threshold, the table is resized - a new, larger table is allocated and all elements are rehashed. This costs $O(n)$ but happens rarely enough that amortized insertion stays $O(1)$.

Under the hood: An array of buckets. In chaining, each bucket is a linked list of all keys that hash there. In open addressing, it’s a flat array and the probing strategy decides which slot to try next. Either way, it reduces to arrays and (optionally) linked lists.

Priority Queues and Heaps

A priority queue gives you two operations: insert an element with a priority, and extract the element with the highest (or lowest) priority. Both in $O(\log n)$.

Before going further - a tree is a collection of nodes where each node holds a value and pointers to its child nodes. One node is designated the root (the top). There are no cycles - you can only move downward. A binary tree is one where each node has at most two children, called left and right. Nodes with no children are called leaves.

The standard implementation is a binary heap: a complete binary tree where every parent is larger than its children (for a max-heap).

flowchart TD A["9"] --> B["7"] & C["8"] B --> D["4"] & E["6"] C --> F["5"]

Under the hood: A binary heap is stored as a flat array - no tree nodes, no pointers. The root sits at index $0$. For any node at index $i$, its children are at $2i+1$ and $2i+2$, and its parent is at $\lfloor(i-1)/2\rfloor$. The tree structure is entirely implicit in the index arithmetic.

We’ll cover heaps in full in the Trees & Heaps - When Flat Structures Hit Their Limit post. For now, the key point: priority queues are not queues (despite the name). They don’t give you FIFO - they give you highest-priority-first. This makes them essential for algorithms like Dijkstra’s shortest path and Huffman coding.

Discomfort check. “Do arrays in modern languages actually work this way?” It depends. In C, C++, Rust, and Go - yes, an array is literally a contiguous block of memory exactly as described. NumPy arrays too, which is why they’re fast. But Python’s list is a dynamic array of pointers - the pointers are contiguous, but the actual objects live scattered on the heap, so there’s an extra indirection on every access. This is also why a Python list can hold mixed types. JavaScript arrays are stranger still: V8 internally uses contiguous storage when your array holds a single type, but silently degrades to a hash map if you mix types or create sparse arrays. Java splits the difference - int[] is genuinely contiguous raw memory, but Object[] holds references, not values. The model described in this post is what everything is built on at some level, and the cache behavior argument still holds - even Python’s pointer array benefits from cache lines. But in high-level languages, you’re often one layer of indirection away from the raw memory.

Choosing the Right Structure

Every data structure makes tradeoffs. The right choice depends on what operations you need most.

Think through these questions:

Do you need random access by index? Use an array.
Do you need fast insert/delete at arbitrary positions? Use a linked list (if you have a pointer to the position).
Do you need fast lookup by key? Use a hash table.
Do you need to always access the “best” element? Use a priority queue.
Do you need ordered iteration or range queries? Use a balanced BST - a tree where each node’s left subtree holds smaller values and right subtree holds larger ones, kept roughly equal in height so no operation takes more than $O(\log n)$.
Do you need LIFO discipline? Stack. FIFO? Queue. Both ends? Deque.

The worst situation is choosing based on intuition alone. Think about access patterns first.

Summary

Structure	Insert	Delete	Lookup	Notes
Array	$O(n)$ middle, $O(1)$ end	$O(n)$ middle, $O(1)$ end	$O(1)$ by index	Best cache behavior
Dynamic array	$O(1)$ amortized end	$O(n)$ middle	$O(1)$ by index	Preferred default sequence
Linked list	$O(1)$ at pointer	$O(1)$ at pointer	$O(n)$ by index	Poor cache behavior
Stack	$O(1)$ push	$O(1)$ pop	$O(1)$ peek	LIFO only
Queue	$O(1)$ enqueue	$O(1)$ dequeue	$O(1)$ peek	FIFO only; use `deque` in Python
Deque	$O(1)$ either end	$O(1)$ either end	$O(1)$ peek	Both ends; sliding window, BFS
Hash table	$O(1)$ expected	$O(1)$ expected	$O(1)$ expected	Unordered; worst case $O(n)$
Priority queue	$O(\log n)$	$O(\log n)$ extract-max	$O(1)$ peek-max	Backed by heap
Balanced BST	$O(\log n)$	$O(\log n)$	$O(\log n)$	Ordered; supports range queries

Every algorithm is only as fast as the data structure beneath it. You can have the most elegant algorithm in the world and still produce a slow program if the underlying structure makes the wrong operation cheap.

Read next:

Sorting Algorithms - How Order Turns Hard Problems Easy