Memory Management
Prerequisite: How Computers Execute Programs
Memory bugs kill software. Use-after-free, double-free, and out-of-bounds accesses account for roughly 70% of high-severity security vulnerabilities in C and C++ codebases - a figure both Microsoft and Google have cited from their own bug databases. Understanding how memory management actually works, across manual, managed, and ownership-based systems, is essential for writing correct and efficient code in any language.
Stack vs Heap
Every process has two primary regions for runtime memory: the stack and the heap.
Stack memory is automatic, LIFO (last-in, first-out), and fast. Each function call pushes a stack frame containing the return address, saved registers, and local variables. When the function returns, that memory is reclaimed instantly - the stack pointer simply moves back. Stack allocation is a single instruction (sub rsp, N on x86-64). The downside: each thread's stack is fixed-size (typically 1–8 MB), and a frame's data cannot outlive the function that created it.
Heap memory is dynamic. You request a block of arbitrary size, use it for as long as needed, and release it when done. In C, that’s malloc and free. In C++, new and delete. In Python or Java, the runtime handles deallocation automatically.
#include <stdlib.h>

void example() {
    int x = 42;                          // stack: gone when function returns
    int *p = malloc(100 * sizeof(int));  // heap: persists until free()
    // ... use p ...
    free(p);                             // manual release
}
Memory Allocators
malloc is not magic - it sits on top of OS system calls (brk, mmap on Linux) and manages a pool of memory. Common allocator strategies:
Free list: Maintain a linked list of free blocks. On allocation, find a suitable block (first-fit, best-fit, worst-fit); on free, return the block to the list and coalesce adjacent free blocks. (A toy first-fit version appears after this list.)
Slab allocator: Pre-allocate fixed-size blocks for common object sizes. Used in the Linux kernel - allocating a task_struct is $O(1)$ because the slab already has a ready slot. Eliminates fragmentation for uniform-size objects.
Buddy system: Memory is divided into blocks whose sizes are powers of two. Allocating $2^k$ bytes splits a larger block in half repeatedly until the right size is reached. Freeing coalesces adjacent “buddies.” Used in Linux’s page allocator.
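To make the free-list strategy concrete, here is a minimal first-fit simulation in Python. The heap is modeled as a sorted list of (offset, size) pairs; the class name and interface are invented for illustration, not taken from any real allocator (real ones also store block sizes in headers rather than trusting the caller):

class FreeListAllocator:
    """Toy first-fit allocator over a simulated heap (illustrative only)."""

    def __init__(self, heap_size):
        # One big free block covering the whole heap: (offset, size)
        self.free_blocks = [(0, heap_size)]

    def alloc(self, size):
        for i, (offset, block_size) in enumerate(self.free_blocks):
            if block_size >= size:                  # first fit
                remainder = block_size - size
                if remainder:                       # split: keep the tail free
                    self.free_blocks[i] = (offset + size, remainder)
                else:
                    del self.free_blocks[i]
                return offset
        raise MemoryError("no block large enough (external fragmentation)")

    def free(self, offset, size):
        # Return the block, keeping the list sorted by offset
        self.free_blocks.append((offset, size))
        self.free_blocks.sort()
        # Coalesce adjacent free blocks into larger ones
        merged = [self.free_blocks[0]]
        for off, sz in self.free_blocks[1:]:
            last_off, last_sz = merged[-1]
            if last_off + last_sz == off:           # adjacent: merge
                merged[-1] = (last_off, last_sz + sz)
            else:
                merged.append((off, sz))
        self.free_blocks = merged

heap = FreeListAllocator(1024)
a = heap.alloc(100)       # offset 0
b = heap.alloc(200)       # offset 100
heap.free(a, 100)
heap.free(b, 200)         # coalesces back into one block
print(heap.free_blocks)   # [(0, 1024)]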
Fragmentation
Two kinds of fragmentation waste memory:
External fragmentation: Free memory exists but is scattered in non-contiguous holes too small to satisfy a large request. A 1 MB heap might have 500 KB free, but spread across 1000 tiny gaps.
Internal fragmentation: An allocator rounds up your request (e.g., you asked for 17 bytes and got 32). The wasted space inside the allocation is internal fragmentation (quantified in the sketch below).
Compacting allocators (like moving GCs) can eliminate external fragmentation by relocating live objects, but they require every pointer to be updated - which is why C, where raw pointer values can be stored and copied anywhere, cannot do this.
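To see how much waste power-of-two size classes (as in the buddy system) can cause, here is a small back-of-the-envelope calculation; the request sizes are arbitrary:

def round_up_pow2(n):
    """Smallest power of two >= n (the buddy system's size classes)."""
    p = 1
    while p < n:
        p *= 2
    return p

for req in [17, 100, 4097, 33]:
    allocated = round_up_pow2(req)
    waste = allocated - req
    print(f"requested {req:>5}, got {allocated:>5}, "
          f"internal waste {waste:>4} ({100 * waste / allocated:.0f}%)")
# requested    17, got    32, internal waste   15 (47%)
# requested   100, got   128, internal waste   28 (22%)
# requested  4097, got  8192, internal waste 4095 (50%)
# requested    33, got    64, internal waste   31 (48%)

In the worst case (a request of one byte past a size class), nearly half the block is wasted - one reason slab allocators dedicate caches to common object sizes.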
Garbage Collection
Managed languages handle deallocation automatically. Several strategies exist:
Reference counting (CPython, Swift): Each object stores a count of references pointing to it. When the count drops to zero, the object is freed immediately. Deterministic destruction - objects are freed exactly when their last reference disappears. Fatal flaw: cannot collect cycles (two objects pointing to each other with no external references will never reach count zero).
Mark-and-sweep: The GC pauses the program, marks all objects reachable from roots (global variables, stack), then sweeps through the heap freeing everything unmarked. Handles cycles. The pause (stop-the-world) grows with heap size. (A toy version appears after this list.)
Generational GC (JVM, CPython’s cyclic GC, Go): Observation: most objects die young. Split the heap into generations - young (nursery) and old (tenured). Collect the young generation frequently (minor GC, cheap), promote long-lived objects to old, and collect the old generation infrequently (major GC, expensive). JVM’s G1 collector further divides memory into regions and collects the highest-garbage regions first.
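Mark-and-sweep is short enough to sketch over a toy object graph. Everything below - the Obj class, the explicit heap and roots lists - is a simulation of the idea, not CPython's actual collector:

class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references to other Objs
        self.marked = False

def mark(obj):
    """Mark everything reachable from obj (the marked flag stops cycles)."""
    if obj.marked:
        return
    obj.marked = True
    for ref in obj.refs:
        mark(ref)

def mark_and_sweep(heap, roots):
    for obj in heap:          # reset marks from any previous collection
        obj.marked = False
    for root in roots:        # mark phase: trace from the roots
        mark(root)
    live = [o for o in heap if o.marked]
    dead = [o for o in heap if not o.marked]
    return live, dead         # sweep phase: "free" the unmarked objects

a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)              # a → b: both reachable from the root
c.refs.append(d)
d.refs.append(c)              # c ↔ d: a cycle with no external references
live, dead = mark_and_sweep(heap=[a, b, c, d], roots=[a])
print([o.name for o in dead])  # ['c', 'd'] - the cycle is collected

Note that the cycle between c and d poses no problem: reachability is computed from the roots, so reference counts never enter into it.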
Python’s Memory Model
Every Python value is an object on the heap, even small integers. Python uses reference counting as its primary mechanism, with a cyclic garbage collector on top.
import sys

x = [1, 2, 3]
print(sys.getrefcount(x))  # 2: one for x, one for getrefcount's own argument
y = x                      # refcount → 2 (getrefcount would now report 3)
del y                      # refcount → 1
del x                      # refcount → 0: the list is freed immediately
Python interns small integers (typically -5 to 256) and short strings, so after a = 1; b = 1, the expression a is b evaluates to True - both names point to the same object.
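A minimal demonstration, assuming CPython (interning is an implementation detail, not a language guarantee):

a = 1
b = 1
print(a is b)        # True: both names point to the single interned 1 object

n = 300              # any value outside the interned small-int range works
big1 = n + 1         # computed at runtime: a fresh int object
big2 = n + 1
print(big1 is big2)  # False in CPython: equal values, distinct objects
print(big1 == big2)  # True - compare values with ==, identity with is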
Memory Leaks
In C/C++: forgetting to free allocated memory. The process’s RSS (resident set size) grows without bound. Valgrind’s memcheck tool detects leaks by tracking every allocation and checking whether it was freed at program exit.
valgrind --leak-check=full ./my_program
In Python: circular references without weak references. If two objects reference each other and you lose all external references, their reference counts never hit zero. Python’s cyclic GC handles this, but it only runs periodically.
import gc

class Node:
    def __init__(self):
        self.other = None

a = Node()
b = Node()
a.other = b
b.other = a   # cycle: a → b → a
del a, b      # refcounts don't hit 0; objects linger until the cyclic GC runs
print(gc.collect())  # force a cyclic GC pass; prints the number of objects collected
For tracking Python memory usage:
import tracemalloc

tracemalloc.start()
# ... your code ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:5]:
    print(stat)
Virtual Memory
Each process sees a private virtual address space - typically 128 TB on x86-64 Linux. The OS, with hardware support from the MMU (memory management unit), maps virtual pages to physical frames. This enables:
- Isolation: Process A cannot read process B’s memory even if they share physical RAM.
- Overcommit: Allocate more virtual memory than physical RAM; the OS only maps pages on first access.
- Swap: Physical frames can be paged out to disk when RAM is full; the page is mapped back in on next access (a page fault).
A page fault occurs when a process accesses a virtual address with no current physical mapping. The kernel handles it: either it maps a new frame (minor fault) or loads data from disk (major fault). A major fault against a spinning disk costs ~10 ms - roughly $10^7$ times slower than an L1 cache hit (~1 ns).
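On Unix systems you can watch minor faults accumulate as pages are first touched, using the standard-library resource module. A rough sketch - exact counts vary with the platform, the allocator, and transparent huge pages:

import resource  # Unix-only standard-library module

def page_faults():
    u = resource.getrusage(resource.RUSAGE_SELF)
    return u.ru_minflt, u.ru_majflt   # (minor, major) fault counts so far

minor0, major0 = page_faults()
buf = bytearray(64 * 1024 * 1024)     # 64 MB; zero-filling touches every page
minor1, major1 = page_faults()

print(f"minor faults: +{minor1 - minor0}")  # on the order of 64 MB / 4 KB = 16384
print(f"major faults: +{major1 - major0}")  # usually 0 unless the system is swapping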
Memory-mapped files (mmap) map a file directly into the address space. Reading or writing those addresses reads/writes the file, with the kernel handling paging. Used by databases (SQLite, LMDB) and for efficient IPC.
import mmap

with open("data.bin", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    print(mm[0:10])     # read first 10 bytes
    mm[0:5] = b"hello"  # write directly to file
    mm.close()
Examples
Circular reference and gc.collect():
import gc

gc.disable()  # turn off automatic cyclic GC for demonstration

class Leaky:
    def __init__(self, name):
        self.name = name
        self.partner = None
    def __del__(self):
        print(f"{self.name} freed")

x = Leaky("A")
y = Leaky("B")
x.partner = y
y.partner = x
del x, y  # no output - the cycle prevents __del__

print("before gc.collect")
collected = gc.collect()
print(f"after gc.collect: freed {collected} objects")
# Now you'll see "A freed" and "B freed"
Common memory leak in C:
#include <stdlib.h>

void handle_request(char *buf);  // assume this is defined elsewhere

// Leak: buffer allocated in loop, never freed
void process_requests(int n) {
    for (int i = 0; i < n; i++) {
        char *buf = malloc(4096);
        handle_request(buf);
        // forgot: free(buf); <-- every iteration leaks 4 KB
    }
}
valgrind --leak-check=full would report definitely lost: 4,096 × n bytes in n blocks, pointing exactly to the malloc call site.
The core tension in memory management is between control and safety. C gives you complete control and leaves you responsible for every byte. Python and Java take control away and give you safety in return. Rust threads the needle - the borrow checker enforces memory safety at compile time with zero runtime overhead.