Memory Management
Prerequisite: How Computers Execute Programs
Memory bugs kill software. Use-after-free, double-free, and out-of-bounds accesses account for roughly 70% of high-severity security vulnerabilities in C and C++ codebases - a figure both Microsoft and Google have cited from their own bug databases. Understanding how memory management actually works, across manual, managed, and ownership-based systems, is essential for writing correct and efficient code in any language.
Stack vs Heap
Every process has two primary regions for runtime memory: the stack and the heap.
Stack memory is automatic, LIFO (last-in, first-out), and fast. Each function call pushes a stack frame containing the return address, saved registers, and local variables. When the function returns, that memory is reclaimed instantly - the stack pointer simply moves back. Stack allocation is a single instruction (sub rsp, N on x86-64). The downside: each thread's stack is fixed-size (typically 1–8 MB), and a frame's data cannot outlive the function that created it.
Heap memory is dynamic. You request a block of arbitrary size, use it for as long as needed, and release it when done. In C, that’s malloc and free. In C++, new and delete. In Python or Java, the runtime handles deallocation automatically.
#include <stdlib.h>

void example() {
    int x = 42;                          // stack: gone when function returns
    int *p = malloc(100 * sizeof(int));  // heap: persists until free()
    // ... use p ...
    free(p);                             // manual release
}
Memory Allocators
malloc is not magic - it sits on top of OS system calls (brk, mmap on Linux) and manages a pool of memory. Common allocator strategies:
Free list: Maintain a linked list of free blocks. On allocation, find a suitable block (first-fit, best-fit, worst-fit); on free, return the block to the list and coalesce adjacent free blocks. (A toy first-fit version appears after this list.)
Slab allocator: Pre-allocate fixed-size blocks for common object sizes. Used in the Linux kernel - allocating a task_struct is $O(1)$ because the slab already has a ready slot. Eliminates fragmentation for uniform-size objects.
Buddy system: Memory is divided into blocks whose sizes are powers of two. Allocating $2^k$ bytes splits a larger block in half repeatedly until the right size is reached. Freeing coalesces adjacent “buddies.” Used in Linux’s page allocator.
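To make the free-list strategy concrete, here is a minimal first-fit simulation in Python. The heap is modeled as a sorted list of (offset, size) pairs; the class name and interface are invented for illustration, not taken from any real allocator (real ones also store block sizes in headers rather than trusting the caller):

class FreeListAllocator:
    """Toy first-fit allocator over a simulated heap (illustrative only)."""

    def __init__(self, heap_size):
        # One big free block covering the whole heap: (offset, size)
        self.free_blocks = [(0, heap_size)]

    def alloc(self, size):
        for i, (offset, block_size) in enumerate(self.free_blocks):
            if block_size >= size:                  # first fit
                remainder = block_size - size
                if remainder:                       # split: keep the tail free
                    self.free_blocks[i] = (offset + size, remainder)
                else:
                    del self.free_blocks[i]
                return offset
        raise MemoryError("no block large enough (external fragmentation)")

    def free(self, offset, size):
        # Return the block, keeping the list sorted by offset
        self.free_blocks.append((offset, size))
        self.free_blocks.sort()
        # Coalesce adjacent free blocks into larger ones
        merged = [self.free_blocks[0]]
        for off, sz in self.free_blocks[1:]:
            last_off, last_sz = merged[-1]
            if last_off + last_sz == off:           # adjacent: merge
                merged[-1] = (last_off, last_sz + sz)
            else:
                merged.append((off, sz))
        self.free_blocks = merged

heap = FreeListAllocator(1024)
a = heap.alloc(100)       # offset 0
b = heap.alloc(200)       # offset 100
heap.free(a, 100)
heap.free(b, 200)         # coalesces back into one block
print(heap.free_blocks)   # [(0, 1024)]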
Fragmentation
Two kinds of fragmentation waste memory:
External fragmentation: Free memory exists but is scattered in non-contiguous holes too small to satisfy a large request. A 1 MB heap might have 500 KB free, but spread across 1000 tiny gaps.
Internal fragmentation: An allocator rounds up your request (e.g., you asked for 17 bytes and got 32). The wasted space inside the allocation is internal fragmentation (quantified in the sketch below).
Compacting allocators (like moving GCs) can eliminate external fragmentation by relocating live objects, but they require every pointer to be updated - which is why C, where raw pointer values can be stored and copied anywhere, cannot do this.
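To see how much waste power-of-two size classes (as in the buddy system) can cause, here is a small back-of-the-envelope calculation; the request sizes are arbitrary:

def round_up_pow2(n):
    """Smallest power of two >= n (the buddy system's size classes)."""
    p = 1
    while p < n:
        p *= 2
    return p

for req in [17, 100, 4097, 33]:
    allocated = round_up_pow2(req)
    waste = allocated - req
    print(f"requested {req:>5}, got {allocated:>5}, "
          f"internal waste {waste:>4} ({100 * waste / allocated:.0f}%)")
# requested    17, got    32, internal waste   15 (47%)
# requested   100, got   128, internal waste   28 (22%)
# requested  4097, got  8192, internal waste 4095 (50%)
# requested    33, got    64, internal waste   31 (48%)

In the worst case (a request of one byte past a size class), nearly half the block is wasted - one reason slab allocators dedicate caches to common object sizes.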
Garbage Collection
Managed languages handle deallocation automatically. Several strategies exist:
Reference counting (CPython, Swift): Each object stores a count of references pointing to it. When the count drops to zero, the object is freed immediately. Deterministic destruction - objects are freed exactly when their last reference disappears. Fatal flaw: cannot collect cycles (two objects pointing to each other with no external references will never reach count zero).
Mark-and-sweep: The GC pauses the program, marks all objects reachable from roots (global variables, stack), then sweeps through the heap freeing everything unmarked. Handles cycles. The pause (stop-the-world) grows with heap size. (A toy version appears after this list.)
Generational GC (JVM, CPython’s cyclic GC, Go): Observation: most objects die young. Split the heap into generations - young (nursery) and old (tenured). Collect the young generation frequently (minor GC, cheap), promote long-lived objects to old, and collect the old generation infrequently (major GC, expensive). JVM’s G1 collector further divides memory into regions and collects the highest-garbage regions first.
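Mark-and-sweep is short enough to sketch over a toy object graph. Everything below - the Obj class, the explicit heap and roots lists - is a simulation of the idea, not CPython's actual collector:

class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references to other Objs
        self.marked = False

def mark(obj):
    """Mark everything reachable from obj (the marked flag stops cycles)."""
    if obj.marked:
        return
    obj.marked = True
    for ref in obj.refs:
        mark(ref)

def mark_and_sweep(heap, roots):
    for obj in heap:          # reset marks from any previous collection
        obj.marked = False
    for root in roots:        # mark phase: trace from the roots
        mark(root)
    live = [o for o in heap if o.marked]
    dead = [o for o in heap if not o.marked]
    return live, dead         # sweep phase: "free" the unmarked objects

a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)              # a → b: both reachable from the root
c.refs.append(d)
d.refs.append(c)              # c ↔ d: a cycle with no external references
live, dead = mark_and_sweep(heap=[a, b, c, d], roots=[a])
print([o.name for o in dead])  # ['c', 'd'] - the cycle is collected

Note that the cycle between c and d poses no problem: reachability is computed from the roots, so reference counts never enter into it.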
Python’s Memory Model
Every Python value is an object on the heap, even small integers. Python uses reference counting as its primary mechanism, with a cyclic garbage collector on top.
import sys

x = [1, 2, 3]
print(sys.getrefcount(x))  # 2: one for x, one for getrefcount's own argument
y = x                      # refcount → 2 (getrefcount would now report 3)
del y                      # refcount → 1
del x                      # refcount → 0: the list is freed immediately
Python interns small integers (typically -5 to 256) and short strings, so after a = 1; b = 1, the expression a is b evaluates to True - both names point to the same object.
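A minimal demonstration, assuming CPython (interning is an implementation detail, not a language guarantee):

a = 1
b = 1
print(a is b)        # True: both names point to the single interned 1 object

n = 300              # any value outside the interned small-int range works
big1 = n + 1         # computed at runtime: a fresh int object
big2 = n + 1
print(big1 is big2)  # False in CPython: equal values, distinct objects
print(big1 == big2)  # True - compare values with ==, identity with is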
Memory Leaks
In C/C++: forgetting to free allocated memory. The process’s RSS (resident set size) grows without bound. Valgrind’s memcheck tool detects leaks by tracking every allocation and checking whether it was freed at program exit.
valgrind --leak-check=full ./my_program
In Python: circular references without weak references. If two objects reference each other and you lose all external references, their reference counts never hit zero. Python’s cyclic GC handles this, but it only runs periodically.
import gc

class Node:
    def __init__(self):
        self.other = None

a = Node()
b = Node()
a.other = b
b.other = a   # cycle: a → b → a
del a, b      # refcounts don't hit 0; objects linger until the cyclic GC runs
print(gc.collect())  # force a cyclic GC pass; prints the number of objects collected
For tracking Python memory usage:
import tracemalloc

tracemalloc.start()
# ... your code ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:5]:
    print(stat)
Virtual Memory
Each process sees a private virtual address space - typically 128 TB on x86-64 Linux. The OS, with hardware support from the MMU (memory management unit), maps virtual pages to physical frames. This enables:
- Isolation: Process A cannot read process B’s memory even if they share physical RAM.
- Overcommit: Allocate more virtual memory than physical RAM; the OS only maps pages on first access.
- Swap: Physical frames can be paged out to disk when RAM is full; the page is mapped back in on next access (a page fault).
A page fault occurs when a process accesses a virtual address with no current physical mapping. The kernel handles it: either it maps a new frame (minor fault) or loads data from disk (major fault). A major fault against a spinning disk costs ~10 ms - roughly $10^7$ times slower than an L1 cache hit (~1 ns).
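On Unix systems you can watch minor faults accumulate as pages are first touched, using the standard-library resource module. A rough sketch - exact counts vary with the platform, the allocator, and transparent huge pages:

import resource  # Unix-only standard-library module

def page_faults():
    u = resource.getrusage(resource.RUSAGE_SELF)
    return u.ru_minflt, u.ru_majflt   # (minor, major) fault counts so far

minor0, major0 = page_faults()
buf = bytearray(64 * 1024 * 1024)     # 64 MB; zero-filling touches every page
minor1, major1 = page_faults()

print(f"minor faults: +{minor1 - minor0}")  # on the order of 64 MB / 4 KB = 16384
print(f"major faults: +{major1 - major0}")  # usually 0 unless the system is swapping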
Memory-mapped files (mmap) map a file directly into the address space. Reading or writing those addresses reads/writes the file, with the kernel handling paging. Used by databases (SQLite, LMDB) and for efficient IPC.
import mmap

with open("data.bin", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    print(mm[0:10])     # read first 10 bytes
    mm[0:5] = b"hello"  # write directly to file
    mm.close()
Examples
Circular reference and gc.collect():
import gc

gc.disable()  # turn off automatic cyclic GC for demonstration

class Leaky:
    def __init__(self, name):
        self.name = name
        self.partner = None
    def __del__(self):
        print(f"{self.name} freed")

x = Leaky("A")
y = Leaky("B")
x.partner = y
y.partner = x
del x, y  # no output - the cycle prevents __del__

print("before gc.collect")
collected = gc.collect()
print(f"after gc.collect: freed {collected} objects")
# Now you'll see "A freed" and "B freed"
Common memory leak in C:
#include <stdlib.h>

void handle_request(char *buf);  // assume this is defined elsewhere

// Leak: buffer allocated in loop, never freed
void process_requests(int n) {
    for (int i = 0; i < n; i++) {
        char *buf = malloc(4096);
        handle_request(buf);
        // forgot: free(buf); <-- every iteration leaks 4 KB
    }
}
valgrind --leak-check=full would report definitely lost: 4,096 × n bytes in n blocks, pointing exactly to the malloc call site.
The core tension in memory management is between control and safety. C gives you complete control and leaves you responsible for every byte. Python and Java take control away and give you safety in return. Rust threads the needle - the borrow checker enforces memory safety at compile time with zero runtime overhead.