Linkers & Loaders - The Last Step Before Your Code Runs // Megha Bose


You write a C program that calls printf. You didn’t write printf. There is no printf in your source file, no printf.c anywhere in your project. You compile, link, run, and text appears in your terminal. Where did the printf code come from? How did the CPU know where to jump when your code said call printf? What happens at the moment your program starts, before main is called?

These questions are the domain of linkers and loaders - the infrastructure that takes compiler output and turns it into a running process. Understanding them does not just answer philosophical questions about where printf comes from; it explains every “undefined symbol” build failure, every “works on my machine” Linux portability nightmare, every library dependency conflict, and why Go binaries are so satisfying to deploy.

The Problem Linkers Solve

Early programs were often compiled as a single unit - one source file, one compilation, one binary. As programs grew larger, this became impractical. Compiling everything together was slow; a change to one function required recompiling everything. The solution was separate compilation: divide the program into multiple source files, compile each independently into an object file, then combine the object files into a final executable.

Separate compilation creates a problem: each object file refers to symbols (functions, global variables) defined in other files. The compiler cannot resolve these references - it only sees one file at a time. Someone else has to connect the references to their definitions. That someone is the linker.

The linker also has to assign final addresses. When the compiler compiles foo.c, it does not know where in the final executable foo.c’s code will end up, so addresses inside each section are counted from zero. The linker decides the final layout and patches every address reference in the code.

Object Files and the ELF Format

On Linux, object files (.o) and executables both use the ELF (Executable and Linkable Format) container. ELF divides a file into sections:

.text - executable machine code
.data - initialized global and static variables (present in the file)
.bss - uninitialized or zero-initialized global and static variables (only a size is recorded, no actual bytes; the loader provides zero-filled memory at load time)
.rodata - read-only data: string literals, const global variables
.symtab - symbol table: every name this file defines or references
.rela.text - relocation entries: “patch these addresses once final locations are known”

Each object file’s symbol table lists:

Defined symbols: functions and globals that this file provides (global symbols are visible to other files; static ones stay local to this file)
Undefined symbols: names this file references but expects another file to define

When foo.c calls printf, the compiler emits a call instruction with a placeholder address, and records in the relocation table: “at this offset in .text, insert the address of the symbol printf.”

You can inspect all of this directly:

gcc -c foo.c -o foo.o          # compile to object file, do not link
nm foo.o                        # list symbols (U = undefined, T = defined in .text)
readelf -S foo.o                # show all ELF sections
readelf -r foo.o                # show relocation entries
objdump -d foo.o                # disassemble
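To try these commands, a small foo.c like the hypothetical one below is enough. It defines two functions, an initialized global and a file-local variable, and calls printf, which it does not define. Running nm foo.o on it should show roughly: foo and greet as T, counter as D, hidden as b (a local .bss symbol) and printf as U. Exact letters and extra symbols can vary with the compiler and flags.

```c
#include <stdio.h>

int counter = 42;          /* initialized global: .data (nm: D) */
static int hidden;         /* file-local, zero-filled: .bss (nm: b) */

/* Defined in this file: .text (nm: T) */
int foo(void) {
    return counter + hidden;
}

/* Calls printf, which foo.c does not define (nm: U printf);
   readelf -r foo.o shows the relocation for this call. */
void greet(void) {
    printf("counter = %d\n", counter);
}
```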

What the Linker Does

The linker’s job reduces to two tasks: symbol resolution and relocation.

Symbol resolution: collect all symbol tables from all object files and libraries. For each undefined symbol reference, find a definition. If a symbol is defined in a static library (.a file - a bag of .o files), extract only the object files that provide needed symbols. If a required symbol has no definition anywhere, fail with the error every C developer knows: undefined reference to 'foo'.

A subtlety: link order matters. The linker processes the command line left to right and scans each static library only at the point where it appears. If a library comes before the object file that needs it, nothing in the library is wanted yet when it is scanned, so the later reference is reported as undefined. The correct order is: object files first, then the libraries they depend on.

# WRONG: libfoo.a is scanned before foo.o needs anything from it
gcc -lfoo foo.o -o program

# CORRECT
gcc foo.o -lfoo -o program

Relocation: assign final virtual addresses to each section. The linker uses a linker script (you can inspect the default with ld --verbose) that specifies the memory layout: where .text starts and where .rodata, .data and .bss follow. The stack and heap are not placed by the linker; the kernel and the C library set them up at run time. Once layout is determined, the linker iterates through every relocation entry and patches the placeholder addresses with real values.
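The patch step can be sketched for the most common x86-64 case. A call instruction is the byte 0xE8 followed by a 32-bit displacement measured from the end of the instruction. Its relocation (R_X86_64_PC32 or R_X86_64_PLT32) asks the linker to store S + A - P, where S is the symbol’s final address, A is the addend stored in the relocation entry (-4 here, because the displacement field ends 4 bytes after P) and P is the address being patched. apply_pc32 is a hypothetical helper written for illustration, not part of any real linker’s API.

```c
#include <stdint.h>

/* Hypothetical helper: patch a 32-bit PC-relative field at `offset`
   inside a section that will be loaded at `section_addr`.
   Stores S + A - P in little-endian order; returns -1 on overflow. */
static int apply_pc32(uint8_t *section, uint64_t section_addr,
                      uint64_t offset, uint64_t sym_addr, int64_t addend)
{
    uint64_t place = section_addr + offset;                        /* P */
    int64_t value = (int64_t)(sym_addr + (uint64_t)addend - place); /* S + A - P */

    if (value < INT32_MIN || value > INT32_MAX)
        return -1;                 /* target too far for a 32-bit field */

    uint32_t bits = (uint32_t)(int32_t)value;
    for (int i = 0; i < 4; i++)
        section[offset + i] = (uint8_t)(bits >> (8 * i));   /* little-endian */
    return 0;
}
```

For example, with .text placed at 0x401000, a call opcode at offset 0x10 (so the field starts at 0x11 and P = 0x401011) and a target function at 0x401200, the stored value is 0x401200 - 4 - 0x401011 = 0x1EB. At run time the CPU adds 0x1EB to the address of the next instruction, 0x401015, and lands on 0x401200.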

Static Linking: The Hermit Crab Approach

Static linking bundles every library into the executable. The .a file is an archive; the linker extracts only the .o files it needs and copies their code directly into the binary.

The result is a self-contained executable: copy it to any compatible machine and it runs, regardless of what libraries are installed. This is why Go’s default deployment model (a single static binary with no runtime dependencies, as long as cgo is not pulled in) is beloved for containerized applications. The binary is the complete program.

The downsides: binary size grows. If ten programs link against the same library statically, there are ten copies of that library in RAM. And when the library has a security bug, every statically linked binary must be recompiled and redeployed to get the fix - there is no way to update the library in place.

Rust statically links Rust code by default and usually links dynamically only against the system libc. The Rust community’s preference for single-binary deployment follows the same reasoning as Go’s.

Dynamic Linking: The Apartment Building

Dynamic linking produces a smaller executable that records which shared libraries (.so on Linux, .dylib on macOS, .dll on Windows) it needs, but does not include them. The libraries are loaded into memory at runtime and - crucially - shared between all processes that use them. Five processes using libc.so share one physical copy of its code pages.

The dynamic linker (/lib64/ld-linux-x86-64.so.2 on x86-64 Linux) handles this at runtime. But this introduces a problem: when the static linker builds the executable, it cannot know where in memory libc.so will be loaded. With ASLR (Address Space Layout Randomization), the load address changes every run. You cannot hardcode printf’s address.

The solution is two levels of indirection: the GOT and the PLT.

The Global Offset Table and Procedure Linkage Table

The Global Offset Table (GOT) is a table of pointers in the data segment. Each entry holds the runtime address of one external symbol. Code that needs the address of an external variable reads it from the GOT, which the dynamic linker fills in at load time.

For function calls, a second layer is added: the Procedure Linkage Table (PLT). Each dynamically linked function has a PLT stub - a small trampoline in the executable’s code. When you call printf, you actually call printf@plt. Here is what the stub does:

On the first call:

The PLT stub jumps indirectly through printf’s GOT entry, to whatever address that entry currently holds.
The GOT entry has not been filled yet - it initially points back into the PLT, to a “resolver” stub.
The resolver calls the dynamic linker (ld.so), which looks up printf’s actual address in libc.so.
The dynamic linker writes printf’s real address into the GOT entry.
Control transfers to printf.

On subsequent calls:

The PLT stub jumps indirectly through the GOT entry.
The GOT entry now holds the real address of printf.
Control transfers directly to printf. No dynamic linker involvement.

This is lazy binding: symbol resolution is deferred until first use. Programs with hundreds of dynamic symbols that never actually call most of them avoid paying the resolution cost. You can disable lazy binding with LD_BIND_NOW=1 or by linking with -z now; both force all symbols to resolve at startup. Security-sensitive builds link with -z now together with -z relro (“full RELRO”, for Relocation Read-Only): because every symbol is resolved before main, the dynamic linker can then make the entire GOT read-only, closing a class of GOT overwrite attacks used to hijack control flow. LD_BIND_NOW alone does not make the GOT read-only.
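The lazy-binding sequence can be imitated in ordinary C with a table of function pointers. This is a toy model, not how glibc implements it: got[] stands in for the Global Offset Table, square_plt for a PLT stub and lazy_resolve_square for the resolver inside ld.so. Real stubs are a few machine instructions, and the real resolver looks the symbol up by name in the loaded libraries. All names here are hypothetical.

```c
typedef int (*fn_t)(int);

static int real_square(int x) { return x * x; }   /* the "library" function */

static int resolver_calls = 0;                    /* how often "ld.so" ran */

static int lazy_resolve_square(int x);            /* the resolver stub */

/* The "GOT": the slot starts out pointing at the resolver, not the target. */
static fn_t got[1] = { lazy_resolve_square };

/* The "PLT stub": always jump through the GOT slot. */
static int square_plt(int x) { return got[0](x); }

/* First call lands here: find the real address, patch the GOT slot,
   then forward the call. ld.so would do a symbol lookup by name here. */
static int lazy_resolve_square(int x)
{
    resolver_calls++;
    got[0] = real_square;
    return got[0](x);
}
```

After the first call the slot holds real_square, so later calls go straight to the target and never touch the resolver. Eager binding corresponds to filling every slot before main; full RELRO then makes the table read-only.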

Position-Independent Code

Shared libraries must be loadable at any address, because different processes may load the same library at different virtual addresses. Position-Independent Code (PIC) achieves this by accessing all global data and external symbols through the GOT rather than with hardcoded addresses.

PIC is enabled with -fPIC when compiling object files that will end up in a shared library. The cost: every access to an exported global variable (one with default visibility) in a shared library goes through an extra pointer load from the GOT. For global-heavy code, this adds up. This is one reason C programs with many globals can be faster when compiled as a static executable.

On x86-64, the RIP-relative addressing mode (load from address relative to the current instruction pointer) reduces some of this overhead - code can access its own data at a fixed offset from where it is currently executing, without a GOT indirection.
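A hypothetical pic_demo.c makes the difference visible. Compiled with gcc -O2 -fPIC -c pic_demo.c and inspected with objdump -dr pic_demo.o on x86-64, bump() typically loads counter’s address from the GOT first (an R_X86_64_REX_GOTPCRELX or R_X86_64_GOTPCREL relocation), because another module could override an exported symbol. bump_local() reaches its static variable directly with a RIP-relative access (R_X86_64_PC32). Exact relocation names depend on the compiler and binutils versions.

```c
/* Exported global with default visibility: under -fPIC another module
   could interpose it, so the compiler goes through the GOT. */
int counter = 0;

/* File-local: cannot be interposed, so a RIP-relative access suffices. */
static int local_counter = 0;

int bump(void)       { return ++counter; }
int bump_local(void) { return ++local_counter; }
```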

The Dynamic Linker: Before main()

When you run a dynamically linked program, the kernel does not jump to main. It maps the executable’s ELF segments into virtual memory and, because the PT_INTERP program header names the dynamic linker, maps the dynamic linker as well and starts execution at its entry point. The dynamic linker is itself a shared object, and like other shared objects it is placed at a randomized address under ASLR.

The dynamic linker’s startup sequence:

Maps all required shared libraries (found via DT_NEEDED entries in the ELF) into the process address space.
Resolves relocations in the main executable that reference library symbols.
Processes eager-binding symbol resolution (if -z now was used).
Calls each library’s initializer functions (__attribute__((constructor)), .init_array).
Transfers control to _start in the main executable, which passes argc, argv and the environment to __libc_start_main (in glibc), which in turn calls main.
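The initializer step is easy to observe. In this sketch, before_main carries the GCC/Clang constructor attribute, which places a pointer to it in .init_array; it has already run by the time main starts, whether it is compiled into the executable or into a shared library the executable depends on. The flag and function names are hypothetical.

```c
#include <stdio.h>

static int constructor_ran = 0;   /* set by before_main() */

/* GCC/Clang attribute: a pointer to this function goes into .init_array,
   and the startup code calls it before main(). */
__attribute__((constructor))
static void before_main(void)
{
    constructor_ran = 1;
    fputs("constructor: running before main\n", stderr);
}
```

A main that prints constructor_ran prints 1, and the message on stderr appears before anything main writes.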

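The library search order (glibc): directories in the binary’s DT_RPATH (only if it has no DT_RUNPATH), then the LD_LIBRARY_PATH environment variable, then DT_RUNPATH (the rpath set at link time with -rpath is stored as one of these two tags, depending on the toolchain), then /etc/ld.so.cache (a cache of libraries found in standard directories), then the default directories /lib and /usr/lib (or their lib64 variants on some 64-bit systems).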
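A hypothetical model of that order, following the glibc ld.so(8) manual page, fits in a few lines. It captures the DT_RPATH/DT_RUNPATH distinction (DT_RPATH is honored before LD_LIBRARY_PATH, and only when no DT_RUNPATH is present) but ignores secure-execution mode, in which setuid programs skip LD_LIBRARY_PATH, and the lib64 variants of the default directories.

```c
#include <stddef.h>

/* Hypothetical model of where ld.so looks for a DT_NEEDED name with no '/'.
   Any of the first three arguments may be NULL if the binary or the
   environment lacks it. Fills `out` with the sources in search order
   and returns how many were written. */
static size_t lib_search_order(const char *rpath, const char *runpath,
                               const char *ld_library_path,
                               const char *out[], size_t max)
{
    size_t n = 0;
    if (rpath && !runpath && n < max)   /* DT_RPATH, only without DT_RUNPATH */
        out[n++] = rpath;
    if (ld_library_path && n < max)     /* LD_LIBRARY_PATH */
        out[n++] = ld_library_path;
    if (runpath && n < max)             /* DT_RUNPATH */
        out[n++] = runpath;

    static const char *const defaults[] = { "/etc/ld.so.cache", "/lib", "/usr/lib" };
    for (size_t i = 0; i < sizeof defaults / sizeof defaults[0] && n < max; i++)
        out[n++] = defaults[i];
    return n;
}
```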

Diagnosing Linker Problems

The tools for understanding and debugging linker issues:

nm binary                  # list symbols; U=undefined, T=defined in .text, D=initialized data
nm -u binary               # only undefined symbols
nm -D library.so           # dynamic symbol table: what a shared library exports and imports

ldd binary                 # list shared library dependencies and resolved paths
ldd /bin/ls                # see what ls needs

readelf -d binary          # show dynamic section (DT_NEEDED, RPATH, etc.)
readelf -r binary          # show relocation table

objdump -d binary          # disassemble
objdump -p binary          # show program headers and the dynamic section
readelf -l binary          # program headers; also prints the requested interpreter (dynamic linker path)

ldconfig -p | grep libfoo  # check if libfoo is in the linker cache

When ldd shows a library as not found, the fix is usually one of: install the missing library, add its directory to LD_LIBRARY_PATH, or embed its path in the binary with -Wl,-rpath,/path/to/lib.

Library Versioning and the “Works on My Machine” Problem

Linux shared libraries use a versioning scheme: libfoo.so.1, libfoo.so.1.2.3. The SONAME embedded in the library (libfoo.so.1) is what the linker records in a program’s DT_NEEDED entry, so a program linked against libfoo.so.1 keeps loading major version 1 even when libfoo.so.2 is installed alongside it, and programs built against either version can coexist.

But glibc versioning creates portability headaches. glibc symbols are versioned at the ABI level (GLIBC_2.17, GLIBC_2.38), and a binary records the version of each glibc symbol it was built against. A binary built on a system with glibc 2.38 that calls a function introduced in 2.38 (or picks up a newer version of an existing function) fails to load on a system with glibc 2.17 with a “version ... not found” error, even though the same source code would build and run fine on the older system. This is the source of the canonical “works on my machine (Ubuntu 24.04) but not on the server (CentOS 7)” Linux portability nightmare.
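On glibc systems a program can ask which glibc it is actually running against with gnu_get_libc_version() from <gnu/libc-version.h>, a glibc-specific API that does not exist on musl. The hypothetical glibc_at_least helper below parses that string. It cannot rescue a binary whose symbol versions the loader has already rejected, since that failure happens before main, but it is useful in diagnostics when the build machine and the target disagree.

```c
#include <gnu/libc-version.h>   /* glibc only */
#include <stdio.h>

/* Hypothetical helper: is the running glibc at least major.minor?
   gnu_get_libc_version() returns a string such as "2.39". */
static int glibc_at_least(int major, int minor)
{
    int have_major = 0, have_minor = 0;
    if (sscanf(gnu_get_libc_version(), "%d.%d", &have_major, &have_minor) != 2)
        return 0;                        /* unexpected format: be conservative */
    return have_major > major || (have_major == major && have_minor >= minor);
}
```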

The solutions: compile on the oldest supported OS, use static linking, use a musl-based toolchain (musl is a lightweight libc designed to be linked statically, so the binary carries its own libc instead of depending on the host’s), or use containers that bundle the correct glibc version.

LD_PRELOAD: Intercepting Any Function

LD_PRELOAD is an environment variable naming shared libraries to load before all others. Symbols in the preloaded library take precedence over those in later libraries, including libc. You can intercept a dynamically linked program’s calls to library functions - including malloc, open, read, connect - without modifying the target binary.

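#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static size_t total = 0;   /* running total of requested bytes (not thread-safe) */

void *malloc(size_t size) {
    /* Look up the next "malloc" in load order (libc's) once. */
    static void *(*real_malloc)(size_t) = NULL;
    if (!real_malloc) real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    total += size;

    /* Log through a stack buffer and write(): fprintf may allocate
       internally and re-enter this malloc. */
    char msg[96];
    int n = snprintf(msg, sizeof msg, "malloc: %zu bytes (total: %zu)\n", size, total);
    if (n > 0) write(STDERR_FILENO, msg, (size_t)n < sizeof msg ? (size_t)n : sizeof msg - 1);

    return real_malloc(size);
}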

gcc -shared -fPIC -o wrap_malloc.so wrap_malloc.c -ldl
LD_PRELOAD=./wrap_malloc.so ./my_program

Memory profilers, call tracers, and security wrappers commonly use this technique. RTLD_NEXT is the key: it finds the next definition of the symbol in the search order after the current library, bypassing the preloaded override to reach the real function.

Python ctypes and Dynamic Loading at Runtime

Python’s ctypes module demonstrates that dynamic loading is not just a linker-time concept - it can happen at any point during program execution.

import ctypes

# Load a shared library at runtime
lib = ctypes.CDLL("libm.so.6")

# Declare the function signature
lib.sqrt.argtypes = [ctypes.c_double]
lib.sqrt.restype = ctypes.c_double

# Call it - ctypes calls the address dlsym() returned (via libffi); no PLT stub is involved
result = lib.sqrt(2.0)

Behind the scenes, ctypes.CDLL() calls dlopen(), the dynamic linker’s loading machinery exposed as a C API, and attribute lookups such as lib.sqrt call dlsym() to get the function’s address at runtime. cffi’s ABI mode is built on the same pair, and CPython itself uses dlopen() to load compiled extension modules, such as the .so files inside NumPy and SciPy, when you import them.
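The C equivalent of the ctypes snippet makes the two calls explicit. sqrt_via_dlopen is a hypothetical helper; libm.so.6 is the glibc library name used in the Python example, and on glibc older than 2.34 the program must be linked with -ldl.

```c
#include <dlfcn.h>

/* Hypothetical helper mirroring ctypes.CDLL("libm.so.6").sqrt(x).
   Returns -1.0 if the library or the symbol cannot be found. */
static double sqrt_via_dlopen(double x)
{
    void *handle = dlopen("libm.so.6", RTLD_NOW);   /* what CDLL() does */
    if (!handle)
        return -1.0;

    /* dlsym() returns void *; casting it to the real signature plays the
       role that argtypes/restype play in the Python version. */
    double (*fn)(double) = (double (*)(double))dlsym(handle, "sqrt");
    double result = fn ? fn(x) : -1.0;

    dlclose(handle);
    return result;
}
```

A production version would call dlerror() to report why dlopen() or dlsym() failed.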

Lambda cold starts are partially a linker story: a larger deployment package means more shared libraries to map, more symbols to resolve, and more initializer functions to call before your handler runs. On a fat Python Lambda with many native extensions, this dynamic loading work contributes to the 50-200ms cold start penalty, alongside interpreter startup and running each module’s import-time code.

Static vs Dynamic: The Modern Debate

The industry has moved in cycles. Dynamic linking won decisively in the 1990s - it was more memory-efficient, enabled security patches without recompilation, and reduced binary sizes that mattered on constrained hardware. Then containers changed the calculus.

In a containerized world, each container has its own library copies anyway, so the memory-sharing benefit of dynamic linking largely disappears. A statically linked binary in a Docker container is fully self-contained and eliminates a class of deployment failures. Go’s static binaries and fully static builds against musl (the libc used by Alpine Linux) are the contemporary expression of this: when you ship a container, you want to know exactly what is in it.

The DLL hell problem that plagued Windows in the 1990s (multiple applications needing different versions of the same DLL) is mitigated today by containers and virtual environments, but not eliminated. Nix and Guix (functional package managers) take this to its logical conclusion: every package lives in its own store directory whose path includes a hash of its build inputs, and each application refers to the exact dependency paths it was built against, so any number of versions can coexist and are shared only when they are identical.

Summary

Concept	What It Is	Why It Matters
Object file (.o)	Compiler output: machine code + unresolved symbols + relocation entries	The unit of separate compilation
ELF sections	.text/.data/.bss/.rodata/etc.	How code and data are organized in files
Symbol resolution	Matching undefined references to definitions	Source of “undefined reference” errors
Relocation	Patching placeholder addresses with final addresses	How code becomes executable at specific addresses
Static linking	All libraries copied into the binary	Self-contained deployment; larger binaries
Dynamic linking	Libraries loaded at runtime	Shared memory; library updates without recompilation
PLT/GOT	Indirection tables for lazy dynamic symbol resolution	How calls to `printf` work without knowing its address at link time
LD_PRELOAD	Preloading a library to intercept symbols	Memory profilers, security wrappers
RELRO	Making GOT read-only after startup	Preventing GOT overwrite attacks
