Linkers and Loaders

Technical Overview

Linking is the final step of the compilation pipeline: combining separately compiled translation units (object files) into a single executable or shared library. The linker performs two fundamental operations: symbol resolution (matching undefined symbol references to their definitions) and relocation (fixing up addresses that were unknown at compile time). The loader then maps the final binary into a process's virtual address space and handles any remaining relocations before transferring control to the entry point.

Understanding the linker is essential for systems engineers. Linker errors (undefined reference, multiple definition, ODR violations) are common, and their causes are often non-obvious without understanding ELF structure, the symbol table, and the relocation process. Dynamic linking adds the PLT/GOT mechanism, which has both performance and security implications that every systems engineer should understand precisely.

Prerequisites

ELF binary format basics
Virtual memory and memory mapping concepts
Process execution model (entry point, _start)
Basic understanding of position-independent code (PIC)
Symbol visibility basics in C (extern, static)

PLT/GOT Lazy Binding Diagram

First call to printf() in a dynamically linked binary:

  Your code                PLT                        GOT (.got.plt)
  +-----------------+      +--------------------+     +------------------+
  | call printf@plt |----->| printf@plt:        |     | GOT[printf]      |
  +-----------------+      |   jmp *GOT[printf] |---->| = printf@plt+6   |
                           |   push index       |<----| (initial value:  |
                           |   jmp PLT[0]       |     |  the push below  |
                           +---------+----------+     |  the jmp)        |
                                     |                +------------------+
                                     v
                           +--------------------+
                           | PLT[0]:            |
                           |   push GOT[1]      |
                           |   jmp *GOT[2]      |---> ld.so resolver:
                           +--------------------+       resolves the symbol,
                                                        patches GOT[printf]
                                                        to the real address,
                                                        then jumps to printf

Second call to printf():

  Your code                PLT                        GOT (.got.plt)
  +-----------------+      +--------------------+     +--------------------+
  | call printf@plt |----->| printf@plt:        |     | GOT[printf]        |
  +-----------------+      |   jmp *GOT[printf] |---->| = real printf addr |
                           |   ...              |     +----------+---------+
                           +--------------------+                |
                                                                 v
                                                      printf() runs directly

Core Content

Object File Format: ELF

The ELF (Executable and Linkable Format) is the standard binary format on Linux and most Unix-like systems. ELF is used for: relocatable object files (.o), shared libraries (.so), and executables.

ELF file structure:

+-------------------+
| ELF Header        | - magic: 0x7f 'E' 'L' 'F'
|                   | - class (32/64-bit), endianness
|                   | - type: ET_REL, ET_EXEC, ET_DYN
|                   | - machine (x86-64: 0x3E, AArch64: 0xB7)
|                   | - entry point (for executables)
|                   | - program header table offset
|                   | - section header table offset
+-------------------+
| Program Headers   | - segments (for execution/loading)
| (segments)        | - each: type, offset, vaddr, filesz, memsz, flags
+-------------------+
| Sections          |
|                   |
| .text             | - executable code
| .data             | - initialized writable data
| .rodata           | - read-only data (string literals, const)
| .bss              | - uninitialized data (zero-initialized, no file space)
| .symtab           | - symbol table (name + value + type + binding + section)
| .strtab           | - string table (null-terminated strings for symbol names)
| .rela.text        | - relocations for .text (addend form)
| .rel.text         | - relocations for .text (no addend form, x86-32)
| .dynamic          | - dynamic linking metadata (shared libraries and dynamically linked executables)
| .plt              | - Procedure Linkage Table
| .got              | - Global Offset Table
| .got.plt          | - GOT entries for PLT (kept separate from .got by the linker)
| .debug_info       | - DWARF debug information
+-------------------+
| Section Header    | - table of section descriptors
| Table             | - each: name (offset into .shstrtab), type, flags, addr, size
+-------------------+

Sections vs Segments: Sections are the linker's view (granular, named, used during link time). Segments are the loader's view (coarser-grained; each PT_LOAD segment is mapped into memory as a unit with a single permission set). The linker script controls how sections are merged into segments.

Key sections:

.text: Machine code. Read + Execute. In the executable segment.
.data: Initialized global variables. Read + Write.
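.rodata: Read-only constants. Read-only. Older toolchains merge it into the executable segment; recent GNU ld and LLD versions typically place it in a separate read-only, non-executable segment.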
.bss: Block Started by Symbol. Uninitialized globals (or zero-initialized). Takes no space in the file (just a size); the OS zero-fills the virtual pages.
.symtab / .dynsym: Symbol tables. .symtab contains all symbols including local ones; .dynsym contains only exported/imported dynamic symbols (subset used by ld.so).
.rela.text: Relocation entries for .text. Each entry: offset (position in section to patch), info (symbol index + relocation type), addend.

Symbol Resolution

During linking, the linker maintains a global symbol table. It processes object files in order:

For each undefined symbol in object file A, record it as needed
For each defined symbol in object file A, add it to the global symbol table
When an undefined symbol in A is later satisfied by a definition in object file B, resolve the reference

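Strong vs weak symbols: A regular function or variable definition is a strong symbol. A definition marked __attribute__((weak)) is a weak symbol. Uninitialized global variables in C were traditionally emitted as COMMON symbols, which the linker treats much like weak definitions; since GCC 10 the default is -fno-common, so they are ordinary strong definitions in .bss. If a strong and a weak definition conflict, the strong one wins silently. Two strong definitions of the same symbol produce a linker error: multiple definition of foo.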
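A weak definition is commonly used to provide an overridable default. The sketch below assumes GCC or Clang (the weak attribute is a compiler extension), and the names log_level and report are hypothetical.

```c
#include <stdio.h>

/* Weak default: used only if no other object file in the link supplies a
 * strong definition of log_level(). */
__attribute__((weak)) int log_level(void)
{
    return 1;
}

/* Callers reference the symbol normally; the linker picks the definition. */
void report(void)
{
    printf("log level = %d\n", log_level());
}
```

If another .o in the same link defines a plain int log_level(void), the linker uses that strong definition and discards this one without a diagnostic. With no override present, the weak default is the one that runs.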

Archive libraries (.a files): Processed differently. The linker only extracts a .o member from an archive if that member defines a symbol that is currently undefined. This means the order of archives on the command line matters — an archive placed before the .o that needs it will not extract anything.

# Wrong: libfoo.a before main.o, so foo() may not be extracted
gcc -lfoo main.o

# Right: libraries after the objects that need them
gcc main.o -lfoo   # -L flags and order matter

# Or: use linker groups for circular dependencies
gcc main.o -Wl,--start-group -lfoo -lbar -Wl,--end-group
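The left-to-right archive rule can be modeled in a few lines. This is a toy sketch, not how any real linker is implemented: each symbol is one bit in a mask, each archive is treated as a single member, and all names (struct input, link_undefined, SYM_FOO) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Each symbol is one bit in a mask. */
enum { SYM_MAIN = 1 << 0, SYM_FOO = 1 << 1, SYM_BAR = 1 << 2 };

struct input {
    int      is_archive;  /* 1 for a .a (modeled as one member), 0 for a .o */
    uint32_t defines;     /* symbols this input defines */
    uint32_t needs;       /* symbols it references but does not define */
};

/* Processes inputs left to right and returns the set of symbols that are
 * still undefined at the end (0 means the link would succeed). */
uint32_t link_undefined(const struct input *in, size_t n)
{
    uint32_t defined = 0, undefined = 0;

    for (size_t i = 0; i < n; i++) {
        /* An archive member is pulled in only if it satisfies a symbol
         * that is undefined right now; otherwise it is skipped for good. */
        if (in[i].is_archive && (in[i].defines & undefined) == 0)
            continue;
        defined   |= in[i].defines;
        undefined  = (undefined | in[i].needs) & ~defined;
    }
    return undefined;
}
```

With main.o needing SYM_FOO and libfoo.a defining it, the order {main.o, libfoo.a} leaves nothing undefined, while {libfoo.a, main.o} leaves SYM_FOO unresolved because the archive was visited before anything needed it.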

Relocation

When the compiler generates a .o file, it does not know the final addresses of symbols (because other .o files have not been combined yet). It emits relocation entries in .rela.text that say: "at offset X in .text, apply relocation type R to the address of symbol S, plus addend A."

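Common x86-64 relocation types:

R_X86_64_PC32: Patches a 32-bit PC-relative reference (calls and data references within the same module). *patch = S + A - P, where S = symbol value, A = addend, P = address of the patched location.
R_X86_64_PLT32: Like PC32, but the linker may route the call through a PLT entry if the target turns out to live in another module. Recent GNU assemblers emit it for most direct calls.
R_X86_64_64: Absolute 64-bit reference. In a shared library or PIE it cannot be fully resolved at link time: in data sections it becomes a dynamic relocation applied by ld.so, and in .text it would force text relocations, which PIC code avoids.
R_X86_64_GOTPCREL: PC-relative reference to a GOT slot (used for global variables in PIC code).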

The linker processes all relocations after laying out all sections at their final addresses, then patches the binary.
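As a worked instance of the S + A - P formula, the following sketch applies a single R_X86_64_PC32 relocation to an in-memory section image. It assumes glibc's <elf.h> and a little-endian host; apply_pc32 and its parameters are hypothetical names, and a real linker handles many more relocation types.

```c
#include <elf.h>      /* Elf64_Rela, ELF64_R_TYPE, R_X86_64_PC32 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Applies one R_X86_64_PC32 relocation.
 *   image, image_len  bytes of the section being patched
 *   image_addr        final virtual address assigned to image[0]
 *   sym_addr          final address of the referenced symbol (S)
 * Returns 0 on success, -1 for an unsupported type, an out-of-range
 * offset, or a value that does not fit in 32 bits. */
int apply_pc32(uint8_t *image, size_t image_len, uint64_t image_addr,
               const Elf64_Rela *rel, uint64_t sym_addr)
{
    if (ELF64_R_TYPE(rel->r_info) != R_X86_64_PC32)
        return -1;
    if (rel->r_offset > image_len ||
        image_len - rel->r_offset < sizeof(int32_t))
        return -1;

    uint64_t P = image_addr + rel->r_offset;              /* patch location */
    int64_t value =
        (int64_t)(sym_addr + (uint64_t)rel->r_addend - P); /* S + A - P */

    if (value < INT32_MIN || value > INT32_MAX)
        return -1;                                        /* relocation overflow */

    int32_t field = (int32_t)value;
    memcpy(image + rel->r_offset, &field, sizeof field);  /* host byte order */
    return 0;
}
```

For a call instruction (opcode 0xe8) at address 0x401000 with the relocation at offset 1 and addend -4, a target at 0x401020 yields a displacement of 0x1b, which is the distance from the end of the 5-byte call to the target.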

Static vs Dynamic Linking

Static linking: All library code is copied directly into the executable. No runtime dependency on .so files. Advantage: self-contained binary, no ld.so overhead. Disadvantage: larger binary, no sharing of library code between processes (each process has its own copy in memory).

gcc -static main.c -o main_static  # fully statically linked
file main_static                    # ELF executable, statically linked
ldd main_static                     # not a dynamic executable

Dynamic linking: The executable references shared libraries by name. At load time, ld.so maps the shared libraries into the process's address space and resolves symbols. Advantage: library code is shared in physical memory across processes (one physical page of libc.so code serves all processes), smaller binaries, library updates (security patches) don't require relinking applications. Disadvantage: startup overhead, dependency management ("DLL hell").

Shared Libraries and Position-Independent Code

A shared library (.so) must be usable at any virtual address (since multiple processes may load it at different addresses, and ASLR randomizes the address). This requires position-independent code (PIC).

In PIC, code never uses absolute addresses. Instead:

Function calls go through the PLT (which uses an indirect jump through the GOT, located at a known relative offset)
Global variable accesses go through the GOT (also at a relative offset from the code)
The GOT is populated at load time by ld.so with the actual runtime addresses

Compile with PIC: gcc -fPIC -shared -o libfoo.so foo.c

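-fPIC vs -fpic: -fPIC is always safe (it places no limit on the size of the GOT). -fpic lets the compiler assume the GOT is small enough for short displacements, which matters only on targets with a machine-specific GOT size limit such as SPARC, m68k, and PowerPC; on x86 the two options generate the same code.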

The Dynamic Linker: ld.so

ld.so (also called ld-linux.so.2 or ld-linux-x86-64.so.2) is the dynamic linker. It is itself a shared library, but it is invoked by the kernel as the interpreter for dynamically linked ELF executables.

Startup sequence for a dynamically linked binary:

Kernel loads the ELF executable
Kernel reads the PT_INTERP segment — the path to ld.so (e.g., /lib64/ld-linux-x86-64.so.2)
Kernel maps ld.so and transfers control to its entry point
ld.so reads the .dynamic section of the executable to find required shared libraries
For each DT_NEEDED entry, ld.so finds and maps the library (searching, in order: DT_RPATH if no DT_RUNPATH is present, LD_LIBRARY_PATH, DT_RUNPATH, /etc/ld.so.cache, and the default paths)
ld.so processes relocations in the executable that require external symbols
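ld.so runs the initializers (.init, .init_array) of each shared library, dependencies first; the executable's own initializers run later, from the C runtime startup code reached through _start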
Control transfers to the executable's entry point (_start)
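The result of this startup sequence can be inspected from inside a running process. The sketch below uses dl_iterate_phdr, a GNU extension declared in <link.h>, to list every object the dynamic linker has mapped. The function names print_object and list_loaded_objects are hypothetical, and the output format is only illustrative.

```c
#define _GNU_SOURCE   /* dl_iterate_phdr is a GNU extension */
#include <link.h>
#include <stdio.h>

/* Called once per loaded object (executable, shared libraries, vDSO). */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;

    /* The main executable is reported with an empty name. */
    const char *name = info->dlpi_name[0] ? info->dlpi_name
                                          : "(main executable)";
    printf("%-40s base=%#lx segments=%d\n", name,
           (unsigned long)info->dlpi_addr, (int)info->dlpi_phnum);
    (*count)++;
    return 0;   /* a nonzero return would stop the iteration */
}

/* Prints every object currently mapped and returns how many there are. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

Calling list_loaded_objects() from main in a dynamically linked program typically shows the executable, the vDSO, libc, and ld.so itself, each at the base address chosen at load time.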

PLT/GOT Mechanism

The Problem: An executable compiled with PIC calls printf. The call instruction needs an address. The address of printf in libc.so is unknown at compile time (ASLR). The call can't be patched at load time in .text (which is read-only and shared across processes).

The Solution: PLT (Procedure Linkage Table) + GOT (Global Offset Table).

The .plt section contains small stub functions — a few instructions each
The .got.plt section contains one pointer per external function, initially pointing back into the PLT (for lazy resolution)
When code calls printf, it calls printf@plt — the PLT stub
The PLT stub does an indirect jump through GOT[printf]
On the first call, GOT[printf] still contains the PLT stub address (lazy binding). The stub pushes the symbol index and jumps to the PLT resolver (PLT[0]), which calls ld.so's symbol resolver
The resolver finds the real printf address, patches GOT[printf] to the real address
Subsequent calls: the PLT stub's indirect jump through GOT[printf] jumps directly to printf — one indirect jump overhead
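The lazy binding steps above can be imitated in plain C with a function pointer that patches itself on first use. This is an analogy only: the real mechanism is implemented in assembly stubs and ld.so, and every name below (got_square, square_plt, square_resolver) is hypothetical.

```c
static int resolver_calls;                 /* how often the slow path ran */

/* Stands in for the real library function (printf in the text). */
static int real_square(int x) { return x * x; }

/* Stands in for PLT[0] plus ld.so's symbol resolver. */
static int square_resolver(int x);

/* Plays the role of GOT[square]: starts out pointing at the resolver. */
static int (*got_square)(int) = square_resolver;

static int square_resolver(int x)
{
    resolver_calls++;
    got_square = real_square;              /* patch the "GOT" entry */
    return real_square(x);                 /* then complete the original call */
}

/* Plays the role of square@plt: always an indirect jump through the slot. */
int square_plt(int x)
{
    return got_square(x);
}
```

The first call to square_plt takes the slow path and rewrites the slot; every later call reaches real_square through a single indirect call, so resolver_calls stays at 1. The slot being writable is exactly what GOT overwrite attacks exploit.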

Eager binding (LD_BIND_NOW=1 or linker flag -z now): Resolve all PLT entries at load time. No lazy resolution overhead at call time, at the cost of a slower startup. Required for full RELRO (see security section). All GOT entries are populated before the program runs.

Linker Script:

/* Minimal linker script example */
SECTIONS {
    . = 0x400000;                /* Start address */
    .text : { *(.text*) }       /* Merge all .text sections */
    . = ALIGN(0x1000);          /* Page-align */
    .rodata : { *(.rodata*) }
    . = ALIGN(0x1000);
    .data : { *(.data*) }
    .bss : { *(.bss*) *(COMMON) }
}

Linker scripts control section merging order, output file layout, and entry point. Embedded systems use custom linker scripts to place code in flash, data in RAM. The default linker script (view with ld --verbose) is complex (hundreds of lines).
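Symbols assigned in a linker script are visible to C code as ordinary external symbols whose address is the interesting value. The sketch below relies on etext, edata, and end, which the default GNU ld script provides on Linux (see end(3)). The helper names are hypothetical, and a custom script that omits these symbols would make the link fail.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Defined by the linker script, not by any C file. Only their addresses
 * are meaningful: end of text, end of initialized data, end of .bss. */
extern char etext, edata, end;

/* Size in bytes of the region between the end of initialized data and the
 * end of .bss, as laid out by the linker script. */
size_t bss_region_size(void)
{
    return (size_t)((uintptr_t)&end - (uintptr_t)&edata);
}

void print_layout(void)
{
    printf("etext=%p edata=%p end=%p\n",
           (void *)&etext, (void *)&edata, (void *)&end);
}
```

Embedded startup code uses the same pattern with script-defined symbols to copy .data from flash to RAM and to zero .bss before main runs.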

Link-Time Optimization (LTO)

LTO enables cross-TU inlining by deferring code generation to link time:

Each .o file is compiled with -flto — it contains LLVM bitcode (or GCC GIMPLE) instead of machine code
The linker invokes the optimizer on the combined IR of all TUs
Functions in one TU can be inlined into callers in another TU
Whole-program analysis (devirtualization, dead function elimination) is possible

Two LTO modes:

Full LTO: All TUs are combined into one module and optimized together. Highest optimization opportunity, but link times are very slow for large projects.
Thin LTO: A scalable LTO algorithm. Each TU's summary is combined into a summary index, and per-TU optimizations run in parallel using the index for cross-TU information. Links are typically several times faster than with full LTO while retaining most of the optimization benefit; the exact figures depend on the project.

# Thin LTO with Clang/LLD
clang -flto=thin -fuse-ld=lld main.c foo.c -o main

# Check that an object compiled with -flto contains bitcode, not machine code
clang -flto=thin -c main.c -o main.o
file main.o  # LLVM IR bitcode

Historical Context

ELF was developed for Unix System V Release 4 (1988) and standardized by the TISC (Tool Interface Standards Committee) in 1993. It replaced a.out as the dominant Unix binary format. The GNU linker (ld, part of binutils) has been the standard Linux linker since the early 1990s. gold (a faster GNU linker written in C++ by Ian Lance Taylor) was developed at Google and merged into binutils in 2008. lld (LLVM's linker) was developed starting around 2014 and is now substantially faster than both ld and gold for large binaries, making it a common choice for large C++ projects and a popular option for Rust builds. The PLT/GOT lazy binding mechanism originated in UNIX SVR4 and has been the standard dynamic linking mechanism on Linux/ELF since the early 1990s.

Production Examples

# Inspect symbol table
nm -D /usr/lib/x86_64-linux-gnu/libc.so.6 | grep printf

# Show shared library dependencies
ldd /usr/bin/python3

# Show PLT and GOT contents in a binary
objdump -d -j .plt myapp
objdump -R myapp  # shows dynamic relocations (GOT entries)

# Show all symbols and their types
readelf -s myapp | head -40

# Show section headers
readelf -S myapp

# Show segment layout (what the loader sees)
readelf -l myapp

# Show dynamic section (needed libraries, RUNPATH, etc.)
readelf -d myapp

# Inspect a relocation section
readelf -r myapp.o

Debugging Notes

undefined reference to 'foo': A symbol is referenced but never defined. Check: is the library included? Is the library in the right order? Is the function declared static where it should be extern? In C++, is a C function missing extern "C"?
multiple definition of 'bar': Two .o files define the same symbol. Usually a function or variable accidentally defined in a header that is included in multiple TUs, or a definition that was meant to be static (or inline) but isn't.
cannot find -lfoo: -lfoo requires libfoo.so or libfoo.a on the link-time library search path. Add -L/path/to/lib (or set LIBRARY_PATH for GCC). LD_LIBRARY_PATH affects only the runtime search done by ld.so.
RUNPATH/RPATH issues: Binary was linked with an rpath that doesn't exist on the deployment machine. Check with readelf -d binary | grep -E 'RPATH|RUNPATH'.
Lazy binding vs GOT entries: Use LD_BIND_NOW=1 ./myapp to force eager binding. If the program fails at startup with a symbol lookup error that does not appear under lazy binding, a missing function symbol was being masked: lazy binding defers the failure until the first call, or hides it entirely if the function is never called.

Security Implications

GOT overwrite attacks: If an attacker can write an arbitrary value to a chosen writable memory location (e.g., via an arbitrary-write primitive derived from a heap overflow or format string bug) and targets a .got.plt entry, the next call through that GOT entry is redirected to attacker-controlled code. This was a dominant exploitation technique in the 2000s and early 2010s.

RELRO (RELocation Read-Only): A mitigation that makes GOT entries read-only:

Partial RELRO (-Wl,-z,relro, the default on most current Linux distributions): The .got section (non-PLT GOT entries for global variables) is marked read-only after relocation. .got.plt (PLT GOT entries) remains writable for lazy binding.
Full RELRO (-Wl,-z,relro,-z,now): Forces eager binding (all PLT entries resolved at load time), then marks the entire GOT, including the PLT entries, read-only. This removes GOT overwrite as a technique against that binary; libraries built without full RELRO keep writable GOT entries, and other writable function pointers remain targets. The dlopen / dlsym pattern still works because dlsym returns the resolved address directly and each newly loaded library brings its own GOT, relocated under that library's own RELRO settings.

Enable full RELRO:

gcc -Wl,-z,relro,-z,now -fPIE -pie main.c -o main
# Verify:
checksec --file=main  # should show RELRO: Full

LD_PRELOAD injection: LD_PRELOAD=/path/to/evil.so ./myapp causes ld.so to load evil.so before all other libraries. Any symbols defined in evil.so override the canonical definitions. Attackers use this to intercept library calls (including libc's system call wrappers), log credentials, or redirect program flow.

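Defense: ld.so runs setuid/setgid executables (where the effective UID/GID differs from the real UID/GID) in secure-execution mode, which ignores LD_PRELOAD entries containing a slash and preloads only libraries from the standard search directories that themselves have the set-user-ID bit set. For non-setuid programs, LD_PRELOAD is the user's right. Systems that run third-party code should use containers or process isolation rather than relying on LD_PRELOAD defenses.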
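A process can ask whether the kernel flagged it for secure execution, which is the condition under which ld.so restricts LD_PRELOAD. A minimal sketch, assuming Linux with glibc 2.16 or later for getauxval; the function name in_secure_execution_mode is hypothetical.

```c
#include <sys/auxv.h>   /* getauxval, AT_SECURE */

/* Returns 1 if the kernel marked this process for secure execution
 * (for example a setuid/setgid binary or one with file capabilities),
 * 0 otherwise. */
int in_secure_execution_mode(void)
{
    return getauxval(AT_SECURE) != 0;
}
```

An ordinary program started from a shell returns 0 here. A library that honors its own environment variables for loading plugins might check this value and ignore them when it is nonzero, mirroring what ld.so does.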

dlopen and dlsym for plugins:

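#include <dlfcn.h>   /* dlopen, dlsym, dlclose, dlerror; link with -ldl on glibc older than 2.34 */
#include <stdio.h>
#include <stdlib.h>

typedef int (*plugin_fn_t)(const char *);
int main(void) {
    void *handle = dlopen("./plugin.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); exit(1); }
    plugin_fn_t fn = (plugin_fn_t)dlsym(handle, "plugin_process");  /* look up the entry point by name */
    if (!fn) { fprintf(stderr, "%s\n", dlerror()); exit(1); }
    int result = fn("input data");
    dlclose(handle);                                                /* fn must not be called after this */
    return result;
}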

RTLD_NOW resolves all symbols immediately (safer than RTLD_LAZY). RTLD_LOCAL prevents the plugin's symbols from being visible to other libraries (avoids symbol conflicts).

Performance Implications

Each PLT call (external function call) adds an indirect jump on top of the direct call: typically a few cycles when the branch is predicted and the GOT entry is cached, more on an instruction cache or branch predictor miss. In tight loops that call small external functions millions of times, this matters. Consider static linking, -fno-plt, or hidden visibility for intra-library calls on hot paths.
LD_PRELOAD interposition adds the interposer's own work and an extra indirect call to every intercepted call. In security monitoring tools that LD_PRELOAD into every process, the overhead can reach a few percent of CPU, depending on how hot the intercepted functions are.
Shared library startup (mapping, relocation, init functions): For programs with hundreds of .so dependencies, startup time can be dominated by dynamic linking.
TLS (Thread-Local Storage) accessed via the IE (Initial Exec) model is fast (a GOT load plus an %fs-relative access on x86-64). TLS accessed via the GD (Global Dynamic) model requires a function call to __tls_get_addr. Use __attribute__((tls_model("initial-exec"))) for performance-sensitive TLS, keeping in mind that a library using it may fail to dlopen if static TLS space is exhausted.
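The TLS model attribute is applied per variable. The sketch below assumes GCC or Clang; request_count and count_request are hypothetical names.

```c
/* Per-thread counter placed in static TLS via the initial-exec model. */
static __thread unsigned long request_count
    __attribute__((tls_model("initial-exec")));

/* Increments and returns the calling thread's counter. With initial-exec
 * the access compiles to a thread-pointer-relative load and store, with no
 * call to __tls_get_addr. */
unsigned long count_request(void)
{
    return ++request_count;
}
```

The attribute mainly matters in code built with -fPIC for a shared library, where the compiler would otherwise pick a dynamic model. In a non-PIC executable the compiler already selects an exec model on its own.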

Failure Modes

Symbol versioning conflict: GLIBC_2.17 vs GLIBC_2.34. The binary requires a newer glibc symbol version than is available on the target system. Results in an error such as: ./myapp: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./myapp).
Circular library dependencies: libA.a calls symbols in libB.a, which calls symbols in libA.a. Because archives are scanned once, left to right, the linker may not resolve these without --start-group/--end-group (or listing an archive twice). Between shared libraries, ld.so handles circular dependencies correctly at runtime.
SONAME mismatch: libfoo.so.1 on build system, libfoo.so.2 on target. The binary was linked against libfoo.so (symlinked to libfoo.so.1); at runtime ld.so looks for libfoo.so.1 in its cache. If only libfoo.so.2 exists, load fails.
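Relocation overflow: On x86-64, PC-relative relocations must fit in 32 bits. If the referencing and referenced sections end up more than 2GB apart in the address space, the link fails with "relocation truncated to fit". The usual fix is a larger code model (-mcmodel=medium or -mcmodel=large). On architectures with shorter branch ranges, such as AArch64, linkers including LLD insert range-extension thunks (trampolines) automatically.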

Modern Usage

LLD (LLVM's linker) is widely used for large C++ projects at companies like Google (Chromium) and Meta, and is a common choice for Rust builds. It is often several times faster than GNU ld for large binaries, largely due to its multi-threaded design.

LLVM's ORC JIT uses the JIT linker (llvm::jitlink) rather than the system linker for in-memory linking of JIT-compiled code. This avoids the overhead of writing and reading .o files from disk.

musl libc for static linking provides smaller, cleaner static binaries than glibc, making fully static linking practical for containers (the binary is a few MB and runs in scratch containers with no OS libraries).

Future Directions

BOLT (Binary Optimization and Layout Tool): Post-link optimizer. Reorganizes .text to minimize instruction cache misses using runtime profile data. Used at Meta for 5–10% speedups on production binaries without changing source code.
Propeller: Google's competing post-link optimizer, integrated into the LLVM toolchain.
Module-based linking for C++20: As C++ modules become standard, linkers will need to handle module interface units and their corresponding object files differently from traditional TUs.
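Linker-level security hardening: Forward-edge CFI (checking that indirect call targets match the expected function signature, which Clang implements with help from LTO and the linker) and XOM (eXecute-Only Memory, which relies on hardware support for execute-only mappings, for example on AArch64) are active development areas.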

Exercises

Write two .c files that define the same symbol with different implementations. Link them together. Observe the linker's behavior (which definition wins). Then mark one definition __attribute__((weak)) and observe the change.
Compile a program with printf dynamically linked. Use objdump -d -j .plt to locate the PLT stub. Use gdb to set a breakpoint on printf@plt. Step through the first call, observing the GOT resolution. Set another breakpoint on the second call and observe the direct dispatch.
Write a minimal LD_PRELOAD library that intercepts malloc (via dlsym(RTLD_NEXT, "malloc")), logs each allocation size to stderr, and calls the real malloc. Test it on /bin/ls.
Build a binary without RELRO and one with full RELRO (-Wl,-z,relro,-z,now). Use readelf -l to verify the presence of GNU_RELRO program header and the size of the RELRO segment. Use checksec or pwntools for a high-level summary.
Write a linker script that places a custom .my_data section at a specific address and exports a symbol __my_data_start pointing to its beginning. Verify by reading the symbol value and comparing to the actual address in the process with /proc/<pid>/maps.

References

Ian Lance Taylor, "Linkers" series. 20-part blog series, 2007. Index: https://lwn.net/Articles/276782/
John R. Levine. Linkers and Loaders. Morgan Kaufmann, 1999.
Michael Kerrisk. The Linux Programming Interface. No Starch Press, 2010. Chapters 41–42 on shared libraries.
ELF Specification: https://refspecs.linuxfoundation.org/elf/elf.pdf
LLD documentation: https://lld.llvm.org/
Ulrich Drepper, "How to Write Shared Libraries." https://www.akkadia.org/drepper/dsohowto.pdf