Dynamic Linking Security

Technical Overview

Dynamic linking's core mechanism — resolving function addresses at runtime via mutable GOT entries — creates a persistent attack surface in every dynamically linked process. A single arbitrary write primitive (from a heap overflow, format string bug, or use-after-free) aimed at the GOT can redirect any subsequent library call to attacker-controlled code, achieving arbitrary code execution without a separate code injection step.

This document covers the attack techniques against dynamic linking, the defense mechanisms the OS, compiler, and linker provide, and the threat model for supply chain attacks involving shared libraries. Understanding these mechanics is a prerequisite for writing secure systems software and for using hardening flags correctly.

Prerequisites

Solid understanding of PLT/GOT mechanism (see 03-linkers-and-loaders.md)
Understanding of ASLR, NX/DEP, and stack canaries as baseline mitigations
Familiarity with ELF binary format
Basic exploitation concepts: arbitrary read/write primitives, code execution

Dynamic Linking Security Checklist

+-------------------------+--------------------------------+
| RELRO Status            | Effect on the GOT              |
+-------------------------+--------------------------------+
| No RELRO                | .got, .got.plt fully writable  |
|                         | GOT overwrite trivial          |
+-------------------------+--------------------------------+
| Partial RELRO (default) | .got read-only, .got.plt       |
|                         | still writable; PLT entries    |
|                         | still overwritable             |
+-------------------------+--------------------------------+
| Full RELRO              | .got.plt read-only after load  |
|   (-z relro -z now)     | binary's GOT not overwritable  |
+-------------------------+--------------------------------+

Linking Security Flags Checklist:
[ ] -fPIE -pie                   PIE executable (ASLR applies to base)
[ ] -Wl,-z,relro,-z,now         Full RELRO (independent of PIE; use both)
[ ] -Wl,-z,noexecstack          Non-executable stack
[ ] -Wl,-z,separate-code        Separate code/data segments
[ ] -D_FORTIFY_SOURCE=2         Buffer overflow detection in libc calls (needs -O1 or higher)
[ ] -fstack-protector-strong    Stack canaries
[ ] -fcf-protection=full        Intel CET (IBT + SHSTK)
[ ] -fsanitize=cfi              Clang CFI (indirect call type checks; needs -flto -fvisibility=hidden)

Core Content

GOT Overwrite Attacks

Attack scenario: An attacker has exploited a vulnerability that provides an arbitrary write primitive — they can write any 8 bytes to any writable virtual address in the process.

Target: The .got.plt section, which contains pointers to external library functions. These pointers are writable during program execution (under partial RELRO or no RELRO) and are dereferenced on every external function call.

Exploit flow:
1. Attacker determines the address of the GOT entry for free (or exit, printf, any frequently called function).
2. If ASLR is active, attacker needs a leak of a library address to compute the library base. This is typically obtained via a memory disclosure bug (arbitrary read, format string %p).
3. Attacker writes the address of a ROP gadget or system() to GOT[free].
4. Next call to free(ptr) in the program calls system(ptr) instead. If the attacker controls the buffer that ptr points to (e.g., it holds the string "/bin/sh"), this executes a shell.

GOT overwrites were a staple of binary exploitation from roughly 2005 to 2015. Tools like pwntools reduce looking up the relevant addresses to one line each:

# pwntools exploitation example (educational)
from pwn import *
elf = ELF('./vulnerable')
# Find the GOT entry address for 'free'
got_free = elf.got['free']
# Find the plt for 'system'
plt_system = elf.plt['system']
# Overwrite GOT[free] with plt_system via the arbitrary write
# primitive in the vulnerability...

Without PIE and without RELRO, GOT entries sit at fixed, writable addresses that the attacker can read statically from readelf -r ./binary; ASLR does not move the GOT of a non-PIE executable. This is why PIE/ASLR and RELRO together form a meaningful defense: ASLR alone is bypassed with an info leak, and RELRO alone protects the GOT but leaves every other writable function pointer at a known address.

RELRO: RELocation Read-Only

Partial RELRO (the default in most modern Linux distribution toolchains):
- Reorders sections so that .got (GOT entries not used by the PLT, such as addresses of global variables) appears before .bss in the address space, so an overflow in program data cannot run forward into it
- Marks .got read-only after dynamic linking is complete
- .got.plt (the PLT-used GOT entries) remains writable for lazy binding
- Result: GOT entries for global variables are protected; PLT entries are still overwritable

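Full RELRO (-Wl,-z,relro,-z,now or equivalently -Wl,-z,relro -Wl,-z,now):
- Forces ld.so to resolve all PLT symbols at load time (eager binding, -z now)
- After all relocations are resolved, the entire GOT, including the entries that would otherwise live in a writable .got.plt, is marked read-only via mprotect
- Result: No writable GOT entries remain in that binary during program execution, so a GOT overwrite against it faults. Libraries that were not themselves built with full RELRO keep their own writable GOT entries, and other writable function pointers are unaffected.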
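
The difference between a writable and a read-only GOT can be illustrated with a toy model. This is only a sketch: the GOT is modeled as a Python mapping, the PLT stub as a lookup-then-call, and the names fake_free, fake_system, and call_via_plt are hypothetical stand-ins, not real loader structures.

```python
from types import MappingProxyType


def fake_free(arg):
    return f"free({arg!r})"


def fake_system(arg):
    return f"system({arg!r})"


def call_via_plt(got, name, arg):
    """Model of a PLT stub: fetch the target from the GOT, then jump to it."""
    return got[name](arg)


# Writable GOT (no RELRO or partial RELRO): one pointer-sized write
# silently redirects every later call through that slot.
got = {"free": fake_free}
before = call_via_plt(got, "free", "/bin/sh")
got["free"] = fake_system  # the attacker's arbitrary write
after = call_via_plt(got, "free", "/bin/sh")

# Read-only GOT (full RELRO): the same write is rejected. In a real
# process the store would fault with SIGSEGV instead of raising TypeError.
relro_got = MappingProxyType({"free": fake_free})
try:
    relro_got["free"] = fake_system
    write_result = "overwritten"
except TypeError:
    write_result = "write rejected"

print(before)
print(after)
print(write_result)
```

The model leaves out the attacker's hardest steps (finding the slot address and obtaining the write primitive); it only shows why making the table immutable removes this particular redirection.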

Cost of full RELRO:
- Startup latency: All PLT symbols resolved before main() runs. For programs with hundreds of library functions, this adds 1–10ms to startup. Negligible for servers; relevant for short-lived commands.
- Incompatibility with some dynamic loading patterns: If a library needs to lazily resolve its own GOT entries after startup (rare), full RELRO breaks it.

# Build with full RELRO
gcc -fPIE -pie -Wl,-z,relro,-z,now -o hardened main.c

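# Verify RELRO
checksec --file=hardened
# Expected: RELRO: Full, NX: enabled, PIE: enabled
# (a canary is reported only if the build also used -fstack-protector*)

# Alternative verification:
readelf -lW hardened | grep GNU_RELRO
# Shows size of the RELRO-protected region (present for partial and full RELRO)
readelf -dW hardened | grep -E 'BIND_NOW|FLAGS'
# BIND_NOW / NOW present: eager binding, which makes the RELRO "full"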
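
The same classification can be done programmatically. The sketch below assumes a little-endian ELF64 file and looks at just two facts: whether a PT_GNU_RELRO program header exists, and whether the dynamic section requests eager binding (DT_BIND_NOW, DF_BIND_NOW in DT_FLAGS, or DF_1_NOW in DT_FLAGS_1). The function name relro_status is hypothetical; the constants come from the ELF specification and glibc's elf.h.

```python
import struct

PT_DYNAMIC, PT_GNU_RELRO = 2, 0x6474E552
DT_NULL, DT_BIND_NOW, DT_FLAGS, DT_FLAGS_1 = 0, 24, 30, 0x6FFFFFFB
DF_BIND_NOW, DF_1_NOW = 0x8, 0x1


def relro_status(data: bytes) -> str:
    """Classify a little-endian ELF64 image as 'none', 'partial' or 'full'."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # e_phoff is at offset 0x20; e_phentsize and e_phnum start at 0x36.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)

    has_relro = False
    bind_now = False
    for i in range(e_phnum):
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize
        )
        if p_type == PT_GNU_RELRO:
            has_relro = True
        elif p_type == PT_DYNAMIC:
            # Walk the 16-byte (d_tag, d_val) entries until DT_NULL.
            for off in range(p_offset, p_offset + p_filesz, 16):
                tag, val = struct.unpack_from("<qQ", data, off)
                if tag == DT_NULL:
                    break
                if (
                    tag == DT_BIND_NOW
                    or (tag == DT_FLAGS and val & DF_BIND_NOW)
                    or (tag == DT_FLAGS_1 and val & DF_1_NOW)
                ):
                    bind_now = True

    if not has_relro:
        return "none"
    return "full" if bind_now else "partial"
```

Usage would look like relro_status(open("./hardened", "rb").read()). The sketch does no bounds checking beyond what struct raises on truncated input, so it is suited to inspecting your own build outputs, not hostile files.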

ASLR for Shared Libraries

Address Space Layout Randomization (ASLR) for shared libraries randomizes the base address at which each .so is mapped. The compiler must produce position-independent code (-fPIC) for this to work: code that uses relative offsets, not absolute addresses.

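On Linux x86-64, ASLR provides:
- 28 bits of entropy by default for mmap-based mappings such as library base addresses (ASLR is enabled by /proc/sys/kernel/randomize_va_space=2; the number of bits is set by /proc/sys/vm/mmap_rnd_bits)
- 64-bit mmap randomization: 2^28 ≈ 268 million possible base addresses (libraries mapped back to back share the same random offset)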
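
To put these entropy figures in perspective, the sketch below computes the number of possible bases and the chance that a blind brute force succeeds. It assumes each attempt faces a freshly randomized layout and that guesses are independent; the function names are hypothetical.

```python
def possible_bases(entropy_bits):
    """Number of distinct base addresses for a given number of random bits."""
    return 2 ** entropy_bits


def brute_force_success(entropy_bits, attempts):
    """P(at least one correct guess) when the layout is re-randomized
    before every attempt, so each guess succeeds with probability 2^-bits."""
    p_single = 1 / possible_bases(entropy_bits)
    return 1 - (1 - p_single) ** attempts


print(possible_bases(28))
print(round(brute_force_success(8, 256), 3))
print(brute_force_success(28, 10_000))
```

With 8 bits, 256 attempts already succeed about 63% of the time, while 10,000 attempts against 28 bits succeed with a probability of roughly 4 in 100,000.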

ASLR is defeated by:
- Info leak: Any memory disclosure (format string, use-after-free that reads memory, leak of a pointer) that reveals a library address. Once one address in libc.so is known, the entire libc.so is located (fixed offsets from the base).
- Brute force: For 32-bit processes, only 8–16 bits of entropy, so a brute-force search needs on the order of 256 to 65,536 attempts. Not practical on 64-bit.
- Heap spraying: Fill the heap with valid pointers/shellcode so that a somewhat-random jump hits exploitable code. Less effective with 64-bit ASLR.
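
The info-leak case is plain arithmetic: subtract the leaked symbol's known offset to get the base, then add the offset of whatever the attacker wants to call. The sketch below uses made-up offsets (0x80e50 for puts, 0x50d70 for system); real values depend on the exact libc build.

```python
PAGE_SIZE = 0x1000

# Hypothetical offsets inside one particular libc build (assumptions).
PUTS_OFFSET = 0x80E50
SYSTEM_OFFSET = 0x50D70


def libc_base_from_leak(leaked_addr, symbol_offset):
    """Recover the library base from one leaked symbol address."""
    base = leaked_addr - symbol_offset
    # Mappings are page aligned; a misaligned result means the leak or the
    # assumed offset (wrong libc version) is incorrect.
    if base < 0 or base % PAGE_SIZE != 0:
        raise ValueError("leak and offset do not yield a page-aligned base")
    return base


def symbol_address(base, symbol_offset):
    """Address of any other symbol once the base is known."""
    return base + symbol_offset


leaked_puts = 0x7F3A1C280E50  # example value as if read via a format string bug
base = libc_base_from_leak(leaked_puts, PUTS_OFFSET)
print(hex(base))
print(hex(symbol_address(base, SYSTEM_OFFSET)))
```

The alignment check is a cheap sanity test that catches the most common mistake in practice, which is using offsets from a different libc version than the target's.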

LD_PRELOAD Attacks

LD_PRELOAD=/path/to/evil.so ./target instructs ld.so to load evil.so before all other shared libraries, including libc.so. Any symbols defined in evil.so take precedence over library definitions, because ld.so uses the first definition found.
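
The "first definition wins" rule can be modeled in a few lines. This is a simplified sketch of the global lookup scope only (it ignores symbol versioning, RTLD_LOCAL scopes, and protected visibility); the library names and symbol sets are hypothetical.

```python
def build_load_order(needed, preload=()):
    """Preloaded objects are placed ahead of the libraries the binary needs."""
    return list(preload) + list(needed)


def resolve_symbol(symbol, load_order):
    """Return the name of the first object in load order defining the symbol."""
    for name, exported in load_order:
        if symbol in exported:
            return name
    return None


libc = ("libc.so.6", {"read", "write", "strcmp", "malloc"})
evil = ("evil.so", {"strcmp"})

normal = build_load_order([libc])
hijacked = build_load_order([libc], preload=[evil])

print(resolve_symbol("strcmp", normal))
print(resolve_symbol("strcmp", hijacked))
print(resolve_symbol("read", hijacked))
```

Only the symbols that evil.so actually defines are redirected; everything else still resolves to libc, which is why an interposer can hook a single function without breaking the rest of the program.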

Attack uses:
- Hook read() / write() to exfiltrate data
- Hook strcmp() / memcmp() to log password comparisons
- Hook execve() to detect/block/redirect process execution
- Hook malloc() for heap profiling (legitimate use)

Defense mechanisms:

setuid/setgid protection: ld.so runs in secure-execution mode when the process has elevated privileges (effective UID ≠ real UID, or effective GID ≠ real GID). In that mode it ignores LD_LIBRARY_PATH, and it honors LD_PRELOAD only for names without slashes that are found in the standard search directories and carry the setuid bit. This prevents unprivileged users from injecting libraries into privileged executables.
Static linking: A statically linked binary has no ld.so involvement, so LD_PRELOAD has no effect. This is sometimes done for security-critical binaries, although mainstream distributions typically ship sshd and sudo dynamically linked.
System call interception via seccomp: While not blocking LD_PRELOAD, seccomp-bpf policies restrict which system calls the process can make. An LD_PRELOAD library can't escalate privileges if execve and network calls are blocked.
SELinux / AppArmor: Mandatory Access Control policies can prevent a process from loading unexpected shared libraries regardless of LD_PRELOAD.

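Legitimate LD_PRELOAD uses:
- Replacement allocators and heap profilers (tcmalloc, jemalloc); Valgrind also preloads its own helper library internally
- Library call interposition for tracing and debugging (ltrace and strace themselves rely on ptrace, not LD_PRELOAD)
- Testing/mocking (inject a mock libssl.so in tests)
- faketime (intercept clock_gettime())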

Library Search Order Hijacking

When ld.so searches for a required library, it uses this order:
1. Directories in the binary's RPATH (deprecated, baked in at link time, not overridable by LD_LIBRARY_PATH; ignored if the binary also has a RUNPATH)
2. Directories in LD_LIBRARY_PATH environment variable
3. Directories in the binary's RUNPATH (modern replacement for RPATH, can be overridden by LD_LIBRARY_PATH)
4. /etc/ld.so.cache (maintained by ldconfig)
5. Default directories: /lib, /usr/lib, /lib64, /usr/lib64
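
The search order can be written down as a small function. This is a sketch of the rules as described in the ld.so manual page, not of the loader's actual code; the function name, the "<ld.so.cache>" placeholder, and the example directories are hypothetical.

```python
DEFAULT_DIRS = ["/lib", "/usr/lib", "/lib64", "/usr/lib64"]


def library_search_dirs(rpath=(), runpath=(), ld_library_path="", secure=False):
    """Return the places ld.so would consult, in order."""
    dirs = []
    # DT_RPATH is only honored when the object has no DT_RUNPATH.
    if rpath and not runpath:
        dirs += list(rpath)
    # LD_LIBRARY_PATH is ignored in secure-execution (setuid/setgid) mode.
    # An empty entry means the current working directory.
    if ld_library_path and not secure:
        dirs += [entry or "." for entry in ld_library_path.split(":")]
    dirs += list(runpath)
    dirs.append("<ld.so.cache>")
    dirs += DEFAULT_DIRS
    return dirs


print(library_search_dirs(runpath=["/opt/app/lib"], ld_library_path="/tmp/evil"))
print(library_search_dirs(runpath=["/opt/app/lib"], ld_library_path="/tmp/evil",
                          secure=True))
```

The first call shows why LD_LIBRARY_PATH hijacking works against RUNPATH-based binaries: the attacker's directory is consulted before the directory the developer intended.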

Hijacking via LD_LIBRARY_PATH: An attacker who can set environment variables for a process can redirect library loads to a malicious path. This is the LD_LIBRARY_PATH hijack — place a malicious libssl.so.1.1 in /tmp/evil/ and set LD_LIBRARY_PATH=/tmp/evil. Same defenses apply as LD_PRELOAD (setuid ignores it).

RPATH/RUNPATH injection in supply chain attacks: If a build system is compromised and inserts RUNPATH=/tmp into a distributed binary, and /tmp is attacker-controlled on target machines, library hijacking occurs at runtime.

Check a binary's RPATH/RUNPATH:

readelf -d ./binary | grep -E '(RPATH|RUNPATH)'
chrpath -l ./binary
# Remove RPATH entirely if not needed:
chrpath --delete ./binary
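
Once the RPATH/RUNPATH string has been extracted, it can be screened automatically. The sketch below flags entries that an unprivileged user could plausibly control. The list of world-writable prefixes is an assumption to adapt per system, and the treatment of an empty entry as the current directory reflects glibc behavior.

```python
# Directories that are typically world-writable (assumption; adjust per system).
RISKY_PREFIXES = ("/tmp", "/var/tmp", "/dev/shm")


def risky_runpath_entries(runpath):
    """Return (entry, reason) pairs for suspicious RPATH/RUNPATH components."""
    risky = []
    for entry in runpath.split(":"):
        if entry == "":
            risky.append((entry, "empty entry resolves to the current directory"))
        elif entry.startswith("$ORIGIN") or entry.startswith("${ORIGIN}"):
            # Relative to the binary's own location; safe only if the
            # install directory itself is not attacker-writable.
            continue
        elif not entry.startswith("/"):
            risky.append((entry, "relative path"))
        elif any(entry == p or entry.startswith(p + "/") for p in RISKY_PREFIXES):
            risky.append((entry, "world-writable location"))
    return risky


print(risky_runpath_entries("/opt/app/lib:$ORIGIN/../lib"))
print(risky_runpath_entries("/tmp:lib::/opt/app/lib"))
```

A check like this fits naturally into a release pipeline, where a non-empty result for any shipped binary fails the build.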

DLL Injection on Windows

The Windows analog of LD_PRELOAD is DLL injection. A DLL (Dynamic Link Library) can be injected into a running process via:
- CreateRemoteThread + LoadLibrary: Create a thread in the target process that calls LoadLibrary with the path to the attacker's DLL
- SetWindowsHookEx: Install a system-wide hook that causes Windows to load the DLL into any process that receives certain messages
- DLL search order hijacking: The Windows DLL search order checks the application directory first. If target.exe loads crypto.dll by name, an attacker who can place a malicious DLL named crypto.dll in the same directory as target.exe gets it loaded instead of the legitimate copy.

Windows defenses: Safe DLL Search Mode (enabled by default since Windows XP SP2), KnownDLLs protection (critical system DLLs mapped from a system-managed location), Code Signing (DLLs can require valid signatures), and AppLocker/WDAC policies.

Supply Chain in Dynamic Linking

Dynamic linking creates a trust dependency: your application implicitly trusts every .so it loads. Compromising a widely-used system library affects all applications that link against it.

Real-world supply chain attacks via shared libraries:
- XZ Utils backdoor (2024): A malicious contributor who had gained maintainer access to xz-utils inserted a backdoor into liblzma via build system manipulation. On affected systems, sshd linked against liblzma (via libsystemd) was backdoored. The backdoor ran from an IFUNC resolver in liblzma during sshd startup, before RELRO was applied, and redirected symbol resolution so that RSA_public_decrypt calls made during RSA key authentication reached attacker code. This is perhaps the most sophisticated supply chain attack targeting dynamic linking mechanics ever discovered.
- SolarWinds (2020): Attackers compromised the build process and inserted malicious code into a DLL that shipped in signed Orion updates. Not a dynamic linking mechanism attack, but exploiting the same trust model.

Mitigations for supply chain in dynamic linking:
- Verify .so checksums (package manager signatures)
- Prefer static linking for critical security infrastructure
- Use containers (Docker) or VMs to isolate library versions
- Monitor library loads with auditd (Linux Audit) or eBPF
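
Checksum verification is normally done by the package manager, but the core step is simple enough to sketch. The example below assumes the expected SHA-256 digest comes from a trusted, separately authenticated source; the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large libraries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def library_matches(path, expected_hex):
    """Compare the file's digest with the expected hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

A checksum only proves the file matches what the publisher built. As the XZ Utils case shows, it does not help when the published build is itself malicious.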

eBPF-based library load monitoring:

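// BPF program sketch: observe file-backed mmap calls, which is how
// ld.so maps shared libraries (pseudocode, filename extraction omitted)
SEC("kprobe/security_mmap_file")
int log_mmap(struct pt_regs *ctx) {
    struct file *file = (struct file *)PT_REGS_PARM1(ctx);
    if (!file)
        return 0;  // anonymous mapping, not a library
    // Extract filename from file->f_path and log to userspace
    return 0;
}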

Controlling Dynamic Linking for Security

Static linking for security-critical binaries: sshd, sudo, cryptographic tools benefit from static linking to eliminate LD_PRELOAD/LD_LIBRARY_PATH as attack vectors (though the setuid protection covers most cases). The cost: binary size, no automatic security updates from library patches.

# Fully static link with musl libc (smaller than glibc static)
musl-gcc -static -o my_tool main.c

# Or with glibc static (larger, some features like NSS don't work statically)
gcc -static main.c -o my_tool

dlopen and dlsym for plugins require careful security design:

#include <dlfcn.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TRUSTED_DIR "/opt/trusted_plugins/"

// Load a plugin safely
void *load_plugin(const char *path) {
    // Canonicalize first so "../" components and symlinks cannot
    // escape the trusted directory
    char resolved[PATH_MAX];
    if (!realpath(path, resolved)) {
        perror("realpath");
        return NULL;
    }

    // Verify the resolved plugin path is in a trusted directory
    if (strncmp(resolved, TRUSTED_DIR, strlen(TRUSTED_DIR)) != 0) {
        fprintf(stderr, "Refusing to load plugin from untrusted path: %s\n", resolved);
        return NULL;
    }

    // RTLD_NOW: resolve all symbols immediately (fail fast on missing symbols)
    // RTLD_LOCAL: don't pollute global symbol namespace
    // Note: the plugin's constructors run inside dlopen, before any check below
    void *handle = dlopen(resolved, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    // Verify plugin exports expected interface
    void (*plugin_init)(void) = (void (*)(void))dlsym(handle, "plugin_init");
    if (!plugin_init) {
        fprintf(stderr, "Plugin missing plugin_init: %s\n", resolved);
        dlclose(handle);
        return NULL;
    }

    plugin_init();
    return handle;
}

Additional dlopen security: verify the .so is owned by root and not world-writable before loading; use SELinux file contexts to restrict which processes can load which libraries.
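
The ownership and permission check can be sketched as follows. It is shown in Python for brevity; a C loader would make the same checks with stat() or, better, fstat() on an already opened descriptor. The function name and the trusted_uid parameter are hypothetical, and the check is subject to a time-of-check/time-of-use race if the directory itself is writable by others.

```python
import os
import stat


def is_safe_to_load(path, trusted_uid=0):
    """Return True if path is a regular file owned by trusted_uid and
    not world-writable."""
    st = os.stat(path)  # follows symlinks: inspects the file that would be mapped
    if not stat.S_ISREG(st.st_mode):
        return False
    if st.st_uid != trusted_uid:
        return False
    if st.st_mode & stat.S_IWOTH:
        return False
    return True
```

A stricter policy would also reject group-writable files and apply the same check to every parent directory up to the root.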

Symbol Versioning

Symbol versioning allows multiple versions of the same function to coexist in a shared library, enabling ABI compatibility:

// In libc.so.6, multiple versions of memcpy coexist:
// memcpy@GLIBC_2.2.5 — old ABI (for binaries linked against old glibc)
// memcpy@GLIBC_2.14  — new ABI with different destination-overlapping behavior

// A binary linked against glibc 2.14 or newer references memcpy@GLIBC_2.14
// A binary linked against an older glibc references memcpy@GLIBC_2.2.5
// Both run correctly on a system with glibc 2.34
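
Versioned symbol names such as those shown by readelf or nm can be split and ordered mechanically. The sketch below is a simplified illustration, assuming version nodes of the form PREFIX_MAJOR.MINOR[.PATCH]; the helper names are hypothetical. It also shows why version strings must be compared numerically, not as text.

```python
def parse_versioned(symbol):
    """Split 'memcpy@GLIBC_2.14' or 'memcpy@@GLIBC_2.14' into (name, version)."""
    name, _, version = symbol.partition("@")
    return name, version.lstrip("@")


def version_key(version):
    """Turn 'GLIBC_2.2.5' into (2, 2, 5) so versions compare numerically."""
    _, _, numbers = version.rpartition("_")
    return tuple(int(part) for part in numbers.split("."))


def newest(symbols):
    """Pick the symbol carrying the highest version node."""
    return max(symbols, key=lambda s: version_key(parse_versioned(s)[1]))


candidates = ["memcpy@GLIBC_2.2.5", "memcpy@@GLIBC_2.14"]
print(parse_versioned(candidates[1]))
print(newest(candidates))
print("2.14" > "2.2.5")  # plain string comparison gets the order wrong
```

The double @@ marks the default version, which is the one a newly linked binary binds to.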

Version scripts for shared library versioning:

/* version.map */
MY_LIB_1.0 {
    global: exported_function;
    local: *;   /* all other symbols hidden */
};

MY_LIB_2.0 {
    global: new_exported_function;
} MY_LIB_1.0;

gcc -shared -fPIC -Wl,--version-script=version.map -o libfoo.so foo.c

Historical Context

The PLT/GOT mechanism was designed for performance (lazy binding avoids the startup cost of resolving all symbols) and not with security in mind. GOT overwrite as an exploitation technique was documented in Phrack magazine in the early 2000s. RELRO was developed as a mitigation; it appears to have been added to the GNU toolchain around 2004. The progression of memory corruption mitigations (ASLR → NX/DEP → RELRO → CFI) represents two decades of attacker/defender back-and-forth in binary exploitation.

The XZ Utils backdoor (2024) demonstrated that even sophisticated dynamic linking defenses can be bypassed by attacking the upstream software supply chain before the binary is built — highlighting that security of dynamic linking includes the entire build and distribution chain.

Production Examples

# Check all security properties of a binary
checksec --file=/usr/sbin/nginx

# Output example (well-hardened binary):
# RELRO    STACK CANARY NX         PIE
# Full     Canary found NX enabled PIE enabled

# Check libraries loaded by a process at runtime
cat /proc/$(pgrep nginx | head -1)/maps | grep '\.so'

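# Audit changes to the system-wide preload file. auditd has no filter
# field for environment variables, so per-process LD_PRELOAD use has to
# be read from /proc instead.
auditctl -w /etc/ld.so.preload -p wa -k ld_preload_file
ausearch -k ld_preload_file | tail -20

# Check whether a running process was started with LD_PRELOAD
tr '\0' '\n' < /proc/$(pgrep nginx | head -1)/environ | grep '^LD_PRELOAD='

# Intercept all dlopen calls with eBPF (bpftrace).
# On glibc 2.34+ dlopen lives in libc.so.6; adjust the path accordingly.
bpftrace -e 'uprobe:/lib/x86_64-linux-gnu/libdl.so.2:dlopen {
    printf("PID %d dlopen: %s\n", pid, str(arg0));
}'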
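
Environment-based injection can also be spotted by reading a process's initial environment from /proc. The sketch below parses the NUL-separated contents of /proc/<pid>/environ and splits the LD_PRELOAD value on spaces and colons, the separators ld.so accepts. The function names are hypothetical, and reading another user's environ file requires appropriate privileges.

```python
import re


def preload_entries(environ_blob):
    """Extract LD_PRELOAD entries from the raw bytes of /proc/<pid>/environ."""
    for item in environ_blob.split(b"\0"):
        if item.startswith(b"LD_PRELOAD="):
            value = item[len(b"LD_PRELOAD="):].decode(errors="replace")
            return [entry for entry in re.split(r"[ :]+", value) if entry]
    return []


def process_preloads(pid):
    """Read and parse the environment a process was started with."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return preload_entries(f.read())


sample = b"PATH=/usr/bin\0LD_PRELOAD=/tmp/a.so:/tmp/b.so\0HOME=/root\0"
print(preload_entries(sample))
```

This only reflects the environment at exec time. It does not see libraries listed in /etc/ld.so.preload or loaded later through dlopen, so it complements the /proc/<pid>/maps check; it does not replace it.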

Debugging Notes

LD_DEBUG=all ./myapp 2>&1 | head -100 shows every step of ld.so's initialization: library loading, symbol resolution, relocation processing. Invaluable for diagnosing missing symbol or wrong library version issues.
LD_DEBUG=libs ./myapp shows just library search paths and selections.
pldd <pid> lists all shared libraries currently loaded in a process (glibc 2.15+).
If a binary crashes immediately with SIGBUS or SIGSEGV before main(), check LD_DEBUG=all output for relocation failures. Full RELRO issues manifest here.
Under Valgrind, LD_PRELOAD is used by Valgrind itself to interpose on allocation functions — this is why LD_PRELOAD of a custom allocator conflicts with Valgrind.

Security Implications Summary

Dynamic linking is a trust chain: the binary trusts ld.so, which trusts libraries found on the search path, which are trusted because they're installed by the package manager (which trusts upstream maintainers).
Every link in this chain is an attack surface: supply chain compromise at any level undermines the entire chain.
Full RELRO + PIE + stack protector + CFI is the current best-practice hardened build configuration. It does not protect against a compromised library but prevents classic memory corruption → GOT overwrite → code execution chains.
For highest-security deployments (HSMs, critical infrastructure): static linking + musl libc + seccomp-bpf syscall filtering + SELinux eliminates the dynamic linking attack surface at the cost of flexibility.

Performance Implications

Full RELRO with eager binding: typically 1–10ms additional startup time for programs with 50–200 external functions. Negligible for long-running servers; may be relevant for frequently invoked CLI tools.
CFI instrumentation overhead: ~1–3% CPU overhead on indirect-call-heavy workloads. Acceptable in most cases.
Removing unused dynamic dependencies (use --as-needed linker flag): reduces load time by not mapping libraries with zero used symbols.

Failure Modes

RELRO breaks plugin system: A plugin system that uses dlopen after startup and expects to patch GOT entries may fail with full RELRO. Redesign to use function pointer tables instead of relying on GOT mutability.
LD_PRELOAD conflict: Two LD_PRELOAD libraries both interpose on malloc. The one listed first wins, and the second is never called unless the first forwards to it. Have each wrapper look up the next definition with dlsym(RTLD_NEXT, ...) so the calls chain properly.
CFI violation in generated code: If a JIT compiler generates code that makes indirect calls through function pointers, CFI may fault because the generated code doesn't follow CFI's type-checking protocol. JIT compilers must disable CFI for their generated call sites.

Modern Usage

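Clang's Control Flow Integrity (CFI) (-fsanitize=cfi, which also requires -flto) has the compiler insert a type check before every indirect call; the checks execute at runtime. If an indirect function call targets a function of a different type signature, CFI aborts the program. This stops an attacker from redirecting a corrupted function pointer to an arbitrary address or to a function of a mismatched type. Return addresses are not covered and need a shadow stack.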
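
The idea behind the indirect-call check can be shown with a toy model. This is not how Clang implements CFI (which uses compiler-generated type metadata and bit-set tests); it only illustrates the "does the target's type match what the call site expects" rule. All names and signature strings here are hypothetical.

```python
class CFIViolation(RuntimeError):
    pass


def typed(signature):
    """Attach a signature tag to a function, standing in for CFI type metadata."""
    def mark(func):
        func.cfi_signature = signature
        return func
    return mark


@typed("void(void*)")
def release(ptr):
    return f"released {ptr}"


@typed("int(const char*)")
def run_command(cmd):
    return f"ran {cmd}"


def checked_indirect_call(target, expected_signature, *args):
    """Call target only if its signature tag matches the call site's type."""
    if getattr(target, "cfi_signature", None) != expected_signature:
        raise CFIViolation(f"call site expects {expected_signature}")
    return target(*args)


callback = release
print(checked_indirect_call(callback, "void(void*)", "buf"))

callback = run_command  # models a corrupted function pointer
try:
    checked_indirect_call(callback, "void(void*)", "/bin/sh")
    outcome = "call went through"
except CFIViolation:
    outcome = "aborted"
print(outcome)
```

The model also shows the limit of the scheme: an attacker who redirects the pointer to another function with the same signature passes the check.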

Chrome and parts of Android are compiled with CFI in production. The overhead is 1–3% for typical workloads.

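Apple's Hardened Runtime on macOS: A code-signing option that restricts runtime code execution and library loading. Unless the app opts out through explicit entitlements, it disallows JIT-style writable and executable memory, restricts library loading to code signed by Apple or by the same team (library validation), and ignores DYLD_INSERT_LIBRARIES, the macOS counterpart of LD_PRELOAD. It is required for notarization of software distributed outside the Mac App Store.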

Future Directions

Shadow stacks (Intel CET / ARM PAC): Hardware-enforced return address integrity. Intel's CET SHSTK (Shadow Stack) maintains a separate stack just for return addresses that ordinary store instructions cannot write. ARM Pointer Authentication (PAC) signs return addresses. Both make ROP substantially harder. On Linux, kernel IBT support arrived in 5.18 and userspace shadow stack support in 6.6; PAC is supported on Apple Silicon.
ShadowCallStack and BTI on AArch64: Available in Android builds; protecting return addresses and ensuring indirect branches target valid landing pad instructions.
SBOM (Software Bill of Materials) for dynamic dependencies: Tools like syft and grype enumerate the dynamic library graph of an application and check versions against vulnerability databases.
eBPF-based LSM for library load control: BPF LSM (Linux Security Module) hooks on security_mmap_file to enforce policy on which libraries any process may load, at the kernel level, without requiring SELinux labels.

Exercises

Build a vulnerable binary that performs GOT[puts] overwrite via a contrived arbitrary-write primitive. Use pwntools to write the exploit. Then rebuild with full RELRO and observe that the overwrite fails (SIGSEGV on write to read-only memory).
Write an LD_PRELOAD library that intercepts open() (and openat(), which many programs end up calling instead), checks if the opened path starts with /etc/passwd, and if so fails by returning -1 with errno set to EACCES. Test it with cat /etc/passwd vs LD_PRELOAD=./deny_passwd.so cat /etc/passwd. Verify it has no effect on a setuid binary.
Build a binary with partial RELRO and one with full RELRO. Use gdb to examine the writability of the .got.plt section: set a hardware watchpoint on a GOT entry and observe whether lazy binding fires on first call in the partial case but not the full RELRO case.
Implement a secure plugin loader: the loader accepts a plugin path, verifies it is in a trusted directory, checks its filesystem permissions (owner=root, no world-write), and loads it with RTLD_NOW | RTLD_LOCAL. Demonstrate that a plugin in /tmp is rejected.
Set up monitoring for LD_PRELOAD usage on a test system: an auditd watch on /etc/ld.so.preload plus a script that scans /proc/*/environ, since auditd cannot filter on environment variables. Write a Bash script that simulates an attacker using LD_PRELOAD to hijack read(). Verify the monitoring captures it. Then write a complementary response script that kills any process using an unauthorized LD_PRELOAD.

References

Nergal, "The Advanced Return-into-lib(c) Exploits." Phrack 58, 2001. (GOT overwrite origins)
Andres Freund, "backdoor in upstream xz/liblzma leading to ssh server compromise." oss-security mailing list, 2024. https://openwall.com/lists/oss-security/2024/03/29/4
Ulrich Drepper, "How to Write Shared Libraries." https://www.akkadia.org/drepper/dsohowto.pdf — Section 2.3 on RELRO
Pax Team, "Address Space Layout Randomization." https://pax.grsecurity.net/docs/aslr.txt
Clang CFI documentation: https://clang.llvm.org/docs/ControlFlowIntegrity.html
checksec.sh: https://github.com/slimm609/checksec.sh
"Intel® CET Answers Call to Protect Against Common Malware Threats." Intel white paper, 2020.