01 — The Memory Safety Problem

Technical Overview

Memory safety bugs are the dominant source of critical security vulnerabilities in systems software. For decades, languages like C and C++ — which give programmers direct control over memory — have been the foundation of operating systems, browsers, databases, and network infrastructure. That direct control comes with a cost: a wrong pointer, a missing bounds check, or a race condition can corrupt memory in ways that lead to arbitrary code execution, privilege escalation, or data exposure. This document catalogs the classes of memory safety bugs, presents statistics on their real-world prevalence, maps them to real CVEs, and surveys the language-level solutions that have emerged.

Prerequisites

  • Stack and heap memory layout
  • C pointer arithmetic and pointer semantics
  • Virtual memory and page tables
  • Undefined behavior in the C and C++ standards
  • OS privilege model (ring 0 vs ring 3)
  • Thread memory model: races and visibility

Historical Context

The buffer overflow was first weaponized at scale in the Morris Worm (1988), which exploited an unchecked gets() call in fingerd. Over the following 35 years, the security community layered on mitigations: stack canaries (1998), NX/XD bit (2004), ASLR (2005), RELRO (2006), PIE (2008). Each mitigation raised the exploitation bar. None eliminated the underlying bug class. In 2019, Microsoft's Security Response Center reported that roughly 70% of the vulnerabilities assigned a CVE in Microsoft products each year, going back more than a decade, were memory safety bugs. The Chromium project later reported a similar figure for Chrome's high-severity security bugs. The conclusion drawn here: the bug class cannot be mitigated away; it must be eliminated at the language level.


Classes of Memory Safety Bugs

1. Stack Buffer Overflow

The canonical memory safety bug. A fixed-size stack buffer is written beyond its end, corrupting adjacent stack data including the saved return address.

/* Vulnerable: fixed buffer, no bounds check */
void process_input(const char *input) {
    char buffer[64];
    strcpy(buffer, input);  /* copies until '\0', no length limit */
    /* if strlen(input) > 63, writes beyond buffer */
    /* overwrites: saved RBP, saved RIP, and beyond */
}

/* Call stack layout at entry to process_input:

   High address
   ┌─────────────────┐
   │ caller's frame  │
   ├─────────────────┤
   │ saved RIP       │  ← target for overwrite → control flow hijack
   │ saved RBP       │  ← often corrupted first
   │ buffer[63]      │
   │ ...             │
   │ buffer[0]       │  ← write starts here
   ├─────────────────┤
   Low address
*/

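Real CVE: CVE-1999-0002 (mountd buffer overflow), CVE-1999-0874 (IIS .HTR buffer overflow). Buffer overflows in general persist decades later: CVE-2021-3156 (sudo, disclosed in 2021 after roughly 10 years undetected) is a heap-based rather than stack-based overflow and is covered in the next section.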

Mitigation trajectory:

  • Bounded copies (fgets() instead of gets(), snprintf() or an explicit length check instead of strcpy()): correct only if every caller gets it right
  • Stack canaries (GCC -fstack-protector): detect a linear overflow before the function returns
  • ASLR: randomize the stack address so the target is harder to guess
  • CFI (Control Flow Integrity): restrict where a return address can point
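The caller-side fix for the process_input example is to check the length before copying. The following is a minimal sketch of that idea, not a drop-in replacement for any library routine; the helper names copy_bounded and process_input_checked are hypothetical and do not appear in the original text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `input` into a fixed-size destination only if
 * it fits, including the terminating NUL.
 * Returns 0 on success, -1 if the input would overflow the buffer. */
int copy_bounded(char *dst, size_t dst_size, const char *input) {
    if (dst == NULL || input == NULL || dst_size == 0) {
        return -1;
    }
    size_t len = strlen(input);
    if (len >= dst_size) {        /* no room for the bytes plus the NUL */
        dst[0] = '\0';            /* leave a valid empty string behind */
        return -1;
    }
    memcpy(dst, input, len + 1);  /* len + 1 <= dst_size, so the copy fits */
    return 0;
}

/* Bounds-checked variant of process_input from the example above. */
int process_input_checked(const char *input) {
    char buffer[64];
    return copy_bounded(buffer, sizeof buffer, input);
}
```

Rejecting oversized input, rather than silently truncating it, is a design choice: truncation keeps memory intact but can create logic bugs further down the line.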

2. Heap Buffer Overflow

Writing beyond the end of a heap-allocated buffer. Corrupts adjacent heap chunks, potentially including heap metadata (chunk headers used by malloc/free) or other live objects.

/* Vulnerable: heap buffer, off-by-one */
char *buf = malloc(64);
memcpy(buf, input, strlen(input) + 1);  /* +1 for NUL, but buf only 64 bytes */
/* if strlen(input) == 64: writes 65 bytes — 1 byte overflow */
/* depending on the allocator's layout, the stray byte can land in the next chunk's header (e.g. its size field) */
/* a later free() of the adjacent chunk then trusts the corrupted size → heap corruption */

Real CVE: CVE-2014-0160 (Heartbleed, a heap buffer over-read), CVE-2021-3156 (sudo heap-based buffer overflow: a command-line argument ending in a single backslash made the unescaping loop copy past the end of a heap buffer, allowing local privilege escalation).

Heartbleed specifics: Heartbleed is technically a heap buffer over-read — reading beyond the allocated buffer — rather than an overflow/write. The data is disclosed rather than corrupted. This makes it particularly dangerous: no crash, no detectable side effect, attacker gets memory content.
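An over-read of this kind comes down to trusting a length field supplied by the peer. The sketch below shows the missing check in a simplified echo handler. It is loosely modelled on a heartbeat-style exchange and is not OpenSSL's code; the function name echo_payload and its parameters are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical echo handler. `claimed_len` comes from the untrusted message
 * header; `payload_len` is the number of payload bytes actually received.
 * Returns the number of bytes written to `out`, or -1 if rejected. */
long echo_payload(const unsigned char *payload, size_t payload_len,
                  size_t claimed_len,
                  unsigned char *out, size_t out_size) {
    if (payload == NULL || out == NULL) {
        return -1;
    }
    if (claimed_len > payload_len) {
        return -1;   /* would read past the received data: the over-read case */
    }
    if (claimed_len > out_size) {
        return -1;   /* would write past the output buffer */
    }
    memcpy(out, payload, claimed_len);
    return (long)claimed_len;
}
```

Without the first length comparison, a request claiming 1000 bytes while carrying only 4 would copy 996 bytes of whatever happens to sit next to the payload on the heap.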

3. Use-After-Free (UAF)

Dereferencing a pointer to memory that has been freed. The freed memory may have been reallocated to a different object. Dereferencing the stale pointer reads/writes the different object.

struct Node {
    int value;
    void (*callback)(int);  /* function pointer */
};

struct Node *n = malloc(sizeof(struct Node));
n->callback = safe_function;
free(n);                    /* n is freed */

/* ... somewhere else, malloc returns the same memory for a different allocation */
char *attacker_data = malloc(sizeof(struct Node));
memcpy(attacker_data, attacker_controlled_bytes, sizeof(struct Node));
/* attacker_data overlaps n's freed memory */
/* attacker has written fake_function_address into the callback position */

n->callback(42);            /* UAF dereference: calls attacker-controlled address */
/* Result: arbitrary code execution */

Real CVE: CVE-2019-2215 (Android Binder driver UAF: local privilege escalation, exploited in the wild), CVE-2019-5786 (Chrome FileReader UAF: renderer compromise, exploited in the wild), CVE-2022-26485 (Firefox XSLT parameter processing UAF: actively exploited in the wild).

Why UAF is prevalent: Ownership of heap allocations is implicit in C and C++. A large codebase may have dozens of code paths that free an object; ensuring that no stale pointer exists requires global program analysis that is infeasible manually.
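One way C codebases make stale references detectable is to hand out handles instead of raw pointers. The sketch below shows a fixed-size pool with per-slot generation counters; it illustrates the general technique under simplifying assumptions (single-threaded, tiny pool, generation counter never wraps). All names (Pool, Handle, pool_acquire, pool_get, pool_release) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS 4   /* illustrative size */

struct Node { int value; };

/* A handle names a slot plus the generation in which it was issued. */
typedef struct { uint32_t index; uint32_t generation; } Handle;

typedef struct {
    struct Node nodes[POOL_SLOTS];
    uint32_t generation[POOL_SLOTS];
    int in_use[POOL_SLOTS];
} Pool;   /* zero-initialize before use, e.g. `Pool p = {0};` */

/* Claims a free slot. Returns 0 and fills *out on success, -1 if full. */
int pool_acquire(Pool *p, Handle *out) {
    for (uint32_t i = 0; i < POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            out->index = i;
            out->generation = p->generation[i];
            return 0;
        }
    }
    return -1;
}

/* Resolves a handle; returns NULL if it is stale or out of range. */
struct Node *pool_get(Pool *p, Handle h) {
    if (h.index >= POOL_SLOTS) return NULL;
    if (!p->in_use[h.index]) return NULL;
    if (p->generation[h.index] != h.generation) return NULL;
    return &p->nodes[h.index];
}

/* Releases a slot and bumps its generation, invalidating old handles.
 * A second release through the same handle is rejected with -1. */
int pool_release(Pool *p, Handle h) {
    if (pool_get(p, h) == NULL) return -1;
    p->in_use[h.index] = 0;
    p->generation[h.index]++;
    return 0;
}
```

A stale handle now fails the lookup instead of silently aliasing whatever object reuses the slot. The check costs a comparison per access and only helps where code goes through pool_get; any raw pointer kept after a lookup is as dangerous as before.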

4. Double-Free

Calling free() on the same pointer twice. The second free() corrupts the heap allocator's free list, potentially allowing an attacker to control what the next malloc() returns.

char *p = malloc(64);
/* ... */
free(p);    /* first free: correct */
/* ... */
free(p);    /* second free: p still has old value */
/* malloc's free list is now corrupted */
/* attacker may be able to cause malloc to return an arbitrary address */

Real CVE: CVE-2019-11932 (double-free in the android-gif-drawable library as used by WhatsApp for Android: remote code execution via a crafted GIF).

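Modern heap mitigations: glibc's allocator (ptmalloc2) includes double-free detection and aborts with messages such as "free(): double free detected in tcache 2" or "double free or corruption (fasttop)". The checks are heuristic and can be bypassed in some scenarios: the fastbin check, for example, only compares against the most recently freed chunk, so freeing a different chunk between the two frees evades it.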
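A common defensive idiom in C is to clear a pointer at the moment it is freed, so that a repeated free becomes free(NULL), which the C standard defines as a no-op. The sketch below shows the idiom; the macro name FREE_AND_CLEAR and the Buffer type are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Frees the pointer and resets it to NULL in one step.
 * Note: the argument is evaluated twice, so pass a plain lvalue. */
#define FREE_AND_CLEAR(p) do { free(p); (p) = NULL; } while (0)

struct Buffer {
    char  *data;
    size_t len;
};

/* Safe to call more than once on the same Buffer. */
void buffer_destroy(struct Buffer *b) {
    if (b == NULL) {
        return;
    }
    FREE_AND_CLEAR(b->data);
    b->len = 0;
}
```

The idiom only protects the one variable that is cleared. If another copy of the pointer exists elsewhere, that copy still dangles, which is why this is a mitigation rather than a fix for the ownership problem.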

5. Null Pointer Dereference

Dereferencing a null pointer (address 0). In user space, this causes a segfault. In the kernel, null pointer dereferences are more dangerous: address 0 is mappable in some kernel configurations, and NULL dereference can be turned into arbitrary code execution by mapping attacker-controlled data at address 0.

/* Vulnerable: unchecked return value from malloc */
struct Config *cfg = malloc(sizeof(struct Config));
cfg->option = 42;  /* segfault if malloc returned NULL (OOM) */

/* Kernel example: */
static int (*hook)(void) = NULL;  /* function pointer, never assigned a handler */
/* ... */
hook();  /* if hook is NULL: the kernel jumps to address 0 */
/* if a local attacker has mmap()ed page 0 with code: arbitrary code runs in kernel mode */

Real CVE: CVE-2009-2698 (Linux kernel udp_sendmsg NULL pointer dereference: local privilege escalation), CVE-2009-2692 (Linux kernel sock_sendpage NULL function pointer dereference: local privilege escalation). Linux's mmap_min_addr setting (commonly 65536) forbids unprivileged mappings at low addresses, specifically to mitigate this class.

6. Uninitialized Memory Read

Reading from memory that was allocated but never initialized. The content is whatever was in the memory previously — could be zeros (from fresh pages) or data from prior allocations. Uninitialized reads can leak sensitive data and cause unpredictable behavior.

/* Vulnerable: struct with padding bytes not initialized */
struct Response {
    uint8_t type;
    /* 3 bytes of padding (to align 'value' to 4 bytes) */
    uint32_t value;
    uint64_t id;
    /* typical 64-bit ABI: sizeof == 16, no trailing padding */
};

struct Response resp;
resp.type = 1;
resp.value = 42;
resp.id = user_id;
send(sock, &resp, sizeof(resp), 0);  /* sends 3 bytes of uninitialized padding */
/* padding bytes contain stack garbage: previous stack frame contents */
/* attacker receives stack memory, potentially fragments of pointers or secrets */

Real CVE: CVE-2013-2141 (Linux kernel tkill/tgkill: an uninitialized structure leaked kernel memory to user space), CVE-2010-4083 (Linux kernel semctl: an uninitialized structure copied to user space disclosed kernel stack memory).

In kernel context: A thread's kernel stack is reused across system calls, and its pages are recycled between processes. Uninitialized kernel stack memory can therefore contain data left over from earlier system calls, including pointers to kernel objects, keys, and credentials.
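One way to keep padding off the wire is to never send the struct's in-memory representation at all and instead write each field into a byte buffer at an explicit offset. The sketch below assumes a hypothetical 13-byte wire layout in host byte order; the name serialize_response and the layout are illustrative only. The more common shortcut, memset() on the struct before filling it, works on mainstream compilers, although the C standard leaves padding values unspecified after a member store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RESPONSE_WIRE_SIZE 13   /* 1 + 4 + 8 bytes, no padding */

/* Writes type, value and id back to back into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small.
 * Host byte order is used to keep the sketch short. */
size_t serialize_response(unsigned char *out, size_t out_size,
                          uint8_t type, uint32_t value, uint64_t id) {
    if (out == NULL || out_size < RESPONSE_WIRE_SIZE) {
        return 0;
    }
    out[0] = type;
    memcpy(out + 1, &value, sizeof value);   /* bytes 1..4  */
    memcpy(out + 5, &id, sizeof id);         /* bytes 5..12 */
    return RESPONSE_WIRE_SIZE;
}
```

Every byte that leaves the process is one the code explicitly wrote, so there is nothing left over from the stack to disclose. A real protocol would also fix the byte order.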

7. Integer Overflow Leading to Memory Issues

Integer arithmetic overflow causes a value to wrap around, leading to a too-small allocation size that is subsequently overflowed by correctly-sized data.

/* Vulnerable: integer overflow before allocation */
void *allocate_matrix(size_t rows, size_t cols) {
    size_t size = rows * cols;  /* wraps if the true product exceeds SIZE_MAX */
    /* e.g., rows=0x80000000, cols=4: size = 0 on 32-bit */
    return malloc(size);        /* malloc(0) returns NULL or a minimal allocation */
}

/* Caller:
   char *m = allocate_matrix(rows, cols);     // attacker-controlled sizes
   for (size_t r = 0; r < rows; r++)          // loop uses the real row count
       memset(m + r * cols, 0, cols);         // writes far past the tiny alloc
   // heap overflow
*/

Real CVE: CVE-2002-0639 (OpenSSH challenge-response integer overflow: remote code execution), CVE-2015-1538 (Android Stagefright integer overflows in MP4 parsing: remote code execution).

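C standard: Overflow of signed integers is undefined behavior in C. The compiler may optimize away overflow checks written in terms of signed arithmetic, reasoning that UB cannot occur. Unsigned arithmetic, such as the size_t multiplication above, is well defined but wraps modulo 2^N, which is exactly what produces the undersized allocation.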
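The usual fix for allocate_matrix is to test for wraparound before multiplying. The sketch below uses the division form of the check, which stays within defined unsigned arithmetic; the names checked_mul_size and allocate_matrix_checked are hypothetical. Treating a zero-sized matrix as an error is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stores rows * cols in *out and returns 0, or returns -1 if the
 * product would not fit in size_t. */
int checked_mul_size(size_t rows, size_t cols, size_t *out) {
    if (cols != 0 && rows > SIZE_MAX / cols) {
        return -1;               /* true product exceeds SIZE_MAX */
    }
    *out = rows * cols;
    return 0;
}

/* Overflow-checked variant of allocate_matrix from the example above.
 * Returns zeroed memory, or NULL on overflow, zero size, or OOM. */
void *allocate_matrix_checked(size_t rows, size_t cols) {
    size_t size;
    if (checked_mul_size(rows, cols, &size) != 0 || size == 0) {
        return NULL;
    }
    return calloc(1, size);
}
```

Callers must still check for NULL and use the same validated size for every later copy. Many C libraries perform an equivalent check inside calloc(rows, cols), but that is a property of the implementation rather than something to assume blindly.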

8. Dangling Pointers

A pointer that still holds the address of an object whose lifetime has ended: a heap allocation that has been freed, or a stack variable whose function has returned. The dangling pointer is the latent condition and use-after-free is the act of dereferencing it. Dereferencing is undefined behavior even if the memory has not yet been reused.

struct Session {
    int fd;
    char *name;  /* dynamically allocated */
};

void close_session(struct Session *s) {
    free(s->name);
    /* s->name is now dangling: freed but still has the old address */
    /* s itself is not freed — it's a valid struct, but s->name is dangling */
}

/* Later: */
printf("Session: %s\n", s->name);  /* dangling pointer read */
/* may read the new content of the freed memory, or crash */

Dangling pointers are the precursor to use-after-free: the bug stays latent while the stale pointer is merely held, and becomes exploitable once the freed memory is reallocated and the pointer is dereferenced.

9. Type Confusion

Using a pointer of one type to access an object of a different type. In C++, type confusion often arises in object hierarchies when dynamic dispatch fails or is bypassed.

/* C++ type confusion in JavaScript engine */
class JSObject {
public:
    virtual void invoke();  /* virtual dispatch */
    int type_tag;
};

class JSFunction : public JSObject {
public:
    void *code_ptr;  /* function code */
};

class JSString : public JSObject {
public:
    char *data;      /* string data */
    size_t length;
};

/* If an attacker can make the engine treat a JSString as a JSFunction: */
JSObject *obj = get_string_from_user();  /* actually a JSString */
/* attacker corrupts type_tag so the engine believes obj is a JSFunction */
JSFunction *fn = static_cast<JSFunction*>(obj);  /* unchecked downcast */
/* fn->code_ptr aliases JSString::data: a pointer to attacker-controlled bytes */
reinterpret_cast<void (*)()>(fn->code_ptr)();  /* jumps to an attacker-chosen address */

Real CVE: CVE-2021-30551 (Chrome V8 type confusion: exploited in the wild against Chrome), CVE-2019-11707 (Firefox IonMonkey type confusion in Array.pop: used in targeted attacks against Coinbase employees), CVE-2023-23529 (WebKit type confusion: iOS/macOS zero-day).

Type confusion is one of the dominant bug classes in modern browser engine exploits, because JIT compilers make optimistic type assumptions and may fail to revalidate them.
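In C, the equivalent discipline is a tagged union whose payload is reachable only through accessors that check the tag. The sketch below is a simplified illustration, not how any real JavaScript engine represents values; all names (Obj, ObjKind, obj_as_function, obj_as_string, example_native) are hypothetical.

```c
#include <stddef.h>

enum ObjKind { KIND_STRING, KIND_FUNCTION };

typedef void (*NativeFn)(void);

struct Obj {
    enum ObjKind kind;           /* says which union member is valid */
    union {
        struct { const char *data; size_t length; } string;
        struct { NativeFn code; } function;
    } as;
};

/* Placeholder callee used only to have something to store. */
void example_native(void) {}

/* Returns the code pointer only if the object really is a function. */
NativeFn obj_as_function(const struct Obj *o) {
    if (o == NULL || o->kind != KIND_FUNCTION) {
        return NULL;
    }
    return o->as.function.code;
}

/* Returns the string data only if the object really is a string. */
const char *obj_as_string(const struct Obj *o, size_t *length_out) {
    if (o == NULL || o->kind != KIND_STRING) {
        return NULL;
    }
    if (length_out != NULL) {
        *length_out = o->as.string.length;
    }
    return o->as.string.data;
}
```

The check guards against logic errors that pick the wrong accessor. It does not help once an attacker can overwrite the tag itself, which is why tag integrity is what JIT exploits tend to target.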

10. Data Races

Two threads concurrently access shared memory, at least one access is a write, and there is no synchronization. Data races are undefined behavior in C/C++ and can cause any observable outcome: incorrect values, memory corruption, control flow hijack.

/* Vulnerable: shared counter with no synchronization */
int counter = 0;  /* shared global */

/* Thread 1 and Thread 2 both run: */
void increment(void) {
    counter++;  /* read-modify-write, not atomic */
    /* may compile to: MOV eax, [counter]; INC eax; MOV [counter], eax */
    /* two threads can interleave the MOV/INC/MOV sequence */
    /* lost update: both read 0, both increment to 1, one write loses */
}
}

Real CVE: CVE-2016-5195 (Dirty COW: race condition in the kernel's copy-on-write path, privilege escalation), CVE-2016-8655 (Linux af_packet race condition leading to use-after-free: local privilege escalation).

C/C++ undefined behavior: Since C11 and C++11, the standards define data races as undefined behavior. A compiler may optimize a racy program in ways that appear correct in testing but fail in production, for example by caching a shared value in a register or reordering memory accesses.
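The shared-counter example can be made race-free with C11 atomics. The sketch below assumes a POSIX system with pthreads (older toolchains need the -pthread flag) and caps the thread count at 16 for simplicity; the names worker and run_increments are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_long counter;   /* static storage: starts at zero */

struct WorkerArgs { long iterations; };

static void *worker(void *arg) {
    const struct WorkerArgs *a = arg;
    for (long i = 0; i < a->iterations; i++) {
        atomic_fetch_add(&counter, 1);   /* one indivisible read-modify-write */
    }
    return NULL;
}

/* Runs `nthreads` workers (1..16), each adding `iterations` to the counter.
 * Returns the final count, or -1 on invalid input or thread-creation failure. */
long run_increments(int nthreads, long iterations) {
    pthread_t threads[16];
    struct WorkerArgs args = { iterations };
    int started = 0;

    if (nthreads < 1 || nthreads > 16) {
        return -1;
    }
    atomic_store(&counter, 0);
    for (; started < nthreads; started++) {
        if (pthread_create(&threads[started], NULL, worker, &args) != 0) {
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);   /* join makes all increments visible */
    }
    return (started == nthreads) ? atomic_load(&counter) : -1;
}
```

With a plain int and counter++ the total is frequently lower than nthreads times iterations; with atomic_fetch_add no update can be lost. Atomics fix this single variable only. Invariants that span several variables still need a mutex.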


Statistics: Memory Safety Bug Prevalence

Microsoft Security Response Center (2019)

An analysis of the vulnerabilities Microsoft assigned a CVE from 2006 to 2018 (approximately 12 years):

"~70% of the vulnerabilities Microsoft assigns a CVE each year continue to be memory safety issues."

This covers Windows OS, Office, Edge/IE, SQL Server, Exchange — all written primarily in C and C++.

Chromium Security (2020)

The Chromium project analyzed 912 high- or critical-severity security bugs that had affected Chrome's stable channel since 2015:

Around 70% of those bugs were memory unsafety problems, and roughly half of these were use-after-free bugs.

A follow-up analysis of Android security vulnerabilities by the Android team (2022):

Memory safety issues accounted for 76% of Android's vulnerabilities in 2019.

With Rust supported for new platform code since Android 12 (Bluetooth, WiFi, and other subsystems) and a growing share of new code written in memory-safe languages:

By 2022, the share of memory safety vulnerabilities had dropped to 35%. The Android team presents this as a correlation with the shift to memory-safe languages for new code.

BlueHat IL 2019 (Microsoft)

Matt Miller (Microsoft Security Response Center), in summary:

Since 2006, the share of Microsoft CVEs caused by memory unsafety has stayed at roughly 70% without meaningfully declining. Mitigations make exploitation harder but do not reduce the underlying vulnerability count.

NSA Cybersecurity Information Sheet (2022)

The US National Security Agency published guidance recommending memory-safe languages. In summary:

NSA advises organizations to consider a strategic shift from languages that provide little or no inherent memory protection, such as C and C++, to memory-safe languages where possible.


Memory Safety Bug Class → CVE Mapping

Bug Class            | CVE            | Product                            | CVSS      | Impact
---------------------+----------------+------------------------------------+-----------+----------------------------
Buffer over-read     | CVE-2014-0160  | OpenSSL (Heartbleed)               | 7.5       | Private key disclosure
Data race            | CVE-2016-5195  | Linux kernel (Dirty COW)           | 7.8       | Local privilege escalation
Type confusion       | CVE-2021-30551 | Chrome V8                          | 8.8       | RCE (renderer)
Heap buffer overflow | CVE-2021-3156  | sudo                               | 7.8       | Local root
Use-after-free       | CVE-2019-2215  | Android Binder driver              | 7.8       | Local root
Use-after-free       | CVE-2022-26485 | Firefox (in-the-wild)              | 8.8       | RCE
Integer overflow     | CVE-2002-0639  | OpenSSH                            | 10.0 (v2) | Remote root
Double-free          | CVE-2019-11932 | WhatsApp for Android (GIF library) | 8.8       | RCE
NULL deref (kernel)  | CVE-2009-2698  | Linux UDP                          | 7.2 (v2)  | Local privilege escalation
Uninitialized read   | CVE-2013-2141  | Linux kernel (tkill/tgkill)        | 2.1 (v2)  | Kernel memory disclosure

Language Solutions Timeline

Garbage-Collected Languages (1990s onwards)

Python (1991), Java (1995), and Go (2009) allocate objects on a managed heap. A garbage collector (GC) periodically identifies and frees unreachable objects. This eliminates:

  • Use-after-free (the GC never frees reachable objects)
  • Double-free (the runtime manages all frees)
  • Dangling pointers (the runtime maintains object lifetime)
  • Most buffer overflows (array accesses are bounds-checked in these languages)

Remaining issues:

  • Integer overflows still occur
  • Type confusion is possible through unsafe casts or native interfaces
  • Data races are possible (Go, Java)
  • Stop-the-world or concurrent GC pauses are a poor fit for hard real-time and latency-sensitive systems
  • Rarely practical for kernels, firmware, or embedded systems with tight memory budgets
  • Performance overhead: commonly cited ranges are 5-30% CPU and 2-5x memory, though both depend heavily on workload and collector

C++ Smart Pointers (C++11, 2011)

std::unique_ptr<T>: single-ownership pointer. Frees the owned value when the unique_ptr goes out of scope. Prevents manual free() mistakes and most double-free situations.

/* Safe: unique_ptr manages lifetime */
{
    auto n = std::make_unique<Node>();
    n->value = 42;
}  /* n goes out of scope: ~unique_ptr() calls delete automatically */
/* No need to call free/delete manually */

/* Prevents double-free: unique_ptr cannot be copied, only moved */
auto p = std::make_unique<int>(42);
// auto q = p;          /* COMPILE ERROR: unique_ptr's copy constructor is deleted */
auto q = std::move(p);  /* OK: ownership transfers to q, p becomes null */

std::shared_ptr<T>: reference-counted shared ownership. Safe from use-after-free as long as all holders use shared_ptr. However:

  • shared_ptr cycles keep the reference count above 0, so the objects leak (std::weak_ptr exists to break cycles)
  • shared_ptr does not prevent data races on the pointed-to value
  • Copying a shared_ptr updates an atomic reference count, which is noticeably more expensive than copying a raw pointer

Remaining issues with C++ smart pointers:

  • Optional: cannot be enforced project-wide without linting
  • Mixing raw pointers and smart pointers (necessary for C APIs) reintroduces all raw pointer hazards
  • Do not prevent buffer overflows in arrays
  • Do not prevent data races
  • The C++ type system cannot enforce "this raw pointer is always valid"; that remains the programmer's invariant

Rust Ownership Model (2015 onwards)

Rust's ownership model addresses all 10 bug classes described above through a combination of:

  1. Ownership and move semantics: Each value has exactly one owner; when the owner goes out of scope, the value is dropped and its memory freed. No manual free() means no double-free and no leaks from forgetting to free.

  2. Borrow checker: Statically enforces that at any point either many immutable references or one mutable reference exists, and that no reference outlives the owning value (no dangling pointers). Enforced at compile time, zero runtime overhead.

  3. No null by default: Safe Rust references are never null. Option<T> represents optionality and must be explicitly matched.

  4. Integer overflow safety: Overflow is never undefined behavior. In debug builds, arithmetic overflow panics. In release builds it wraps by default (overflow checks can be enabled), so code that must detect overflow uses explicit operations such as checked_add, and code that wants wraparound uses wrapping_add.

  5. Race freedom: The Send and Sync traits control which types can be moved or shared across threads. The borrow checker's single-mutable-reference rule means mutable access to shared data requires explicit synchronization.

  6. Array bounds checking: Array and slice indexing in safe Rust is always bounds-checked.

The result: in safe Rust, all 10 bug classes become compile errors, runtime panics, or (for release-mode integer wraparound) defined behavior, never undefined behavior. Many of the bugs that were CVEs become build failures.

/* Use-after-free: compile error in Rust */
let v = vec![1, 2, 3];
let r = &v[0];      // borrow v
drop(v);            // ERROR: v is moved into drop() while still borrowed
println!("{}", r);  // the borrow is used here, so it must outlive the move
// error[E0505]: cannot move out of `v` because it is borrowed

/* Double-free: impossible in Rust */
let v = vec![1, 2, 3];
let v2 = v;         // ownership moves to v2
drop(v);            // ERROR: v was already moved
// error[E0382]: use of moved value: `v`

/* Buffer overflow: runtime panic (not UB) */
let v = vec![1, 2, 3];
let i = 10;         // stands in for an index computed at runtime
let x = v[i];       // thread 'main' panicked: index out of bounds
                    // Panic, not undefined behavior

ASCII Diagram: Memory Safety Bug Classes and Their Effects

Memory Safety Bug Map:

  Allocation        Access              Deallocation
  ──────────        ──────              ────────────
  malloc(n)    →  ptr[0..n-1]  →       free(ptr)
                      │                     │
                      │                     │
   Bugs:              │                     │
   Integer overflow   │                     Double-free
   (n wraps, too      │                     (free twice → heap corruption)
    small)            │
                      ├── Out of bounds (buffer overflow)
                      │   → adjacent memory corruption
                      │
                      ├── After free (use-after-free)
                      │   → another object's memory corrupted/read
                      │
                      ├── Uninitialized
                      │   → arbitrary data from previous use
                      │
                      ├── Type confusion
                      │   → wrong type interpretation of valid bytes
                      │
                      └── Data race
                          → interleaved read/write, undefined behavior

C/C++: All of the above are undefined behavior → attacker-exploitable
GC languages: Eliminate deallocation bugs, most access bugs (GC traces liveness)
Rust: All of the above are compile errors (safe Rust) or defined panics

Debugging Notes

# AddressSanitizer: detects heap/stack buffer overflow, UAF, use-after-return
gcc -fsanitize=address,undefined -g -O1 -o program program.c
./program

# Valgrind: detects memory leaks, UAF, uninitialized reads (slow: 10-50x)
valgrind --leak-check=full --track-origins=yes ./program

# ThreadSanitizer: detects data races
gcc -fsanitize=thread -g -O1 -o program program.c
./program

# KASAN (Kernel AddressSanitizer): kernel equivalent
# CONFIG_KASAN=y in kernel .config

# Rust: compile-time safety, no runtime overhead for most checks
cargo build  # borrow checker runs during compilation
cargo test   # bounds checks run in every build profile; integer-overflow checks are on by default in debug/test builds

Security Implications

The security community consensus is now:

  1. Memory safety bugs cannot be eliminated by code review alone

  2. Mitigations (ASLR, canaries, CFI) raise the bar but do not eliminate exploitability

  3. The only durable solution is memory-safe languages

  4. This is the primary technical motivation for Rust, and for Rust's adoption in security-sensitive codebases (Android, the Linux kernel, Windows components, AWS Firecracker)

Performance Implications

  • C/C++ with all mitigations enabled: roughly 5-15% overhead vs no mitigations, depending on workload and which mitigations are on
  • Rust safe code: roughly 0-5% overhead vs equivalent C (the compiler often removes bounds checks when it can prove the index is in range)
  • GC languages: roughly 5-30% CPU overhead, 2-5x memory overhead, plus GC pause latency

Future Directions

  • ARM MTE (Memory Tagging Extension): Hardware associates a 4-bit tag with each pointer and with each 16-byte granule of memory; a mismatched tag on access raises a hardware fault. Detects UAF and buffer overflow probabilistically at low overhead. Shipped in consumer hardware starting with the Pixel 8 (2023).
  • CHERI (Capability Hardware Enhanced RISC Instructions): Hardware capability pointers with built-in bounds and permissions. Prototype in ARM Morello (2022).
  • Formal verification: CompCert (a formally verified C compiler) and F* (Microsoft Research; a proof-oriented language whose verified code can be extracted to C, as in the HACL* cryptographic library). Extreme cost, limited scope.
  • Memory-safe system languages beyond Rust: Carbon (Google, experimental C++ successor), Val (experimental value-semantics language, since renamed Hylo), and others are in development.

Exercises

  1. Write a C program with a stack buffer overflow that overwrites the return address. Compile with gcc -fno-stack-protector -z execstack -no-pie. Use GDB to find the exact offset to the return address and redirect execution to a simple function.

  2. Write the same program in Rust. Observe that the equivalent operation is either a compile error (if using references) or a runtime panic (if using indexing). Confirm no undefined behavior occurs.

  3. Use AddressSanitizer to detect a use-after-free in a C program. Observe the ASan output: allocation site, free site, and use-after-free access site in the stack trace.

  4. Look up the Chrome Security team's annual vulnerability breakdowns (published as blog posts and conference talks). Chart the percentage of memory safety CVEs for Chrome from 2015-2023. What trend do you observe?

  5. Research CVE-2021-3156 (sudo "Baron Samedit"). Understand the \ escape in argv processing that caused a heap overflow. Why wasn't this caught in 10 years of use?

References

  • Miller, Matt. "Trends, Challenges, and Strategic Shifts in the Software Vulnerability Mitigation Landscape." BlueHat IL, 2019.
  • Google Project Zero. "0day 'In the Wild'" spreadsheet. googleprojectzero.github.io/0days-in-the-wild (and related Project Zero publications)
  • Microsoft. "We need a safer systems programming language." Microsoft Security Response Center Blog, 2019.
  • NSA. "Software Memory Safety." Cybersecurity Information Sheet, November 2022.
  • Android Security Team. "Memory safety blog series." android-developers.googleblog.com, 2021-2022.
  • Serebryany, Kostya et al. "AddressSanitizer: A Fast Address Sanity Checker." USENIX ATC 2012.
  • ISO/IEC. "Programming Languages — C." C11 standard, Annex K (Bounds-checking interfaces), 2011.