Section 27: Kernel Exploits — Overview
Section Purpose and Scope
This section examines kernel exploitation as a discipline — not as a how-to guide for attack, but as the technical foundation required to understand mitigation design, evaluate CVE severity accurately, and build kernel code that is defensible. A systems architect who cannot reason about how kernel vulnerabilities are exploited cannot meaningfully evaluate the security properties of their design. This section covers exploit primitive classes, the methodology of turning a bug into privilege escalation, and the evolution of mitigations that shaped exploit development over two decades.
Prerequisites
- Section 03: Kernel Fundamentals (kernel data structures, slab allocator)
- Section 06: CPU Architecture (CPU rings, calling conventions, ISA)
- Section 07: Process Management (task_struct, credentials, /proc)
- Section 11: Memory Management (virtual address space, kernel heap layout)
- Section 26: Security (all mitigation mechanisms, SMEP/SMAP/KPTI/CFI)
- Section 24: Debugging (GDB, KASAN, crash analysis — exploit development tools)
Learning Objectives
- Classify a kernel vulnerability as a specific primitive (UAF, heap overflow, etc.) and describe what attacker capability it grants.
- Trace the path from a type confusion bug to controlled kernel code execution.
- Explain ret2usr and why SMEP made it obsolete.
- Construct the logical structure of a kernel ROP chain (without writing one).
- Explain KASLR bypass techniques (info leaks, side channels, timing).
- Analyze a container escape: which primitives are needed and which kernel protections block them.
- Describe the hypervisor escape threat model and why VM isolation is not equivalent to physical isolation.
- Read a CVE technical writeup and map it to the exploitation primitive and mitigation landscape at the time.
Architecture Overview
Kernel Exploit Development Lifecycle:
┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐
│ Bug Class │───►│ Primitive │───►│ Exploitation Goal │
│ (root cause)│ │ (what │ │ (privilege │
│ │ │ attacker │ │ escalation) │
│ heap OOB │ │ controls) │ │ │
│ UAF │ │ ----------- │ │ uid 0 / root │
│ race cond │ │ arbitrary │ │ container escape │
│ type confus │ │ write │ │ hypervisor escape │
│ stack OOB │ │ PC control │ │ info leak → KASLR │
│ int overflow│ │ info leak │ │ bypass → RWX page │
└──────────────┘ └──────────────┘ └──────────────────────┘
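The lifecycle above can be read as a lookup from root cause to attacker capability to goal. The following is a minimal triage sketch in Python; the table entries paraphrase this overview, and the names `LIFECYCLE` and `triage` are hypothetical, introduced only for illustration. Real bugs may grant several primitives, or none that are usable.

```python
# Hypothetical triage table: bug class -> (typical primitive, what it can enable).
# Entries paraphrase this overview; they are a study aid, not a taxonomy.
LIFECYCLE = {
    "int overflow": ("under-allocated buffer, then heap OOB write",
                     "corruption of an adjacent slab object"),
    "heap OOB": ("write into an adjacent slab object",
                 "arbitrary write or PC control"),
    "UAF": ("read/write of a reallocated object",
            "info leak or arbitrary write"),
    "type confusion": ("fields misread as function pointers",
                       "PC control"),
    "race cond": ("checked value changes before use",
                  "write to data that should be read-only"),
    "stack OOB": ("overwrite of saved stack data",
                  "PC control"),
}


def triage(bug_class: str) -> str:
    """Return a one-line summary for a bug class, or flag it as unclassified."""
    entry = LIFECYCLE.get(bug_class)
    if entry is None:
        return f"{bug_class}: unclassified, analyze manually"
    primitive, enables = entry
    return f"{bug_class}: {primitive} -> {enables}"


if __name__ == "__main__":
    for name in ("UAF", "int overflow", "logic bug"):
        print(triage(name))
```

A table like this is only a starting point for reading a CVE writeup: the mitigation state of the target kernel decides whether the listed goal is actually reachable.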
Privilege Escalation via commit_creds (classic technique):
┌───────────────────────────────────────────────────────────────┐
│ 1. Obtain kernel code execution or arbitrary write │
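│ 2. Find the commit_creds() / prepare_kernel_cred() / init_cred │
│ addresses (via KASLR bypass: /proc/kallsyms if kptr_restrict │
│ does not hide addresses, or via an info leak primitive) │
│ 3. Call commit_creds(prepare_kernel_cred(NULL)) │
│ → installs root credentials on the current task │
│ (uid/gid 0, full capabilities); since Linux 6.2 │
│ prepare_kernel_cred(NULL) returns NULL, so later chains │
│ pass &init_task or commit init_cred directly │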
│ 4. Return to user space with root credentials │
│ │
│ Modern alternative (CFI era): │
│ → Overwrite modprobe_path string │
│ → Trigger kernel to exec user-controlled binary as root │
└───────────────────────────────────────────────────────────────┘
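Step 2 depends on whether kernel symbol addresses are visible to the caller. From the defender's side this can be audited: when addresses are hidden, every line of /proc/kallsyms carries an all-zero address. The sketch below assumes the usual "address type name" line layout; the function name `kallsyms_exposes_addresses` and the sample addresses are made up for illustration.

```python
from typing import Iterable


def kallsyms_exposes_addresses(lines: Iterable[str]) -> bool:
    """Return True if any symbol line carries a non-zero address."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue  # first field is not a hex address
        if address != 0:
            return True
    return False


# Made-up sample lines in the "address type name" layout of /proc/kallsyms.
HIDDEN = ["0000000000000000 T _text", "0000000000000000 T commit_creds"]
VISIBLE = ["ffffffff81000000 T _text", "ffffffff810c1230 T commit_creds"]

if __name__ == "__main__":
    print(kallsyms_exposes_addresses(HIDDEN))
    print(kallsyms_exposes_addresses(VISIBLE))
    try:
        with open("/proc/kallsyms") as f:
            print("local kernel exposes addresses:", kallsyms_exposes_addresses(f))
    except OSError:
        print("/proc/kallsyms is not readable on this system")
```

Running the check as an unprivileged user shows what an unprivileged attacker would see; a `True` result means the KASLR offset can be read directly without any info leak.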
SMEP/SMAP Effect on Exploit Flow:
┌───────────────────────────────────────────────────────────────┐
│ Pre-SMEP (ret2usr): │
│ kernel RIP → redirect to user shellcode → commit_creds() │
│ │
│ Post-SMEP (kernel ROP): │
│ kernel RIP → kernel ROP gadgets → pivot stack to kernel │
│ → chain: disable SMEP → (optional user code) │
│ → commit_creds / modprobe_path │
│ │
│ Post-SMEP/CFI (data-only attacks): │
│ No code execution needed → overwrite cred struct directly │
│ (if arbitrary write primitive available and cred located) │
└───────────────────────────────────────────────────────────────┘
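The "disable SMEP" step in the kernel ROP flow is a single bit operation on CR4, and CR4 pinning undoes it. The sketch below is a simplified user-space model, not kernel code: SMEP is bit 20 and SMAP is bit 21 of CR4, while the example register value and the function names `decode_cr4` and `write_cr4_pinned` are assumptions made for illustration.

```python
CR4_SMEP = 1 << 20  # Supervisor Mode Execution Prevention
CR4_SMAP = 1 << 21  # Supervisor Mode Access Prevention


def decode_cr4(value: int) -> dict:
    """Report which of the two protections are enabled in a CR4 value."""
    return {"SMEP": bool(value & CR4_SMEP), "SMAP": bool(value & CR4_SMAP)}


def write_cr4_pinned(requested: int, pinned_mask: int, pinned_bits: int) -> int:
    """Simplified model of CR4 pinning.

    Bits covered by pinned_mask are forced back to pinned_bits, no matter
    what value the caller asked for.
    """
    return (requested & ~pinned_mask) | pinned_bits


if __name__ == "__main__":
    cr4 = 0x3006F0  # example value with SMEP and SMAP set (assumed)
    attacker_value = cr4 & ~CR4_SMEP  # what a "clear SMEP" gadget would load
    print(decode_cr4(attacker_value))

    pinned = CR4_SMEP | CR4_SMAP
    restored = write_cr4_pinned(attacker_value, pinned, pinned)
    print(decode_cr4(restored))
```

Without pinning, the attacker's value is what lands in the register; with pinning, the write is corrected, which is one reason later techniques moved to data-only attacks.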
Slab/SLUB Heap Exploitation:
┌───────────────────────────────────────────────────────────────┐
│ kmalloc-64 kmalloc-128 kmalloc-256 kmalloc-512 │
│ [obj][obj] [obj][obj] [obj][obj] [obj][obj] │
│ ↑ │
│ Target: place a freed chunk adjacent to a sensitive kernel │
│ object (e.g., tty_struct, msg_msg, pipe_buffer, sk_buff) │
│ Then overflow into it, or UAF to read/write it │
└───────────────────────────────────────────────────────────────┘
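Why a freed slot can end up holding a different object is easier to see in a toy model. The sketch below assumes a last-in, first-out freelist per size class and a typical list of kmalloc size classes; both are simplifications, and `ToyCache` and `kmalloc_cache` are hypothetical names. Hardening options such as freelist randomization change the real allocator's behavior.

```python
# Assumed size classes for a typical x86-64 configuration.
KMALLOC_SIZES = [8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192]


def kmalloc_cache(size: int) -> str:
    """Name of the smallest general-purpose cache that fits `size` bytes."""
    for class_size in KMALLOC_SIZES:
        if size <= class_size:
            return f"kmalloc-{class_size}"
    raise ValueError("size too large for this toy model")


class ToyCache:
    """One size class with a LIFO freelist of numbered slots."""

    def __init__(self, name: str, slots: int):
        self.name = name
        self.free = list(range(slots - 1, -1, -1))  # pop() hands out slot 0 first
        self.owner = {}  # slot number -> description of the object stored there

    def alloc(self, tag: str) -> int:
        if not self.free:
            raise MemoryError("toy cache exhausted")
        slot = self.free.pop()
        self.owner[slot] = tag
        return slot

    def free_slot(self, slot: int) -> None:
        del self.owner[slot]
        self.free.append(slot)  # most recently freed slot is reused first


if __name__ == "__main__":
    cache = ToyCache(kmalloc_cache(100), slots=4)
    victim = cache.alloc("victim object")
    cache.alloc("unrelated object")
    cache.free_slot(victim)  # a stale reference to `victim` may still exist
    reclaimed = cache.alloc("different object of the same size class")
    print(cache.name, victim == reclaimed, cache.owner[victim])
```

In this model the slot that held the victim is handed to the very next allocation of the same size class, so any stale pointer to it now refers to a different object. That reuse pattern is what KASAN's quarantine is designed to disrupt during testing.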
Key Concepts
- Heap Overflow: Write past the end of a heap allocation into adjacent memory. In the kernel slab/SLUB allocator, this can overwrite adjacent kernel objects. Exploitability depends on what object follows the overflowed buffer.
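- Stack Overflow: Write past the end of a kernel stack buffer into adjacent stack data (such as a saved return address) or past the end of the stack itself. Kernel stacks are 16 KB on x86-64 since Linux 3.15 (8 KB before that). Stack canaries (CONFIG_STACKPROTECTOR) detect linear overflows, and CONFIG_VMAP_STACK adds guard pages that catch overruns of the whole stack.
- Use-After-Free (UAF): Access a pointer after the object it points to has been freed. The freed memory may be reallocated to a different object type in the same cache, to an object from another cache once the slab page is recycled (cross-cache attack), or to attacker-controlled data. One of the most common kernel CVE classes.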
- Type Confusion: Treating memory of type A as type B. Often results from incorrect union handling, unsafe casts, or object lifecycle bugs. Can give attacker control over function pointer fields by placing the wrong object type in a location where its fields are dereferenced as function pointers.
- Race Condition (TOCTOU): Time-of-check to time-of-use. Attacker races between the check and the use to change the checked value. Classic example: Dirty COW (CVE-2016-5195) — race in copy-on-write handling of memory-mapped files.
- Integer Overflow: Arithmetic overflow in size calculations leads to under-allocated buffers followed by heap overflows. Common in `kmalloc(user_size * element_size)` patterns.
- Arbitrary Write Primitive: Attacker can write a controlled value to a controlled address. Typically achieved by chaining a UAF or overflow into a write gadget. Enables overwriting function pointers, credentials, or security flags.
- Info Leak Primitive: Attacker can read kernel memory they should not be able to read. Used to defeat KASLR (leak kernel text address), find target object addresses (heap address leak), or extract sensitive data.
- ret2usr: Classic pre-SMEP technique: redirect kernel RIP to user-space shellcode (which runs with kernel privileges). Defeated by SMEP (CPU prevents kernel executing user-space pages).
- Kernel ROP (Return-Oriented Programming): Chain of gadgets ending in `ret` instructions within existing kernel .text. Each gadget performs a small operation. Used to disable SMEP (clear CR4 bit), set up a call to commit_creds, etc. Mitigated by CFI: forward-edge CFI restricts indirect call targets, and backward-edge schemes such as shadow stacks protect return addresses.
- KASLR Bypass: KASLR randomizes the kernel base address. Bypasses: info leaks in /proc (mitigated by kptr_restrict), timing side channels (partially addressed by KAISER/KPTI), CPU side channels (Spectre gadgets).
- SMEP Bypass: SMEP can be disabled by clearing bit 20 of CR4. A kernel ROP gadget sequence: `mov cr4, rax` with `rax` = CR4 & ~SMEP defeats SMEP. Mitigated by pinned CR4 bits (the kernel prevents CR4.SMEP from being cleared since Linux 5.3).
- Container Escape: Exploit a vulnerability to break out of container isolation into the host. Common paths: runc CVE-2019-5736 (overwrite the host runc binary), kernel UAF from inside a namespace, excessive capability abuse.
- Hypervisor Escape (VM Escape): Exploit vulnerability in the hypervisor (KVM, QEMU device emulation) from inside a guest VM to gain code execution in the host kernel. Often via virtio driver emulation bugs (QEMU), VMEXIT handler bugs.
- Dirty COW (CVE-2016-5195): Race condition in Linux memory management (copy-on-write) allowing unprivileged user to write to read-only memory-mapped files. Used to overwrite SUID binaries. One of the most widely exploited Linux kernel vulnerabilities.
- Rowhammer: DRAM hardware attack. Repeatedly activating ("hammering") a DRAM row causes bit flips in adjacent rows. Used to flip bits in page table entries (PTEs) to gain kernel write access without exploiting a software bug. Partially mitigated by ECC, target row refresh (TRR), and LPDDR5 mitigations.
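The integer overflow entry above is the easiest of these concepts to check by hand. The sketch below models fixed-width multiplication in Python, which otherwise has unbounded integers; the helper names `wrapping_mul` and `checked_mul` are hypothetical. In kernel code, `kmalloc_array()` performs this kind of overflow check before allocating.

```python
def wrapping_mul(count: int, elem_size: int, bits: int = 64) -> int:
    """Multiply the way a fixed-width unsigned type does: silently wrap."""
    return (count * elem_size) & ((1 << bits) - 1)


def checked_mul(count: int, elem_size: int, bits: int = 64):
    """Return the product, or None if it does not fit in `bits` bits."""
    product = count * elem_size
    return product if product < (1 << bits) else None


if __name__ == "__main__":
    count, elem_size = 0x40000001, 4  # attacker-chosen count, 4-byte elements

    # Unchecked 32-bit arithmetic: the requested size wraps to 4 bytes,
    # while later code still believes it has room for `count` elements.
    print(wrapping_mul(count, elem_size, bits=32))

    # Checked arithmetic refuses the request instead of under-allocating.
    print(checked_mul(count, elem_size, bits=32))
    print(checked_mul(16, elem_size, bits=32))
```

The first call yields 4, the second `None`, and the third 64. The gap between the 4-byte allocation and the element count the caller still trusts is what turns the arithmetic bug into a heap overflow.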
Major Historical Milestones
| Year | Event |
|---|---|
| 2005 | Return-to-libc technique adapted to kernel exploitation |
| 2007 | ROP (Return-Oriented Programming) formally described (Shacham) |
| 2009 | Brad Spengler publishes kernel exploit methodology (grsecurity) |
| 2010 | CVE-2010-2240: missing stack guard page lets the stack collide with adjacent memory mappings; fixed by adding a guard page below the stack |
| 2012 | SMEP ships in Intel Ivy Bridge processors; ret2usr made less viable |
| 2012 | PF_RING slab exploitation technique documented |
| 2014 | KASLR merged into Linux kernel (x86, Linux 3.14) |
| 2015 | Rowhammer bit-flip exploitation first demonstrated (Google Project Zero) |
| 2016 | Dirty COW (CVE-2016-5195) — race condition, widely exploited in Android |
| 2017 | SLAB exploitation techniques (heap spray, cross-cache) systematized |
| 2017 | runc container escape (CVE-2019-5736 predecessor research) |
| 2018 | Meltdown/Spectre — microarchitectural info leaks; KASLR bypass possible |
| 2019 | runc CVE-2019-5736 container escape; MDS attacks (RIDL, ZombieLoad) |
| 2020 | BleedingTooth Bluetooth vulnerabilities (CVE-2020-12351, a heap-based type confusion in L2CAP, plus related bugs); Kernel struct mitigations |
| 2021 | Sequoia (CVE-2021-33909): size_t-to-int conversion in the filesystem layer (seq_file); syzbot finds hundreds of bugs |
| 2022 | DirtyPipe (CVE-2022-0847): uninitialized pipe_buffer flags let splice() data overwrite the page cache of read-only files; Netfilter heap overflows |
| 2023 | StackRot (CVE-2023-3269) — maple tree UAF; use-after-free exploits via msg_msg |
| 2024 | xz-utils supply chain backdoor — shifts attention to supply chain exploitation |
Modern Relevance
Kernel exploit research directly drives defensive work: the major mitigations (SMEP, SMAP, KPTI, CFI, KASLR, slab hardening, freelist randomization) were developed in response to demonstrated exploit techniques. The current mitigation landscape has significantly raised the difficulty bar: reliably exploiting a modern, hardened kernel typically requires chaining multiple primitives, often across multiple vulnerabilities, or bypassing mitigations through microarchitectural side channels.
The container security model is stress-tested by this knowledge: understanding which kernel vulnerabilities allow container escape determines how much trust to place in namespace isolation alone versus requiring Kata Containers or gVisor. The CVE severity scoring for kernel vulnerabilities is only meaningful if the evaluator understands the current mitigation state of the target kernel version.
Automated vulnerability discovery (syzbot + syzkaller + KASAN/KMSAN) has dramatically increased kernel bug discovery rates. The challenge has shifted from finding bugs to exploiting hardened targets.
File Map
27-kernel-exploits/
├── 00-overview.md ← this file
├── 01-exploit-classes.md ← UAF, heap/stack overflow, type confusion, race
├── 02-exploit-methodology.md ← bug → primitive → goal methodology
├── 03-ret2usr.md ← classic technique, why SMEP ended it
├── 04-rop-kernel.md ← kernel gadget chaining, SMEP bypass via CR4
├── 05-smep-smap-bypass.md ← CR4 pin, historical bypass techniques
├── 06-kaslr-bypass.md ← info leaks, side channels, kptr_restrict
├── 07-privilege-escalation.md ← commit_creds, modprobe_path, cred overwrite
├── 08-container-escapes.md ← namespace breakout, runc CVEs, techniques
├── 09-hypervisor-escapes.md ← QEMU device bugs, VM escape methodology
├── 10-dirty-cow.md ← CVE-2016-5195 deep dive and analysis
├── 11-rowhammer.md ← DRAM bit flips, PTE exploitation, mitigations
├── 12-slab-exploitation.md ← kmalloc, cross-cache, heap spray, msg_msg
├── 13-exploit-mitigations.md ← full mitigation evolution timeline
└── 14-cve-case-studies.md ← DirtyPipe, Sequoia, BleedingTooth analysis
Cross-References
- Section 03 (Kernel Fundamentals): Kernel data structures (task_struct, cred, tty_struct) that are exploit targets
- Section 06 (CPU Architecture): CPU ring model, calling conventions needed for ROP chain construction
- Section 11 (Memory Management): Slab/SLUB allocator — the kernel heap that exploit primitives operate on
- Section 24 (Debugging): KASAN/KMSAN find exploitable bugs; KGDB used in exploit development
- Section 26 (Security): Mitigations that exploit techniques were designed to bypass
- Section 20 (Containers): Container escape techniques require understanding both exploit and container security
- Section 43 (Formal Verification): Verified kernels (seL4) as the architectural response to exploitability