Section 27: Kernel Exploits — Overview

Section Purpose and Scope

This section examines kernel exploitation as a discipline — not as a how-to guide for attack, but as the technical foundation required to understand mitigation design, evaluate CVE severity accurately, and build kernel code that is defensible. A systems architect who cannot reason about how kernel vulnerabilities are exploited cannot meaningfully evaluate the security properties of their design. This section covers exploit primitive classes, the methodology of turning a bug into privilege escalation, and the evolution of mitigations that shaped exploit development over two decades.

Prerequisites

Section 03: Kernel Fundamentals (kernel data structures, slab allocator)
Section 06: CPU Architecture (CPU rings, calling conventions, ISA)
Section 07: Process Management (task_struct, credentials, /proc)
Section 11: Memory Management (virtual address space, kernel heap layout)
Section 26: Security (all mitigation mechanisms, SMEP/SMAP/KPTI/CFI)
Section 24: Debugging (GDB, KASAN, crash analysis — exploit development tools)

Learning Objectives

Classify a kernel vulnerability as a specific primitive (UAF, heap overflow, etc.) and describe what attacker capability it grants.
Trace the path from a type confusion bug to controlled kernel code execution.
Explain ret2usr and why SMEP made it obsolete.
Construct the logical structure of a kernel ROP chain (without writing one).
Explain KASLR bypass techniques (info leaks, side channels, timing).
Analyze a container escape: which primitives are needed and which kernel protections block them.
Describe the hypervisor escape threat model and why VM isolation is not equivalent to physical isolation.
Read a CVE technical writeup and map it to the exploitation primitive and mitigation landscape at the time.

Architecture Overview

  Kernel Exploit Development Lifecycle:

  ┌──────────────┐    ┌──────────────┐    ┌──────────────────────┐
  │  Bug Class   │───►│  Primitive   │───►│  Exploitation Goal   │
  │  (root cause)│    │  (what       │    │  (privilege          │
  │              │    │  attacker    │    │  escalation)         │
  │  heap OOB    │    │  controls)   │    │                      │
  │  UAF         │    │  ----------- │    │  uid 0 / root        │
  │  race cond   │    │  arbitrary   │    │  container escape    │
  │  type confus │    │  write       │    │  hypervisor escape   │
  │  stack OOB   │    │  PC control  │    │  info leak → KASLR   │
  │  int overflow│    │  info leak   │    │  bypass → RWX page   │
  └──────────────┘    └──────────────┘    └──────────────────────┘
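
The primitive column above can be read as a triage question: given what a bug lets the attacker control, which techniques from this section come into scope? The sketch below is a hypothetical lookup table, not a standard taxonomy. The technique names and required-primitive sets are assumptions chosen to mirror this overview; it assumes KASLR is enabled (so everything except ret2usr needs an info leak) and ignores all other mitigations.

```python
# Hypothetical triage table: primitives each technique in this section needs.
# Groupings are illustrative assumptions, not an established classification.
TECHNIQUE_NEEDS = {
    "ret2usr": {"pc_control"},                       # pre-SMEP, pre-KASLR era
    "kernel_rop": {"pc_control", "info_leak"},       # gadget addresses must be known
    "cred_overwrite": {"arbitrary_write", "info_leak"},
    "modprobe_path_overwrite": {"arbitrary_write", "info_leak"},
}


def feasible_techniques(primitives):
    """Return the techniques whose required primitives are all available."""
    have = set(primitives)
    return sorted(name for name, needs in TECHNIQUE_NEEDS.items() if needs <= have)


# A bug that yields only an arbitrary write is not enough on its own here.
print(feasible_techniques({"arbitrary_write"}))
# Pairing it with an info leak brings the data-only techniques into scope.
print(feasible_techniques({"arbitrary_write", "info_leak"}))
```

A table like this is only a starting point for severity assessment: the mitigation state of the target kernel decides whether a technique that is "in scope" is actually viable.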

  Privilege Escalation via commit_creds (classic technique):
  ┌───────────────────────────────────────────────────────────────┐
  │  1. Obtain kernel code execution or arbitrary write           │
  │  2. Find init_cred / prepare_kernel_cred() address            │
  │     (via KASLR bypass: /proc/kallsyms if kptr_restrict        │
  │      permits it, or via info leak exploit primitive)          │
  │  3. Call commit_creds(prepare_kernel_cred(NULL))              │
  │     → sets current->cred to uid/gid 0 with full capabilities  │
  │     (since Linux 6.2, NULL no longer yields init_cred)        │
  │  4. Return to user space with root credentials               │
  │                                                               │
  │  Modern alternative (CFI era):                               │
  │  → Overwrite modprobe_path string                            │
  │  → Trigger kernel to exec user-controlled binary as root      │
  └───────────────────────────────────────────────────────────────┘
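
Step 2 depends on whether symbol addresses are visible to the process at all. From a defender's point of view this is easy to audit: when addresses are hidden, every line of /proc/kallsyms shows an all-zero address. The sketch below is a minimal check under that assumption. The function names are hypothetical, and the sample lines stand in for the real file with made-up addresses.

```python
from typing import Iterable, List, Tuple


def parse_kallsyms_line(line: str) -> Tuple[int, str, str]:
    """Split one kallsyms line into (address, symbol type, symbol name)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"not a kallsyms line: {line!r}")
    return int(fields[0], 16), fields[1], fields[2]


def addresses_hidden(lines: Iterable[str]) -> bool:
    """Return True when every symbol address reads as zero (addresses masked)."""
    addresses: List[int] = [
        parse_kallsyms_line(line)[0] for line in lines if line.strip()
    ]
    return bool(addresses) and all(addr == 0 for addr in addresses)


# Sample data standing in for /proc/kallsyms; the addresses are made up.
MASKED = ["0000000000000000 T commit_creds",
          "0000000000000000 T prepare_kernel_cred"]
VISIBLE = ["ffffffff810a1b20 T commit_creds",
           "ffffffff810a1e60 T prepare_kernel_cred"]

print(addresses_hidden(MASKED))   # True
print(addresses_hidden(VISIBLE))  # False
```

On a Linux host the same function could be fed the lines of /proc/kallsyms read as an unprivileged user; a False result means an attacker at that privilege level needs no KASLR bypass.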

  SMEP/SMAP Effect on Exploit Flow:
  ┌───────────────────────────────────────────────────────────────┐
  │  Pre-SMEP (ret2usr):                                          │
  │  kernel RIP → redirect to user shellcode → commit_creds()    │
  │                                                               │
  │  Post-SMEP (kernel ROP):                                      │
  │  kernel RIP → kernel ROP gadgets → pivot stack to kernel      │
  │              → chain: disable SMEP → (optional user code)    │
  │              → commit_creds / modprobe_path                  │
  │                                                               │
  │  Post-SMEP/CFI (data-only attacks):                          │
  │  No code execution needed → overwrite cred struct directly   │
  │  (if arbitrary write primitive available and cred located)   │
  └───────────────────────────────────────────────────────────────┘
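
The "disable SMEP" step in the post-SMEP flow is plain bit arithmetic on CR4, and CR4 pinning is what makes it fail on newer kernels. The sketch below is a simplified model, not kernel code: it assumes only SMEP (bit 20) and SMAP (bit 21) are pinned, and the function names are hypothetical.

```python
CR4_SMEP = 1 << 20  # Supervisor Mode Execution Prevention
CR4_SMAP = 1 << 21  # Supervisor Mode Access Prevention

# Assumption: only these two bits are modeled as pinned; real kernels pin more.
PINNED_MASK = CR4_SMEP | CR4_SMAP


def smep_enabled(cr4: int) -> bool:
    return bool(cr4 & CR4_SMEP)


def write_cr4_unpinned(requested: int) -> int:
    """Pre-pinning behavior: the register takes whatever value is written."""
    return requested


def write_cr4_pinned(requested: int, boot_cr4: int) -> int:
    """Pinning model: bits in PINNED_MASK are forced back to their boot-time state."""
    pinned_bits = boot_cr4 & PINNED_MASK
    return (requested & ~PINNED_MASK) | pinned_bits


boot_cr4 = CR4_SMEP | CR4_SMAP | 0x6F0  # 0x6F0: arbitrary other feature bits
requested = boot_cr4 & ~CR4_SMEP        # value a CR4-clearing gadget would load

print(smep_enabled(write_cr4_unpinned(requested)))          # False
print(smep_enabled(write_cr4_pinned(requested, boot_cr4)))  # True
```

The model shows why pinning pushed exploit development toward data-only attacks: the write still happens, but the protection bit does not stay cleared.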

  Slab/SLUB Heap Exploitation:
  ┌───────────────────────────────────────────────────────────────┐
  │  kmalloc-64   kmalloc-128   kmalloc-256   kmalloc-512        │
  │  [obj][obj]   [obj][obj]    [obj][obj]    [obj][obj]         │
  │               ↑                                               │
  │  Target: place a freed chunk adjacent to a sensitive kernel  │
  │  object (e.g., tty_struct, msg_msg, pipe_buffer, sk_buff)    │
  │  Then overflow into it, or UAF to read/write it              │
  └───────────────────────────────────────────────────────────────┘
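
Adjacency in the diagram above follows from size-class rounding: unrelated object types whose sizes round to the same class are served from the same kmalloc cache. The sketch below models that rounding. The size list is an assumption matching a typical x86-64 configuration, and hardened kernels may split these caches further, so treat it as an illustration rather than a description of any particular kernel.

```python
import bisect

# Assumed general-purpose kmalloc size classes (typical x86-64 configuration).
KMALLOC_SIZES = (8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192)


def size_class(request: int) -> int:
    """Return the smallest modeled size class that fits a request of `request` bytes."""
    if request <= 0:
        raise ValueError("allocation size must be positive")
    index = bisect.bisect_left(KMALLOC_SIZES, request)
    if index == len(KMALLOC_SIZES):
        raise ValueError("request is larger than the modeled caches")
    return KMALLOC_SIZES[index]


def same_cache(size_a: int, size_b: int) -> bool:
    """True when two object sizes would be served from the same size class."""
    return size_class(size_a) == size_class(size_b)


print(size_class(65))        # 96
print(same_cache(200, 256))  # True: both land in the 256-byte class
print(same_cache(64, 65))    # False: 64 vs 96
```

For defenders, the same rounding explains why moving a sensitive structure into a dedicated cache removes it from the pool of objects that can sit next to an overflowed buffer.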

Key Concepts

Heap Overflow: Write past the end of a heap allocation into adjacent memory. In the kernel slab/SLUB allocator, this can overwrite adjacent kernel objects. Exploitability depends on what object follows the overflowed buffer.
Stack Overflow: Write past the end of a kernel stack frame into an adjacent stack frame or guard area. Kernel stacks are 16 KB on x86-64 (8 KB on 32-bit x86). Stack canaries (CONFIG_STACKPROTECTOR) mitigate linear overflows, and CONFIG_VMAP_STACK adds guard pages that catch stack exhaustion.
Use-After-Free (UAF): Access a pointer after the object it points to has been freed. The freed memory may be reallocated to a different object type from the same cache, to an object from another cache once the slab page returns to the page allocator (cross-cache attack), or to attacker-controlled data. One of the most common kernel CVE classes.
Type Confusion: Treating memory of type A as type B. Often results from incorrect union handling, unsafe casts, or object lifecycle bugs. Can give attacker control over function pointer fields by placing the wrong object type in a location where its fields are dereferenced as function pointers.
Race Condition (TOCTOU): Time-of-check to time-of-use. Attacker races between the check and the use to change the checked value. Classic example: Dirty COW (CVE-2016-5195) — race in copy-on-write handling of memory-mapped files.
Integer Overflow: Arithmetic overflow in size calculations leads to under-allocated buffers followed by heap overflows. Common in kmalloc(user_size * element_size) patterns.
Arbitrary Write Primitive: Attacker can write a controlled value to a controlled address. Typically achieved by chaining a UAF or overflow into a write gadget. Enables overwriting function pointers, credentials, or security flags.
Info Leak Primitive: Attacker can read kernel memory they should not be able to read. Used to defeat KASLR (leak kernel text address), find target object addresses (heap address leak), or extract sensitive data.
ret2usr: Classic pre-SMEP technique: redirect kernel RIP to user-space shellcode (which runs with kernel privileges). Defeated by SMEP (CPU prevents kernel executing user-space pages).
Kernel ROP (Return-Oriented Programming): Chain of gadgets ending in ret instructions within existing kernel .text. Each gadget performs a small operation. Used to disable SMEP (clear CR4 bit), set up a call to commit_creds, etc. Mitigated by CFI: forward-edge schemes restrict indirect call targets, and shadow stacks protect return addresses.
KASLR Bypass: KASLR randomizes the kernel base address. Bypasses include pointer leaks through /proc (mitigated by kptr_restrict), timing side channels (partially addressed by KAISER/KPTI), and CPU side channels (Spectre gadgets).
SMEP Bypass: SMEP can be disabled by clearing bit 20 of CR4. A kernel ROP chain that executes mov cr4, rax with rax = CR4 & ~(1 << 20) turns SMEP off. Mitigated by CR4 pinning: since Linux 5.3 the kernel restores pinned bits such as SMEP if they are cleared.
Container Escape: Exploit a vulnerability to break out of container isolation into the host. Common paths: runc CVE-2019-5736 (overwrite runc binary), kernel UAF from inside namespace, excessive capability abuse.
Hypervisor Escape (VM Escape): Exploit vulnerability in the hypervisor (KVM, QEMU device emulation) from inside a guest VM to gain code execution in the host kernel. Often via virtio driver emulation bugs (QEMU), VMEXIT handler bugs.
Dirty COW (CVE-2016-5195): Race condition in Linux memory management (copy-on-write) allowing unprivileged user to write to read-only memory-mapped files. Used to overwrite SUID binaries. One of the most widely exploited Linux kernel vulnerabilities.
Rowhammer: DRAM hardware attack. Repeatedly activating a DRAM row causes bit flips in adjacent rows. Used to flip bits in page table entries (PTEs) to gain kernel write access without exploiting a software bug. Partially mitigated by ECC, target row refresh (TRR), and LPDDR5 mitigations.
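
The Integer Overflow entry above is easiest to see with concrete numbers. The sketch below models a 32-bit size calculation in Python by masking the product. The 32-bit width is an assumption chosen for readability, and both function names are hypothetical. The checked variant returns None on overflow, loosely mirroring how kernel helpers such as kmalloc_array() refuse an overflowing request instead of allocating a wrapped size.

```python
from typing import Optional

U32_MAX = (1 << 32) - 1


def unchecked_alloc_size(count: int, elem_size: int) -> int:
    """Model a 32-bit multiply that silently wraps, as unchecked C arithmetic does."""
    return (count * elem_size) & U32_MAX


def checked_alloc_size(count: int, elem_size: int) -> Optional[int]:
    """Return the byte size, or None when the product does not fit in 32 bits."""
    product = count * elem_size
    return product if product <= U32_MAX else None


count, elem_size = 0x40000001, 4  # attacker-chosen count, 4-byte elements

# Wraps to 4 bytes, yet later code still believes there are `count` elements.
print(unchecked_alloc_size(count, elem_size))  # 4
# The checked path refuses the request instead of under-allocating.
print(checked_alloc_size(count, elem_size))    # None
```

The gap between the 4-byte buffer and the element count the code later trusts is what turns the arithmetic bug into a heap overflow.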

Major Historical Milestones

Year	Event
2005	Return-to-libc technique adapted to kernel exploitation
2007	ROP (Return-Oriented Programming) formally described (Shacham)
2009	Brad Spengler publishes kernel exploit methodology (grsecurity)
2010	CVE-2010-2240: missing stack guard page lets the user stack collide with adjacent mappings (exploited through the X server)
2011	SMEP support merged in Linux 3.0; hardware ships with Intel Ivy Bridge (2012); ret2usr made less viable
2012	PF_RING slab exploitation technique documented
2014	KASLR merged into Linux kernel (3.14)
2015	Rowhammer bit-flip exploitation first demonstrated (Google Project Zero)
2016	Dirty COW (CVE-2016-5195) — race condition, widely exploited in Android
2017	SLAB exploitation techniques (heap spray, cross-cache) systematized
2017	runc container escape (CVE-2019-5736 predecessor research)
2018	Meltdown/Spectre — microarchitectural info leaks; KASLR bypass possible
2019	runc CVE-2019-5736 container escape; MDS attacks (RIDL, ZombieLoad)
2020	BleedingTooth Bluetooth bugs (CVE-2020-12351 type confusion, CVE-2020-24490 heap overflow); Kernel struct mitigations
2021	Sequoia (CVE-2021-33909): size_t-to-int conversion in seq_file; syzbot finds hundreds of bugs
2022	DirtyPipe (CVE-2022-0847): uninitialized pipe_buffer flags allow overwriting page cache data; Netfilter heap overflows
2023	StackRot (CVE-2023-3269) — maple tree UAF; use-after-free exploits via msg_msg
2024	xz-utils supply chain backdoor — shifts attention to supply chain exploitation

Modern Relevance

Kernel exploit research directly drives defensive work: every major mitigation (SMEP, SMAP, KPTI, CFI, KASLR, slab hardening, freelist randomization) was developed in response to demonstrated exploit techniques. The current mitigation landscape has raised the bar significantly: reliably exploiting a modern, hardened kernel typically requires chaining multiple primitives, often across multiple vulnerabilities, or bypassing mitigations through microarchitectural side channels.

The container security model is stress-tested by this knowledge: understanding which kernel vulnerabilities allow container escape determines how much trust to place in namespace isolation alone versus requiring Kata Containers or gVisor. The CVE severity scoring for kernel vulnerabilities is only meaningful if the evaluator understands the current mitigation state of the target kernel version.

Automated vulnerability discovery (syzbot + syzkaller + KASAN/KMSAN) has dramatically increased kernel bug discovery rates. The challenge has shifted from finding bugs to exploiting hardened targets.

File Map

27-kernel-exploits/
├── 00-overview.md                  ← this file
├── 01-exploit-classes.md           ← UAF, heap/stack overflow, type confusion, race
├── 02-exploit-methodology.md       ← bug → primitive → goal methodology
├── 03-ret2usr.md                   ← classic technique, why SMEP ended it
├── 04-rop-kernel.md                ← kernel gadget chaining, SMEP bypass via CR4
├── 05-smep-smap-bypass.md          ← CR4 pin, historical bypass techniques
├── 06-kaslr-bypass.md              ← info leaks, side channels, kptr_restrict
├── 07-privilege-escalation.md      ← commit_creds, modprobe_path, cred overwrite
├── 08-container-escapes.md         ← namespace breakout, runc CVEs, techniques
├── 09-hypervisor-escapes.md        ← QEMU device bugs, VM escape methodology
├── 10-dirty-cow.md                 ← CVE-2016-5195 deep dive and analysis
├── 11-rowhammer.md                 ← DRAM bit flips, PTE exploitation, mitigations
├── 12-slab-exploitation.md         ← kmalloc, cross-cache, heap spray, msg_msg
├── 13-exploit-mitigations.md       ← full mitigation evolution timeline
└── 14-cve-case-studies.md          ← DirtyPipe, Sequoia, BleedingTooth analysis

Cross-References

Section 03 (Kernel Fundamentals): Kernel data structures (task_struct, cred, tty_struct) that are exploit targets
Section 06 (CPU Architecture): CPU ring model, calling conventions needed for ROP chain construction
Section 11 (Memory Management): Slab/SLUB allocator — the kernel heap that exploit primitives operate on
Section 24 (Debugging): KASAN/KMSAN find exploitable bugs; KGDB used in exploit development
Section 26 (Security): Mitigations that exploit techniques were designed to bypass
Section 20 (Containers): Container escape techniques require understanding both exploit and container security
Section 43 (Formal Verification): Verified kernels (seL4) as the architectural response to exploitability