01 — Kernel Exploit Classes

Technical Overview

The Linux kernel is the most privileged software component on a system. A successful kernel exploit gives an attacker complete control: arbitrary code execution at ring 0, the ability to read or modify any process's memory, disable security mechanisms, and establish persistent rootkits. The attack surface is immense: millions of lines of C code processing untrusted input through system calls, network protocols, filesystem drivers, device drivers, and IPC mechanisms.

Understanding kernel exploit classes is foundational for both offense (finding and weaponizing vulnerabilities) and defense (designing mitigations, reviewing code for vulnerability patterns, and threat modeling). This document catalogs the major classes with technical detail, real CVEs, and root cause analysis.

Prerequisites

Linux virtual memory layout (kernel vs user address spaces)
Heap and stack memory management concepts
System call interface and privilege boundaries
Basic C memory semantics (pointers, structs, allocation)

Attack Surface Analysis

                    KERNEL ATTACK SURFACE
  ┌───────────────────────────────────────────────────────────────┐
  │                                                               │
  │  Syscall interface (~400 syscalls)                           │
  │  ├── open, read, write, ioctl, mmap, socket, ...             │
  │  └── Each validates/copies arguments from user space         │
  │                                                               │
  │  Network stack (TCP, UDP, ICMP, netlink, BPF, ...)           │
  │  ├── Reachable from network (remote attack surface)          │
  │  └── Reachable from user processes via socket syscalls       │
  │                                                               │
  │  Filesystem layer (VFS + individual fs drivers)              │
  │  File-system-level attacks: crafted FS images, FUSE          │
  │                                                               │
  │  Device drivers (PCI, USB, platform, ...)                    │
  │  USB drivers: physical access triggers kernel code           │
  │                                                               │
  │  IPC mechanisms (pipes, sockets, futexes, semaphores)        │
  │  Kernel modules (if loadable by attacker — rare)             │
  │  eBPF (verifier bugs allow malicious programs)               │
  │  Kernel cryptographic interfaces (af_alg)                    │
  └───────────────────────────────────────────────────────────────┘

Class 1: Heap Overflow

A heap overflow writes beyond the boundary of a heap-allocated buffer, corrupting adjacent allocations. In the kernel heap (SLUB/SLAB allocator), adjacent objects may be arbitrary kernel data structures.

Technical mechanism: The kernel uses SLUB as its primary slab allocator. Objects of similar sizes are grouped into slabs, each backed by one or more contiguous pages. If a buffer overflow writes N extra bytes beyond an allocation, it reaches the next object in the same slab. If that neighbor is free, the overflow can instead corrupt the freelist pointer that SLUB stores inside free objects.

SLUB slab layout (simplified):

  Page N (size-512 slab):
  ┌──────────────────┬──────────────────┬──────────────────┐
  │ obj[0]: 512 bytes│ obj[1]: 512 bytes│ obj[2]: 512 bytes│
  │  (our buffer)    │ (adjacent alloc) │  ...             │
  └──────────────────┴──────────────────┴──────────────────┘
           ↑                    ↑
    overflow from here   writes here → corrupts adjacent object
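The layout above can be modeled in user space. The following is a minimal sketch, not kernel code: the slab is a plain byte array, and struct victim, vulnerable_copy(), and demo_overflow() are hypothetical names invented for illustration. It only shows why a copy that ignores the slot size lands in the neighboring object.

```c
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE 512            /* object size of the modeled cache */
#define NUM_SLOTS 3

/* One "slab": objects packed back to back with no gap between them. */
static unsigned char slab[NUM_SLOTS * SLOT_SIZE];

/* Hypothetical victim object occupying the slot right after our buffer. */
struct victim {
    uint64_t func_ptr;           /* stands in for a kernel function pointer */
};

/* Buggy copy: trusts len and never compares it against SLOT_SIZE. */
static void vulnerable_copy(unsigned slot, const void *src, size_t len)
{
    memcpy(&slab[slot * SLOT_SIZE], src, len);
}

/* Returns the victim's func_ptr after an overflow out of slot 0. */
uint64_t demo_overflow(void)
{
    struct victim v = { .func_ptr = 0x1111111111111111ULL };
    unsigned char payload[SLOT_SIZE + sizeof(uint64_t)];
    uint64_t fake = 0x4141414141414141ULL;

    memcpy(&slab[1 * SLOT_SIZE], &v, sizeof(v));       /* victim in slot 1 */

    memset(payload, 'A', SLOT_SIZE);                   /* fills slot 0     */
    memcpy(payload + SLOT_SIZE, &fake, sizeof(fake));  /* 8 bytes too many */
    vulnerable_copy(0, payload, sizeof(payload));

    memcpy(&v, &slab[1 * SLOT_SIZE], sizeof(v));       /* read victim back */
    return v.func_ptr;
}
```

Calling demo_overflow() returns the attacker-chosen value rather than the original pointer, because the last 8 bytes of the payload spill from slot 0 into slot 1.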

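Exploitation goal: Arrange for a useful victim object to occupy the adjacent slot, then overflow into it to overwrite function pointers or security-relevant fields. The victim must come from the same slab cache as the overflowed buffer, which is why general-purpose kmalloc objects such as struct msg_msg or struct pipe_buffer are popular targets. Objects with dedicated caches (struct file, struct cred) usually require cross-cache techniques.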

CVE-2021-3490 (eBPF ALU32 bounds tracking bypass): The eBPF verifier tracked 32-bit bounds incorrectly for bitwise ALU32 operations (AND, OR, XOR), so a program could pass verification and still read and write out of bounds relative to a map value, corrupting the kernel heap. Result: local privilege escalation to root.

CVE-2022-0185 (Linux VFS legacy_parse_param): Heap overflow in filesystem parameter parsing. legacy_parse_param() appended each key/value pair to a single page-sized heap buffer, and its length check could be bypassed, so data was written past the end of the buffer. Local privilege escalation, reachable by an unprivileged user who can create a user namespace.

CVE-2022-27666 (IPsec ESP transformation): Heap overflow in the ESP output path (esp6_output_head() and its IPv4 counterpart). The maximum message size exceeded the largest buffer that skb_page_frag_refill() could allocate, so the code could write beyond the allocated pages. Objects placed in the adjacent pages could then be corrupted.

Class 2: Stack Overflow / Stack-Based Buffer Overflow

Stack overflows in the kernel write beyond the end of a kernel stack frame, potentially overwriting saved registers, return addresses, or local variables of the calling function.

Why less common: Kernel stacks are small (16 KB on x86-64, 8 KB on 32-bit x86), and kernel code is written to keep frames small. CONFIG_VMAP_STACK (Linux 4.9) allocates kernel stacks in vmalloc space with guard pages, turning most stack overflows into an immediate oops rather than silent corruption.

Remaining risk: A single frame larger than the guard page (for example, one holding a large local array) can step over the guard and land in whatever is mapped beyond it. Unbounded recursion in filesystem drivers processing crafted filesystem images is a historical source of stack exhaustion.
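Why a large frame can defeat a guard page is easier to see with numbers. The sketch below is a simplified model under stated assumptions: the stack grows downward, a single guard region of GUARD_SIZE bytes sits directly below it, and the new frame is first written at its lowest address. struct kstack and classify_frame() are hypothetical names.

```c
#include <stdint.h>

#define GUARD_SIZE 4096u   /* assumed size of the unmapped guard region */

/* Hypothetical layout: the stack occupies [base, base + size) and grows
 * downward; the guard region sits at [base - GUARD_SIZE, base). */
struct kstack {
    uint64_t base;
    uint64_t size;
};

enum touch { TOUCH_STACK, TOUCH_GUARD, TOUCH_BEYOND };

/* Classify the lowest address written by a new frame of frame_size bytes
 * pushed at stack pointer sp. */
enum touch classify_frame(const struct kstack *s, uint64_t sp,
                          uint64_t frame_size)
{
    uint64_t lowest = sp - frame_size;

    if (lowest >= s->base)
        return TOUCH_STACK;          /* still inside the stack           */
    if (lowest >= s->base - GUARD_SIZE)
        return TOUCH_GUARD;          /* faults: overflow is detected     */
    return TOUCH_BEYOND;             /* skipped the guard: silent damage */
}
```

With 256 bytes of stack left, a 512-byte frame is classified as TOUCH_GUARD (an oops), while an 8192-byte frame is classified as TOUCH_BEYOND because its lowest address lies below the guard region.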

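CVE-2017-1000112 (UDP fragmentation offload): Not a stack overflow in the strict sense. When a UFO packet was built across two send() calls with MSG_MORE, the append path could switch from UFO to non-UFO in between, a length calculation went negative, and the kernel wrote out of bounds in packet buffer memory. It is listed here as a reminder of how fragile length bookkeeping in network paths can be.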

Class 3: Use-After-Free (UAF)

Use-after-free is the most prevalent class in modern kernel exploits. A dangling pointer to a freed object is used to read or write the reallocated memory.

UAF exploitation flow:

  Time 1: alloc obj_A (struct file, for example)
           victim_ptr = &obj_A

  Time 2: free(obj_A) → obj_A's memory returned to SLUB cache
           victim_ptr still points to obj_A's old address

  Time 3: alloc obj_B of same or similar size
           SLUB may reuse obj_A's slot for obj_B
           *obj_B = attacker_data  (heap spray)

  Time 4: victim_ptr->func_ptr()   ← calls into attacker data!
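The four steps above can be replayed safely in user space with a toy allocator. This is a minimal sketch, not SLUB: the slots live in a static array, the freelist is a simple LIFO stack (an assumption that mirrors SLUB's tendency to hand back the most recently freed object), and every name here is hypothetical. func_ptr is stored as an integer so that nothing is actually called.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 4

/* Two unrelated object types that share a size class. */
struct obj_a { uintptr_t func_ptr; uint64_t pad; };
struct obj_b { uint64_t data[2]; };

union slot { struct obj_a a; struct obj_b b; };

static union slot slots[NSLOTS];
static int free_stack[NSLOTS] = { 3, 2, 1, 0 };   /* LIFO freelist */
static int free_top = NSLOTS;

static union slot *toy_alloc(void)
{
    return free_top > 0 ? &slots[free_stack[--free_top]] : NULL;
}

static void toy_free(union slot *p)
{
    free_stack[free_top++] = (int)(p - slots);
}

/* Returns the value the dangling pointer sees in func_ptr at "Time 4". */
uintptr_t demo_uaf(void)
{
    union slot *victim = toy_alloc();      /* Time 1: allocate obj_A        */
    if (!victim)
        return 0;
    victim->a.func_ptr = 0x1000;           /* legitimate handler address    */

    toy_free(victim);                      /* Time 2: pointer not cleared   */

    union slot *spray = toy_alloc();       /* Time 3: same slot handed out  */
    spray->b.data[0] = 0x41414141;         /* attacker-chosen bytes         */

    uintptr_t seen = victim->a.func_ptr;   /* Time 4: reads attacker data   */
    toy_free(spray);
    return seen;
}
```

demo_uaf() returns 0x41414141: the stale pointer and the new allocation refer to the same slot, so the "function pointer" is whatever the second owner wrote.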

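CVE-2021-4154 (cgroup1_parse_param): A UAF in the cgroup v1 mount-parameter parser. cgroup1_parse_param() assumed the "source" parameter was always a string, but fsconfig() let the caller pass it as a file descriptor. The kernel later freed the stashed struct file as if it were that string while the file was still in use, leaving a dangling reference. Exploitable from user namespaces.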

CVE-2022-2588 (cls_route filter UAF): The route-based traffic classifier mishandled filter replacement. route4_change() freed the old filter, but when that filter's handle was 0 it was not unlinked from the hash table first, so the table kept a pointer to freed memory. A public exploit by Zhenpeng Lin achieved LPE using the DirtyCred technique, which does not need a KASLR bypass.

CVE-2023-32233 (Netfilter nftables use-after-free): nf_tables mishandled anonymous sets when processing batch requests, so a set released by one operation in a batch could still be referenced by a later one. The resulting UAF gave arbitrary kernel read and write and was exploited for full root.

Key enabler: Linux's RCU (Read-Copy-Update) defers frees, creating a grace period window during which freed objects remain accessible. Bugs at the RCU boundary (freeing too early, using after grace period) are a rich source of UAF.

Class 4: Type Confusion

Type confusion occurs when a pointer to one struct type is cast (implicitly or explicitly) to a different type, and the code then accesses fields at wrong offsets.

/* Example of a type confusion pattern (simplified; all names are illustrative) */
enum { TYPE_SAFE = 1, TYPE_EVIL = 2 };
struct safe_obj { int type; void (*handler)(void *); void *data; };
struct evil_obj { int type; long controlled[4]; };

/* Bug: the cast trusts obj->type, but nothing guarantees the memory still
 * holds a safe_obj. If an evil_obj with a forged or stale type tag occupies
 * the slot, handler and data are read from controlled[0] and controlled[1]. */
void dispatch(void *obj)
{
    struct safe_obj *s = obj;

    if (s->type == TYPE_SAFE)
        s->handler(s->data);   /* calls an attacker-chosen pointer */
}

CVE-2019-2215 (Android Binder driver UAF): A use-after-free of a binder_thread's wait queue, reachable by combining the BINDER_THREAD_EXIT ioctl with epoll. It is a UAF rather than a pure type confusion, but the public exploit works by reinterpreting the freed binder_thread memory as a different object (an iovec array). Google Project Zero reported it as exploited in the wild in 2019; affected devices ran Android 8.x and 9.x.

CVE-2017-6074 (DCCP double-free / type confusion): In the DCCP (Datagram Congestion Control Protocol) socket code, dccp_rcv_state_process() could free the skb of a DCCP_PKT_REQUEST packet twice when it arrived on a listening socket with IPV6_RECVPKTINFO set. The public exploit turned the double free into a use-after-free and reclaimed the object with attacker-controlled data of a different type. Affected Linux 2.6.18+.

Class 5: Null Pointer Dereference

Dereferencing a null pointer in kernel space causes an oops at address 0x0. Historically, if user space could map the null page (mmap(0, ...)) and place shellcode there, a kernel null dereference would execute it at ring 0.

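Historical exploit: On kernels where mmap_min_addr was 0 or did not yet exist, call mmap(0, 4096, ...) with MAP_FIXED, place shellcode in the page, and trigger a kernel null dereference that goes through a function pointer. Extremely reliable.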

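Modern mitigation: vm.mmap_min_addr is typically 65536 (0x10000) on mainstream x86-64 distributions, so unprivileged user space cannot map page 0. SMEP and SMAP additionally stop the kernel from executing or accessing user pages. Kernel null dereferences are now usually denial-of-service bugs rather than a route to code execution.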
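Whether a given system still allows the historical first step can be checked directly. The sketch below is an assumption-laden probe for Linux: try_map_null_page() is a hypothetical helper name, and the result depends on vm.mmap_min_addr and on the caller's privileges (a root process with CAP_SYS_RAWIO may succeed even when the limit is set).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Try to map one page at virtual address 0, as the historical exploits did.
 * Returns 1 if the mapping succeeded, 0 if the kernel refused it. */
int try_map_null_page(void)
{
    void *p = mmap((void *)0, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    if (p == MAP_FAILED)
        return 0;          /* typical result when address < mmap_min_addr */

    munmap(p, 4096);       /* succeeded: remove the mapping again */
    return 1;
}
```

On a hardened system an unprivileged caller should get 0. A return value of 1 means page 0 is mappable and the mitigation is not in effect for that process.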

CVE-2009-1897 (TUN/TAP driver): tun_chr_poll() read tun->sk before checking tun for NULL, and the compiler then removed the later NULL check as redundant. With page 0 mapped by the attacker, the dereference did not fault and the kernel went on to use attacker-controlled data from address 0.

Class 6: Race Conditions (TOCTOU)

Race conditions occur when two threads access shared state concurrently without proper synchronization, or when a decision made in one time window is acted upon in another (TOCTOU: Time-Of-Check-To-Time-Of-Use).
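The check/use gap can be shown deterministically by making the "other thread" an explicit callback that runs between the two steps. This is a minimal single-threaded sketch under that assumption; struct request, double_fetch(), single_fetch(), and attacker_grows_len() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 64

/* Hypothetical request living in memory another thread can modify. */
struct request {
    volatile uint32_t len;
};

/* Stand-in for the racing thread: called between check and use. */
typedef void (*interleave_fn)(struct request *);

void attacker_grows_len(struct request *r)
{
    r->len = 4096;
}

/* Buggy: reads r->len twice. Returns the length that would be copied,
 * or 0 if the request was rejected. */
uint32_t double_fetch(struct request *r, interleave_fn racer)
{
    if (r->len > BUF_SIZE)          /* time of check */
        return 0;
    racer(r);                       /* the other thread runs here */
    return r->len;                  /* time of use: may exceed BUF_SIZE */
}

/* Fixed: snapshot once, then check and use the private copy. */
uint32_t single_fetch(struct request *r, interleave_fn racer)
{
    uint32_t len = r->len;

    if (len > BUF_SIZE)
        return 0;
    racer(r);
    return len;
}
```

Starting from len = 16, double_fetch() ends up with 4096 while single_fetch() still returns 16. In real kernel code the equivalent fix is to copy the value out of shared or user memory once and validate the copy.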

Dirty COW (CVE-2016-5195): The most famous kernel race condition. A race between madvise(MADV_DONTNEED) and a write to /proc/self/mem that targets a private mapping of a read-only file. One thread keeps discarding the CoW copy while the other retries the write, until the write lands on the original page-cache page instead of the copy. Full analysis in 03-dirty-cow-analysis.md.

CVE-2017-2636 (n_hdlc serial discipline): A race on n_hdlc->tbuf between flush_tx_queue() and n_hdlc_send_frames() could put the same buffer on the free list twice, leading to a double free. The window is narrow, but an attacker who runs the two paths on separate CPUs can simply retry until the race is won.

CVE-2021-3600 (eBPF 32-bit div/mod truncation): Not a concurrency race, but a check/use mismatch of the same shape. The verifier reasoned about the source register of 32-bit div and mod instructions one way, while the executed code truncated it to 32 bits, so the bounds that were verified did not match the value used at run time.

Class 7: Integer Overflow Leading to Heap Under-Allocation

Integer overflow in size calculations causes kmalloc(size) to allocate a buffer smaller than the data that will be written into it, creating an implicit heap overflow.

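/* Classic pattern: the size is computed in 32 bits and wraps */
u32 count = user_input;                   /* attacker-controlled: 0x10000001 */
u32 size = count * sizeof(struct entry);  /* 16-byte entries: truncates to 0x10 */
buf = kmalloc(size, GFP_KERNEL);          /* 16-byte buffer */
if (!buf)
    return -ENOMEM;
/* The copy length is computed in 64 bits, so it does not wrap: huge write */
copy_from_user(buf, user_buf, (size_t)count * sizeof(struct entry));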

CVE-2019-2025 (Android Binder): Per the published advisory, this is a use-after-free in binder_thread_read() caused by improper locking rather than an integer overflow, so it fits Class 3 and Class 6 better than this one.

CVE-2022-0185 also belongs here: the length check in legacy_parse_param() subtracted from PAGE_SIZE using unsigned arithmetic, which wrapped to a huge value once the buffer was nearly full, so the bounds check always passed.
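The usual fix for this class is to make the multiplication itself report overflow. The sketch below shows the idea in user-space C with the GCC/Clang builtin __builtin_mul_overflow; the kernel offers check_mul_overflow() and kmalloc_array() for the same purpose. struct entry, unchecked_size(), and checked_size() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

struct entry { uint64_t a, b; };   /* 16 bytes, hypothetical element type */

/* Unchecked 32-bit size arithmetic: wraps silently. */
uint32_t unchecked_size(uint32_t count)
{
    return count * (uint32_t)sizeof(struct entry);
}

/* Checked variant: returns false when count * sizeof(struct entry)
 * does not fit in 32 bits; otherwise stores the product in *out. */
bool checked_size(uint32_t count, uint32_t *out)
{
    return !__builtin_mul_overflow(count, (uint32_t)sizeof(struct entry), out);
}
```

For count = 0x10000001, unchecked_size() yields 0x10 while checked_size() returns false, so the caller can reject the request before allocating.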

Class 8: Out-of-Bounds Read/Write

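Indexing an array or buffer without validating the index or length allows reading or writing adjacent memory. Out-of-bounds reads leak kernel data (often enough to defeat KASLR); out-of-bounds writes corrupt it.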
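A frequent way for a bounds check to exist and still fail is a signed index that is only compared against the upper limit. The following is a small illustrative sketch; table, index_ok_signed(), index_ok_unsigned(), and table_read() are hypothetical names.

```c
#include <stddef.h>

#define TABLE_LEN 8
static int table[TABLE_LEN];

/* Buggy: a negative index passes the upper-bound check. */
int index_ok_signed(int idx)
{
    return idx < TABLE_LEN;
}

/* Fixed: compare as unsigned so negative values become huge and fail. */
int index_ok_unsigned(int idx)
{
    return (size_t)idx < TABLE_LEN;
}

/* Returns 0 and stores the entry on success, -1 on a rejected index. */
int table_read(int idx, int *out)
{
    if (!index_ok_unsigned(idx))
        return -1;
    *out = table[idx];
    return 0;
}
```

index_ok_signed(-1) accepts the index, which would read memory before the table; index_ok_unsigned(-1) rejects it. Kernel code that must also resist speculative out-of-bounds access additionally clamps the index with array_index_nospec().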

CVE-2016-10229 (UDP recv with MSG_PEEK): When recv() was called with MSG_PEEK, the UDP receive path could perform an unsafe second checksum calculation over the packet. The advisory rated it as potentially allowing remote code execution via crafted UDP traffic; it was fixed before Linux 4.5.

CVE-2021-22555 (Netfilter Xtables OOB): An out-of-bounds write in xt_compat_target_from_user(). The allocation size did not account for padding that the function later zeroed with memset(), so a few zero bytes were written past the end of the buffer. With careful heap grooming, the public exploit used this to zero two bytes of a list pointer in an adjacent msg_msg object and built full privilege escalation from there. The exploit was released by Andy Nguyen (Google).

CVE-2023-6931 (perf_event OOB write): A perf_event's read_size could overflow, leading to a heap out-of-bounds increment or write in perf_read_group(). The bug can be used for local privilege escalation.

Attack Surface: eBPF Verifier Bugs

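eBPF deserves special mention as a modern high-value attack surface. eBPF programs are loaded by processes holding CAP_BPF (or CAP_SYS_ADMIN on older kernels), and by fully unprivileged users when kernel.unprivileged_bpf_disabled is 0, in which case only a few program types such as socket filters are available. The verifier is supposed to prove a program safe before it is JIT-compiled and run in the kernel.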

eBPF attack flow:
  User submits BPF program
           │
           ▼
  BPF Verifier (kernel) — static analysis
  ├── Checks register bounds, pointer arithmetic
  ├── Ensures no out-of-bounds memory access
  ├── Ensures program terminates
  └── If verifier bug → unsafe program passes → kernel exploit
           │
           ▼
  BPF JIT compiler — machine code generation
           │
           ▼
  Program runs in kernel context (ring 0)
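What "verifier bug → unsafe program passes" means can be illustrated with a toy range tracker. This is a deliberately tiny model, not the real verifier: struct reg_bounds, the transfer functions, and the deliberately broken buggy_add_bounds() are all hypothetical, and wraparound is ignored for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy register state: the range of values the verifier believes possible. */
struct reg_bounds { uint64_t umin, umax; };

/* Verifier-side check for "access size bytes at map_value + reg". */
bool verifier_allows(struct reg_bounds r, uint64_t size, uint64_t value_size)
{
    return r.umax <= value_size && size <= value_size - r.umax;
}

/* Ground truth for the same access with the register's actual value. */
bool runtime_in_bounds(uint64_t actual, uint64_t size, uint64_t value_size)
{
    return actual <= value_size && size <= value_size - actual;
}

/* Correct transfer function for "reg += k". */
struct reg_bounds add_bounds(struct reg_bounds r, uint64_t k)
{
    struct reg_bounds out = { r.umin + k, r.umax + k };
    return out;
}

/* Hypothetical verifier bug: updates umin but forgets to update umax. */
struct reg_bounds buggy_add_bounds(struct reg_bounds r, uint64_t k)
{
    struct reg_bounds out = { r.umin + k, r.umax };
    return out;
}
```

With a 16-byte map value and a register in [0, 8], adding 8 and then accessing 8 bytes is rejected under add_bounds() but accepted under buggy_add_bounds(), even though the register can hold 16 at run time and the access is then out of bounds. Real verifier CVEs have this same shape with far subtler transfer functions.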

CVEs in the BPF subsystem since 2020 include CVE-2020-8835, CVE-2021-3490, CVE-2021-3489 (ring buffer), CVE-2022-23222, CVE-2022-0500, and CVE-2023-2163; most of them are verifier bugs. The verifier's complexity (tens of thousands of lines of C) combined with fully attacker-controlled input makes it consistently productive for security researchers.

Historical Context

Kernel exploitation has evolved through distinct eras:

Pre-2007: Simple techniques dominated. Null pointer dereferences were trivially exploitable (map page 0). Stack overflows were common and straightforward. ASLR did not apply to the kernel.

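2007-2012: The first dedicated mitigations appeared. mmap_min_addr (Linux 2.6.23, 2007) blocked mapping of page 0 as a partial mitigation, which started the era of "mmap_min_addr bypass" bugs and early kernel ROP chains. SMEP support arrived at the end of the period (Linux 3.0), followed by SMAP (Linux 3.7).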

2012-2016: Refcount overflows, use-after-free, and race conditions dominated. Heap spray techniques matured. KASLR was merged for x86 in Linux 3.14 (2014) and deployments followed. Dirty COW (CVE-2016-5195) was the signature vulnerability of this era.

2016-present: UAF in complex subsystems (networking, filesystems, eBPF), mitigations requiring multi-stage exploits (info leak + KASLR bypass + ROP + commit_creds). eBPF verifier bugs emerging as the newest rich attack surface.

Production Examples

Container escapes via kernel exploits: CVE-2022-0185 was demonstrated as a container escape on Kubernetes. In that scenario an attacker with code execution in a container (via a web shell or RCE in a containerized app) exploits the heap overflow to gain root on the host node, provided the container can create user namespaces (for example, when no seccomp profile blocks unshare). From the node, every pod scheduled on it is exposed.

Android privilege escalation: CVE-2019-2215 (Binder UAF) was used by malicious apps found on Google Play, which Trend Micro attributed to the SideWinder group in early 2020. The pattern in such campaigns: a malicious app requests minimal permissions, then uses the kernel exploit to gain root and install a persistent payload with broad access to device data.

Debugging Notes

# Enable kernel address sanitizer (detects UAF, OOB at runtime)
# Requires CONFIG_KASAN=y in kernel build

# Run syzkaller (kernel fuzzer) to find new bugs
# https://github.com/google/syzkaller

# Rough heuristic: list copy_from_user() calls whose return value is
# not checked on the same line (expect false positives)
grep -rn "copy_from_user" drivers/ | grep -v "if.*copy_from_user"

# Audit kernel module loads (a common post-exploitation step)
auditctl -a always,exit -F arch=b64 -S init_module
auditctl -a always,exit -F arch=b64 -S finit_module

# Check for unsigned or out-of-tree module loads (they taint the kernel)
dmesg | grep -iE "taints kernel|module verification failed"

Security Implications

The consequences of a kernel exploit are total. Unlike a userspace exploit, where an attacker gains the privileges of one process, a kernel exploit gives:

The ability to read every process's memory (including TLS session keys, SSH private keys in ssh-agent, and any other key material held in RAM)
The ability to disable SELinux/AppArmor/seccomp
The ability to modify the system call table (rootkit)
A presence that survives process kills but not reboots (unless combined with a bootkit)

Kernel exploits are among the most valuable items in vulnerability markets. A reliable, public exploit for a mainline kernel vulnerability is a critical P0 security event for any organization running Linux.

Performance Implications

Most kernel vulnerabilities arise in error paths or rarely used code paths rather than hot paths, partly because hot paths receive more review and testing. The vulnerable code is therefore seldom exercised in normal operation, which makes it hard to catch through routine testing or runtime anomaly detection.

Failure Modes

Kernel exploits often crash the system:

Kernel oops: The exploit triggered an invalid memory access before gaining control. The kernel logs a stack trace and the process is killed, but the system survives.
Kernel panic: More severe; the system halts. This is detectable via system logs and crash dumps. Enable kdump to capture crash dumps for analysis.
Silent corruption: The exploit succeeded in modifying kernel data without triggering a crash. Hardest to detect.

Modern Usage

Kernel exploits remain active in 2025. CVE-2024-1086 (netfilter nf_tables use-after-free reachable through nft_verdict_init()) was exploited in the wild. CVE-2024-26581 (netfilter nft_set_rbtree garbage-collection bug) received a CVSS score of 7.8. Active exploit development continues against eBPF, io_uring, and the networking subsystem.

Future Directions

Kernel CFI (Control Flow Integrity): Clang-based kernel CFI first landed for arm64 in Linux 5.13, and the reworked kCFI scheme was merged in Linux 6.1 for arm64 and x86-64. It validates indirect call targets against the expected function type, which raises the exploitation cost for function pointer overwrites significantly.

Hardware memory tagging (ARM MTE): Tags each pointer and each allocation granule with a small color (4 bits in MTE). A pointer-to-allocation color mismatch causes a fault, which catches most UAF and linear heap overflow accesses in hardware. Detection is probabilistic because tags can collide, and the runtime cost depends on the checking mode.
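The coloring idea can be sketched in a few lines. This is a toy software model under stated assumptions, not MTE itself: real hardware keeps the pointer tag in unused address bits and checks it on every load and store, whereas here struct tagged_ptr, tag_alloc(), tag_free(), and access_ok() are hypothetical names and the check is an explicit function call.

```c
#include <stdbool.h>
#include <stdint.h>

#define NGRANULES 4

/* One small tag per allocation granule; 0 is reserved for "free". */
static uint8_t mem_tag[NGRANULES];
static uint8_t next_tag = 1;

/* A pointer carries the tag it was given at allocation time. */
struct tagged_ptr { uint8_t tag; unsigned granule; };

struct tagged_ptr tag_alloc(unsigned granule)
{
    uint8_t t = next_tag;

    next_tag = (uint8_t)((next_tag % 15) + 1);   /* cycle through 1..15 */
    mem_tag[granule] = t;
    return (struct tagged_ptr){ t, granule };
}

void tag_free(struct tagged_ptr p)
{
    mem_tag[p.granule] = 0;      /* retag on free so stale pointers mismatch */
}

/* The check the hardware would perform on each access. */
bool access_ok(struct tagged_ptr p)
{
    return p.granule < NGRANULES && mem_tag[p.granule] == p.tag;
}
```

A pointer returned by tag_alloc() passes access_ok() until tag_free() is called; after that, and after the granule is reallocated with a new tag, the stale pointer fails the check. A pointer moved to a neighboring granule fails as well unless the two tags happen to match, which is the source of the probabilistic caveat.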

Exercises

Use syzkaller against a QEMU kernel with KASAN enabled. Find a reproducible kernel crash. Analyze the KASAN report to determine the exploit class.
Write a proof-of-concept for a null pointer dereference (with mmap_min_addr=0 set in a test environment). Document what mitigation prevents this on production systems.
Compile a kernel with CONFIG_KASAN=y and CONFIG_LOCKDEP=y. Deliberately write a use-after-free in a test module. Observe the KASAN report.
Analyze CVE-2021-22555 (Netfilter out-of-bounds write). Read the original exploit code. Identify: (a) the exact OOB write offset, (b) the target structure used for heap grooming, (c) how commit_creds was called.
Map every major kernel exploit from 2018-2025 onto the taxonomy in this document. Which class is most represented? Which subsystem has the most bugs?

References

Linux kernel CVE database: https://www.cve.org/CVESearch?searchTerm=linux+kernel
Project Zero blog: https://googleprojectzero.blogspot.com/
CVE-2021-22555 exploit: https://github.com/google/security-research/tree/main/pocs/linux/cve-2021-22555
Pawel Wieczorkiewicz, "Linux Kernel Heap Spray Techniques" — 2021
Jann Horn, "A collection of Linux kernel exploitation techniques" — Google Project Zero
"Hacking: The Art of Exploitation", Jon Erickson
Syzkaller: https://github.com/google/syzkaller
Linux security module (LSM) mailing list archive: https://lore.kernel.org/linux-security-module/