Kernel Debugging Labs
Five hands-on labs covering kernel panic analysis, memory safety bugs, deadlock detection, function tracing, and eBPF-based dynamic analysis. Each lab creates a deliberate bug, then walks through diagnosing it with the correct tool.
Environment Requirements:
- QEMU x86_64 VM with a debug kernel build (see Lab 1 setup)
- Host tools: GDB 9+, QEMU 6+, bpftrace 0.14+
- Kernel config: CONFIG_DEBUG_INFO=y, CONFIG_FRAME_POINTER=y, CONFIG_KALLSYMS=y
Lab 1: Reproduce and Analyze a Kernel Panic with QEMU + GDB
Objective: Set up a kernel debugging environment, trigger a NULL pointer dereference, read the oops report, and decode the stack trace back to source.
Part A: Build a Debug Kernel
# Clone kernel
git clone --depth=1 https://github.com/torvalds/linux.git
cd linux
# Start from a minimal config
make x86_64_defconfig
# Add debug options
cat >> .config << 'EOF'
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_FRAME_POINTER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_LOW_LEVEL_TRAP=y
CONFIG_GDB_SCRIPTS=y
CONFIG_MAGIC_SYSRQ=y
EOF
make olddefconfig
make -j$(nproc) 2>&1 | tee build.log
Part B: Create a Kernel That Panics
# Create a minimal initramfs with a busybox shell
# (simplest approach: use buildroot or download a pre-built rootfs)
# Boot with GDB stub enabled (-s = listen on :1234, -S = freeze at start)
qemu-system-x86_64 \
-kernel arch/x86/boot/bzImage \
-initrd rootfs.cpio.gz \
-append "console=ttyS0,115200 nokaslr kgdboc=ttyS0 kgdbwait" \
-nographic \
-s -S \
-m 512M
Part C: Trigger a NULL Dereference
Write a module that deliberately dereferences NULL:
// null_deref.c
#include <linux/module.h>
#include <linux/kernel.h>
MODULE_LICENSE("GPL");
static int __init null_init(void)
{
int *ptr = NULL;
pr_info("null_deref: about to dereference NULL...\n");
*ptr = 42; /* This triggers a kernel oops / panic */
return 0;
}
static void __exit null_exit(void) {}
module_init(null_init);
module_exit(null_exit);
sudo insmod null_deref.ko
Part D: Read the Oops
A kernel oops looks like this (abbreviated):
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
CPU: 0 PID: 1234 Comm: insmod Tainted: G O 5.15.0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:null_init+0x1a/0x30 [null_deref]
Code: ...
RSP: 0018:ffffc9000024be98 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88800f77c1e8 RDI: ffff88800f77c1e8
RBP: ffffc9000024bea8 R8: 0000000000000000 R9: 0000000000000000
R10: ffff88800f77c1e8 R11: 0000000000000000 R12: ffff88800f5a9600
R13: ffffc9000024bf58 R14: 0000000000000000 R15: 0000000000000001
FS: 00007f8b2c5b3740(0000) GS:ffff88803bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000e10c000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_one_initcall+0x46/0x1c0
do_init_module+0x4e/0x220
load_module+0x2c73/0x3030
__do_sys_finit_module+0xa5/0x110
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Part E: Decode the Oops
Reading the oops — field by field:
| Field | Meaning |
|---|---|
BUG: kernel NULL pointer dereference |
Type of fault — NULL deref |
address: 0x0000000000000000 |
The faulting virtual address — NULL |
Oops: 0002 [#1] |
Error code: bit 1 = write, not read; [#1] = first oops |
RIP: 0010:null_init+0x1a/0x30 |
Faulting instruction: in null_init, offset 0x1a from start, function is 0x30 bytes |
CR2 |
Page fault address — always the bad pointer for null deref |
Call Trace |
Stack frames from crash to bottom |
Decode with GDB:
# In host terminal
gdb vmlinux
(gdb) target remote :1234
(gdb) list *(null_init+0x1a) # show source line
(gdb) disassemble null_init # see assembly, find offset 0x1a
With addr2line:
# Get the module .ko file with debug symbols
addr2line -e null_deref.ko 0x1a
# Output: null_deref.c:12 — the exact line
Analysis Questions
- What is the difference between an oops and a panic? (Hint:
panic_on_oopskernel parameter) - Why does
RIP: null_init+0x1anot show a source file directly — what tool bridges that gap? - What does
Tainted: G Omean in the oops header? (G = GPL module loaded, O = out-of-tree module)
Lab 2: Use KASAN to Detect Use-After-Free
Objective: Enable KASAN, write a module with a use-after-free bug, interpret the KASAN report.
Part A: Kernel Configuration
# Add to .config
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_STACK=y
# Rebuild kernel
KASAN adds shadow memory (1/8 of addressable memory) that tracks every allocation. Accessing freed memory triggers an immediate, precise report.
Part B: Vulnerable Module
// uaf.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/slab.h>
MODULE_LICENSE("GPL");
struct secret {
int value;
char name[64];
};
static int __init uaf_init(void)
{
struct secret *s;
s = kmalloc(sizeof(*s), GFP_KERNEL);
if (!s)
return -ENOMEM;
s->value = 42;
strncpy(s->name, "kernel secret", sizeof(s->name));
pr_info("uaf: allocated at %px, value=%d\n", s, s->value);
kfree(s); /* s is freed here */
/* UAF: read from freed memory */
pr_info("uaf: reading freed memory: value=%d\n", s->value);
/* UAF: write to freed memory */
s->value = 99; /* KASAN will catch this write */
pr_info("uaf: wrote to freed memory\n");
return 0;
}
static void __exit uaf_exit(void) {}
module_init(uaf_init);
module_exit(uaf_exit);
Part C: KASAN Report
When sudo insmod uaf.ko is run, dmesg shows:
==================================================================
BUG: KASAN: use-after-free in uaf_init+0x7a/0x90 [uaf]
Write of size 4 at addr ffff888011b7e000 by task insmod/1234
CPU: 0 PID: 1234 Comm: insmod Tainted: G O 5.15.0+
Hardware name: QEMU Standard PC
Call trace:
uaf_init+0x7a/0x90 [uaf]
do_one_initcall+0x46/0x1c0
...
Allocated by task 1234:
uaf_init+0x2c/0x90 [uaf] ← allocation site
...
Freed by task 1234:
uaf_init+0x58/0x90 [uaf] ← free site
...
The buggy address ffff888011b7e000 belongs to the object at ffff888011b7e000
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of 128-byte region [ffff888011b7e000, ffff888011b7e080)
==================================================================
Part D: Interpret the Report
| Section | What it tells you |
|---|---|
Write of size 4 at addr ... |
Exact address and size of the bad access |
Call trace |
Stack at time of bug — points to uaf_init+0x7a |
Allocated by task |
Where the memory was originally allocated |
Freed by task |
Where the memory was freed (the kfree) |
kmalloc-128 of size 128 |
Which slab cache owned this memory |
With GDB:
gdb vmlinux
(gdb) list *(uaf_init+0x7a) # pinpoint the write line
Analysis Questions
- Why does KASAN catch this immediately rather than letting it silently corrupt memory?
- What is the shadow memory ratio KASAN uses, and what does it cost in memory overhead?
- What is the difference between KASAN, KFENCE, and KMSAN? (Hint: overhead vs. coverage)
Lab 3: lockdep Deadlock Detection
Objective: Write a module with an ABBA deadlock, observe lockdep's dependency graph output, and understand the lock ordering rules.
Part A: Kernel Configuration
CONFIG_LOCKDEP=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_LOCK_STAT=y
Lockdep instruments every lock acquisition/release and builds a dependency graph. If acquiring lock B while holding lock A, and elsewhere A is acquired while holding B, lockdep reports this immediately — before a real deadlock occurs.
Part B: Deadlock Module
// deadlock.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/kthread.h>
#include <linux/delay.h>
MODULE_LICENSE("GPL");
static DEFINE_SPINLOCK(lock_a);
static DEFINE_SPINLOCK(lock_b);
/* Thread 1: acquires A then B */
static int thread1_fn(void *data)
{
pr_info("deadlock: thread1 acquiring A...\n");
spin_lock(&lock_a);
mdelay(10); /* give thread2 time to grab lock_b */
pr_info("deadlock: thread1 acquiring B...\n");
spin_lock(&lock_b); /* DEADLOCK: thread2 holds B, wants A */
spin_unlock(&lock_b);
spin_unlock(&lock_a);
return 0;
}
/* Thread 2: acquires B then A (opposite order — ABBA deadlock) */
static int thread2_fn(void *data)
{
pr_info("deadlock: thread2 acquiring B...\n");
spin_lock(&lock_b);
mdelay(10);
pr_info("deadlock: thread2 acquiring A...\n");
spin_lock(&lock_a); /* DEADLOCK: thread1 holds A, wants B */
spin_unlock(&lock_a);
spin_unlock(&lock_b);
return 0;
}
static struct task_struct *t1, *t2;
static int __init deadlock_init(void)
{
pr_info("deadlock: starting threads\n");
t1 = kthread_run(thread1_fn, NULL, "deadlock_t1");
t2 = kthread_run(thread2_fn, NULL, "deadlock_t2");
return 0;
}
static void __exit deadlock_exit(void)
{
pr_info("deadlock: module exit\n");
}
module_init(deadlock_init);
module_exit(deadlock_exit);
Part C: lockdep Output
======================================================
WARNING: possible circular locking dependency detected
5.15.0 #1 Tainted: G O
------------------------------------------------------
deadlock_t2/1236 is trying to acquire lock:
ffff888012345678 (lock_a){....}-{2:2}, at: thread2_fn+0x4e/0x70 [deadlock]
but task is already holding lock:
ffff88801234abcd (lock_b){....}-{2:2}, at: thread2_fn+0x2c/0x70 [deadlock]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (lock_b){....}-{2:2}:
thread1_fn+0x4c/0x70 [deadlock] ← thread1 grabbed B while holding A
...
-> #0 (lock_a){....}-{2:2}:
thread1_fn+0x2a/0x70 [deadlock]
...
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(lock_a);
lock(lock_b);
lock(lock_a); ← would deadlock here
lock(lock_b); ← would deadlock here
*** DEADLOCK ***
Part D: Understand the Dependency Graph
| Concept | Meaning |
|---|---|
| Lock class | All instances of the same lock (same lockdep_map) are one "class" |
| Lock dependency | "A → B" means "A was held when B was acquired" |
| Cycle in graph | Potential deadlock — lockdep reports immediately |
{....}-{2:2} |
Lock type annotation: spinlock, 2 contexts |
Fix: Establish a global lock ordering. Always acquire lock_a before lock_b everywhere.
Analysis Questions
- lockdep found this without a real deadlock occurring. How? What data structure does it maintain?
- What does
{....}mean in the lockdep annotation? ReadDocumentation/locking/lockdep-design.rst - How would you fix this deadlock? What design principle does the fix represent?
Lab 4: ftrace Function Graph Tracing
Objective: Use ftrace to trace kernel function calls, measure function latency, and use the irqsoff latency tracer.
Part A: Basic Function Tracing
# Mount debugfs if not mounted
mount -t debugfs none /sys/kernel/debug
cd /sys/kernel/tracing
# Trace a specific function
echo do_sys_open > set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
cat /proc/version # trigger some open() calls
echo 0 > tracing_on
cat trace | head -40
Expected output:
# tracer: function
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| /
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
cat-1234 [000] .... 123.456789: do_sys_open <-__x64_sys_openat
Part B: Function Graph Tracer
echo function_graph > current_tracer
echo do_sys_open > set_graph_function
echo 1 > tracing_on
cat /proc/cpuinfo
echo 0 > tracing_on
cat trace | head -60
Expected output (call tree with timing):
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) | do_sys_open() {
0) | getname() {
0) 1.234 us | kmem_cache_alloc();
0) 0.456 us | strncpy_from_user();
0) 3.891 us | }
0) | get_unused_fd_flags() {
0) 0.234 us | __alloc_fd();
0) 1.560 us | }
0) 23.456 us | }
Part C: Latency Tracers
# irqsoff: measure maximum IRQ-disabled latency
echo irqsoff > current_tracer
echo 1 > tracing_on
# Run a workload for 10 seconds
sleep 10
echo 0 > tracing_on
cat trace # shows the worst-case IRQ-off window
What to look for in irqsoff output:
- Maximum latency shown at the top: # irqsoff latency trace v1.1.5 on 5.15.0
- The exact function where IRQs were disabled and re-enabled
- Any surprise: "why is function X disabling IRQs for 200µs?"
# preemptoff: measure maximum preemption-disabled latency
echo preemptoff > current_tracer
Part D: Trace a Custom Module Function
# Trace only your module's function
echo 'chardev_write' > set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
echo "test" > /dev/chardev # from Lab 2
echo 0 > tracing_on
cat trace
Analysis Questions
- What is the difference between
functionandfunction_graphtracers? Which has higher overhead? - The
irqsofftracer showed a 50µs IRQ-disabled window inspin_lock_irqsave. What does that imply for a real-time system? - Use
available_tracersto list all tracer types. What doesblktrace?
Lab 5: bpftrace One-Liners and Scripts
Objective: Write bpftrace programs to trace system calls, measure latency distributions, and analyze off-CPU behavior.
Prerequisites
sudo apt install bpftrace linux-headers-$(uname -r)
# Verify
sudo bpftrace -e 'BEGIN { printf("bpftrace works\n"); }'
Part A: Trace execve() Calls
Goal: Print every process exec with its full argument string.
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_execve
{
printf("%-6d %-16s %s\n", pid, comm, str(args->filename));
}
'
Expected output (while running commands in another terminal):
1234 bash /usr/bin/ls
1235 bash /usr/bin/cat
1236 bash /bin/ps
Analysis questions:
- What is the overhead of this tracepoint? Use perf stat to measure before/after.
- How would you filter to only show execve from processes owned by UID 1000?
Part B: Syscall Latency Histogram
Goal: Measure the distribution of read() latency for a target process.
# First, find the PID of interest (e.g., nginx)
PID=$(pgrep nginx | head -1)
sudo bpftrace -e "
tracepoint:syscalls:sys_enter_read
/pid == $PID/
{
@start[tid] = nsecs;
}
tracepoint:syscalls:sys_exit_read
/pid == $PID && @start[tid]/
{
@lat_us = hist((nsecs - @start[tid]) / 1000);
delete(@start[tid]);
}
" &
# Generate load
wrk -t 4 -c 50 -d 30s http://localhost/
# After 30 seconds, kill bpftrace (Ctrl+C) — it prints the histogram
Expected output:
@lat_us:
[0] 124 |@@@@@@@@ |
[1] 1482 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4) 891 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4, 8) 234 |@@@@@@@@ |
[8, 16) 67 |@@ |
[16, 32) 12 | |
[32, 64) 3 | |
What to look for: - Bimodal distribution? Suggests cache hit vs. miss - Long tail (>1ms for reads)? Suggests I/O wait or lock contention - Most reads complete in [1, 2) µs? Normal for cached filesystem reads
Part C: Off-CPU Analysis
Goal: Find what kernel stacks a process is blocking on (waiting for, not burning CPU).
# offcputime shows time spent off-CPU, stacked by kernel call path
sudo /usr/share/bcc/tools/offcputime -p $(pgrep myserver) -K 30
# Or with bpftrace (simplified version)
sudo bpftrace -e '
software:cpu-clock:100
/pid == target_pid/
{
@stacks[kstack, ustack] = count();
}
END { print(@stacks); }
' -DTARGET_PID=$(pgrep myserver)
Interpreting the output:
- Stack dominated by futex_wait → mutex/condition variable contention
- Stack dominated by ep_poll / epoll_wait → normal I/O wait (not a problem)
- Stack dominated by schedule → preempted, potentially too many threads
- Stack in blkdev_issue_flush → disk I/O wait
Part D: Advanced Script — Per-Process Network Bytes
sudo bpftrace -e '
kprobe:tcp_sendmsg
{
@send_bytes[pid, comm] = sum(arg2);
}
kprobe:tcp_recvmsg
{
@recv_bytes[pid, comm] = sum(arg2);
}
interval:s:5
{
printf("\n=== Network I/O (5 second window) ===\n");
print(@send_bytes);
print(@recv_bytes);
clear(@send_bytes);
clear(@recv_bytes);
}
'
Part E: Measure System Call Overhead
# Count syscalls per second, per type
sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter
{
@syscalls[args->id] = count();
}
interval:s:1
{
printf("\n--- syscalls/sec ---\n");
print(@syscalls);
clear(@syscalls);
}
'
Analysis questions:
1. Your application calls write() 50,000 times/second for small 64-byte writes. What is the approximate CPU overhead just from the syscall interface? (Each syscall costs ~1µs, so 50ms/s = 5% CPU just in syscall overhead)
2. How would you reduce this? (Batching, io_uring, sendmmsg)
3. What is the difference between kprobe and tracepoint probes? Why prefer tracepoints?
Lab Appendix: Debugging Tool Quick Reference
| Scenario | Tool | Key Command |
|---|---|---|
| Kernel crash post-mortem | crash utility |
crash vmlinux /proc/vmcore |
| NULL pointer / bad memory | KASAN | CONFIG_KASAN=y + read dmesg |
| Memory leak (kernel) | kmemleak | cat /sys/kernel/debug/kmemleak |
| Stack overflow detection | KASAN stack | CONFIG_KASAN_STACK=y |
| Deadlock detection | lockdep | CONFIG_PROVE_LOCKING=y |
| Lock contention statistics | lock_stat | /proc/lock_stat |
| Function call trace | ftrace | /sys/kernel/tracing/ |
| Latency measurement | ftrace irqsoff | echo irqsoff > current_tracer |
| Dynamic tracing | bpftrace | bpftrace -e 'kprobe:...' |
| Syscall tracing (development) | strace | strace -e trace=read,write ./app |
| Live kernel debugging | KGDB | GDB + target remote :1234 |
| Hardware performance counters | perf | perf stat ./app |