Kernel Debugging Labs

Five hands-on labs covering kernel panic analysis, memory safety bugs, deadlock detection, function tracing, and eBPF-based dynamic analysis. Each lab creates a deliberate bug, then walks through diagnosing it with the correct tool.

Environment Requirements: - QEMU x86_64 VM with a debug kernel build (see Lab 1 setup) - Host tools: GDB 9+, QEMU 6+, bpftrace 0.14+ - Kernel config: CONFIG_DEBUG_INFO=y, CONFIG_FRAME_POINTER=y, CONFIG_KALLSYMS=y

Lab 1: Reproduce and Analyze a Kernel Panic with QEMU + GDB

Objective: Set up a kernel debugging environment, trigger a NULL pointer dereference, read the oops report, and decode the stack trace back to source.

Part A: Build a Debug Kernel

# Clone kernel
git clone --depth=1 https://github.com/torvalds/linux.git
cd linux

# Start from a minimal config
make x86_64_defconfig

# Add debug options
cat >> .config << 'EOF'
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_FRAME_POINTER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_LOW_LEVEL_TRAP=y
CONFIG_GDB_SCRIPTS=y
CONFIG_MAGIC_SYSRQ=y
EOF

make olddefconfig
make -j$(nproc) 2>&1 | tee build.log

Part B: Create a Kernel That Panics

# Create a minimal initramfs with a busybox shell
# (simplest approach: use buildroot or download a pre-built rootfs)

# Boot with GDB stub enabled (-s = listen on :1234, -S = freeze at start)
qemu-system-x86_64 \
    -kernel arch/x86/boot/bzImage \
    -initrd rootfs.cpio.gz \
    -append "console=ttyS0,115200 nokaslr kgdboc=ttyS0 kgdbwait" \
    -nographic \
    -s -S \
    -m 512M

Part C: Trigger a NULL Dereference

Write a module that deliberately dereferences NULL:

// null_deref.c
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");

static int __init null_init(void)
{
    int *ptr = NULL;
    pr_info("null_deref: about to dereference NULL...\n");
    *ptr = 42;   /* This triggers a kernel oops / panic */
    return 0;
}

static void __exit null_exit(void) {}

module_init(null_init);
module_exit(null_exit);

sudo insmod null_deref.ko

Part D: Read the Oops

A kernel oops looks like this (abbreviated):

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
CPU: 0 PID: 1234 Comm: insmod Tainted: G  O  5.15.0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:null_init+0x1a/0x30 [null_deref]
Code: ...
RSP: 0018:ffffc9000024be98 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88800f77c1e8 RDI: ffff88800f77c1e8
RBP: ffffc9000024bea8  R8: 0000000000000000  R9: 0000000000000000
R10: ffff88800f77c1e8 R11: 0000000000000000 R12: ffff88800f5a9600
R13: ffffc9000024bf58 R14: 0000000000000000 R15: 0000000000000001
FS:  00007f8b2c5b3740(0000) GS:ffff88803bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000e10c000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 do_one_initcall+0x46/0x1c0
 do_init_module+0x4e/0x220
 load_module+0x2c73/0x3030
 __do_sys_finit_module+0xa5/0x110
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Part E: Decode the Oops

Reading the oops — field by field:

Field	Meaning
`BUG: kernel NULL pointer dereference`	Type of fault — NULL deref
`address: 0x0000000000000000`	The faulting virtual address — NULL
`Oops: 0002 [#1]`	Error code: bit 1 = write, not read; [#1] = first oops
`RIP: 0010:null_init+0x1a/0x30`	Faulting instruction: in `null_init`, offset 0x1a from start, function is 0x30 bytes
`CR2`	Page fault address — always the bad pointer for null deref
`Call Trace`	Stack frames from crash to bottom

Decode with GDB:

# In host terminal
gdb vmlinux
(gdb) target remote :1234
(gdb) list *(null_init+0x1a)    # show source line
(gdb) disassemble null_init      # see assembly, find offset 0x1a

With addr2line:

# Get the module .ko file with debug symbols
addr2line -e null_deref.ko 0x1a
# Output: null_deref.c:12 — the exact line

Analysis Questions

What is the difference between an oops and a panic? (Hint: panic_on_oops kernel parameter)
Why does RIP: null_init+0x1a not show a source file directly — what tool bridges that gap?
What does Tainted: G O mean in the oops header? (G = GPL module loaded, O = out-of-tree module)

Lab 2: Use KASAN to Detect Use-After-Free

Objective: Enable KASAN, write a module with a use-after-free bug, interpret the KASAN report.

Part A: Kernel Configuration

# Add to .config
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_STACK=y
# Rebuild kernel

KASAN adds shadow memory (1/8 of addressable memory) that tracks every allocation. Accessing freed memory triggers an immediate, precise report.

Part B: Vulnerable Module

// uaf.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/slab.h>

MODULE_LICENSE("GPL");

struct secret {
    int value;
    char name[64];
};

static int __init uaf_init(void)
{
    struct secret *s;

    s = kmalloc(sizeof(*s), GFP_KERNEL);
    if (!s)
        return -ENOMEM;

    s->value = 42;
    strncpy(s->name, "kernel secret", sizeof(s->name));
    pr_info("uaf: allocated at %px, value=%d\n", s, s->value);

    kfree(s);  /* s is freed here */

    /* UAF: read from freed memory */
    pr_info("uaf: reading freed memory: value=%d\n", s->value);

    /* UAF: write to freed memory */
    s->value = 99;   /* KASAN will catch this write */
    pr_info("uaf: wrote to freed memory\n");

    return 0;
}

static void __exit uaf_exit(void) {}
module_init(uaf_init);
module_exit(uaf_exit);

Part C: KASAN Report

When sudo insmod uaf.ko is run, dmesg shows:

==================================================================
BUG: KASAN: use-after-free in uaf_init+0x7a/0x90 [uaf]
Write of size 4 at addr ffff888011b7e000 by task insmod/1234

CPU: 0 PID: 1234 Comm: insmod Tainted: G  O  5.15.0+
Hardware name: QEMU Standard PC

Call trace:
 uaf_init+0x7a/0x90 [uaf]
 do_one_initcall+0x46/0x1c0
 ...

Allocated by task 1234:
 uaf_init+0x2c/0x90 [uaf]          ← allocation site
 ...

Freed by task 1234:
 uaf_init+0x58/0x90 [uaf]          ← free site
 ...

The buggy address ffff888011b7e000 belongs to the object at ffff888011b7e000
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of 128-byte region [ffff888011b7e000, ffff888011b7e080)
==================================================================

Part D: Interpret the Report

Section	What it tells you
`Write of size 4 at addr ...`	Exact address and size of the bad access
`Call trace`	Stack at time of bug — points to `uaf_init+0x7a`
`Allocated by task`	Where the memory was originally allocated
`Freed by task`	Where the memory was freed (the kfree)
`kmalloc-128 of size 128`	Which slab cache owned this memory

With GDB:

gdb vmlinux
(gdb) list *(uaf_init+0x7a)    # pinpoint the write line

Analysis Questions

Why does KASAN catch this immediately rather than letting it silently corrupt memory?
What is the shadow memory ratio KASAN uses, and what does it cost in memory overhead?
What is the difference between KASAN, KFENCE, and KMSAN? (Hint: overhead vs. coverage)

Lab 3: lockdep Deadlock Detection

Objective: Write a module with an ABBA deadlock, observe lockdep's dependency graph output, and understand the lock ordering rules.

Part A: Kernel Configuration

CONFIG_LOCKDEP=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_LOCK_STAT=y

Lockdep instruments every lock acquisition/release and builds a dependency graph. If acquiring lock B while holding lock A, and elsewhere A is acquired while holding B, lockdep reports this immediately — before a real deadlock occurs.

Part B: Deadlock Module

// deadlock.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/kthread.h>
#include <linux/delay.h>

MODULE_LICENSE("GPL");

static DEFINE_SPINLOCK(lock_a);
static DEFINE_SPINLOCK(lock_b);

/* Thread 1: acquires A then B */
static int thread1_fn(void *data)
{
    pr_info("deadlock: thread1 acquiring A...\n");
    spin_lock(&lock_a);
    mdelay(10);   /* give thread2 time to grab lock_b */
    pr_info("deadlock: thread1 acquiring B...\n");
    spin_lock(&lock_b);    /* DEADLOCK: thread2 holds B, wants A */
    spin_unlock(&lock_b);
    spin_unlock(&lock_a);
    return 0;
}

/* Thread 2: acquires B then A (opposite order — ABBA deadlock) */
static int thread2_fn(void *data)
{
    pr_info("deadlock: thread2 acquiring B...\n");
    spin_lock(&lock_b);
    mdelay(10);
    pr_info("deadlock: thread2 acquiring A...\n");
    spin_lock(&lock_a);    /* DEADLOCK: thread1 holds A, wants B */
    spin_unlock(&lock_a);
    spin_unlock(&lock_b);
    return 0;
}

static struct task_struct *t1, *t2;

static int __init deadlock_init(void)
{
    pr_info("deadlock: starting threads\n");
    t1 = kthread_run(thread1_fn, NULL, "deadlock_t1");
    t2 = kthread_run(thread2_fn, NULL, "deadlock_t2");
    return 0;
}

static void __exit deadlock_exit(void)
{
    pr_info("deadlock: module exit\n");
}

module_init(deadlock_init);
module_exit(deadlock_exit);

Part C: lockdep Output

======================================================
WARNING: possible circular locking dependency detected
5.15.0 #1 Tainted: G  O
------------------------------------------------------
deadlock_t2/1236 is trying to acquire lock:
ffff888012345678 (lock_a){....}-{2:2}, at: thread2_fn+0x4e/0x70 [deadlock]

but task is already holding lock:
ffff88801234abcd (lock_b){....}-{2:2}, at: thread2_fn+0x2c/0x70 [deadlock]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (lock_b){....}-{2:2}:
       thread1_fn+0x4c/0x70 [deadlock]           ← thread1 grabbed B while holding A
       ...

-> #0 (lock_a){....}-{2:2}:
       thread1_fn+0x2a/0x70 [deadlock]
       ...

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(lock_a);
                               lock(lock_b);
                               lock(lock_a);   ← would deadlock here
  lock(lock_b);                                ← would deadlock here

 *** DEADLOCK ***

Part D: Understand the Dependency Graph

Concept	Meaning
Lock class	All instances of the same lock (same `lockdep_map`) are one "class"
Lock dependency	"A → B" means "A was held when B was acquired"
Cycle in graph	Potential deadlock — lockdep reports immediately
`{....}-{2:2}`	Lock type annotation: spinlock, 2 contexts

Fix: Establish a global lock ordering. Always acquire lock_a before lock_b everywhere.

Analysis Questions

lockdep found this without a real deadlock occurring. How? What data structure does it maintain?
What does {....} mean in the lockdep annotation? Read Documentation/locking/lockdep-design.rst
How would you fix this deadlock? What design principle does the fix represent?

Lab 4: ftrace Function Graph Tracing

Objective: Use ftrace to trace kernel function calls, measure function latency, and use the irqsoff latency tracer.

Part A: Basic Function Tracing

# Mount debugfs if not mounted
mount -t debugfs none /sys/kernel/debug

cd /sys/kernel/tracing

# Trace a specific function
echo do_sys_open > set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
cat /proc/version   # trigger some open() calls
echo 0 > tracing_on
cat trace | head -40

Expected output:

# tracer: function
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
             cat-1234  [000] ....   123.456789: do_sys_open <-__x64_sys_openat

Part B: Function Graph Tracer

echo function_graph > current_tracer
echo do_sys_open > set_graph_function
echo 1 > tracing_on
cat /proc/cpuinfo
echo 0 > tracing_on
cat trace | head -60

Expected output (call tree with timing):

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  do_sys_open() {
 0)               |    getname() {
 0)   1.234 us    |      kmem_cache_alloc();
 0)   0.456 us    |      strncpy_from_user();
 0)   3.891 us    |    }
 0)               |    get_unused_fd_flags() {
 0)   0.234 us    |      __alloc_fd();
 0)   1.560 us    |    }
 0)  23.456 us    |  }

Part C: Latency Tracers

# irqsoff: measure maximum IRQ-disabled latency
echo irqsoff > current_tracer
echo 1 > tracing_on
# Run a workload for 10 seconds
sleep 10
echo 0 > tracing_on
cat trace   # shows the worst-case IRQ-off window

What to look for in irqsoff output: - Maximum latency shown at the top: # irqsoff latency trace v1.1.5 on 5.15.0 - The exact function where IRQs were disabled and re-enabled - Any surprise: "why is function X disabling IRQs for 200µs?"

# preemptoff: measure maximum preemption-disabled latency
echo preemptoff > current_tracer

Part D: Trace a Custom Module Function

# Trace only your module's function
echo 'chardev_write' > set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
echo "test" > /dev/chardev   # from Lab 2
echo 0 > tracing_on
cat trace

Analysis Questions

What is the difference between function and function_graph tracers? Which has higher overhead?
The irqsoff tracer showed a 50µs IRQ-disabled window in spin_lock_irqsave. What does that imply for a real-time system?
Use available_tracers to list all tracer types. What does blk trace?

Lab 5: bpftrace One-Liners and Scripts

Objective: Write bpftrace programs to trace system calls, measure latency distributions, and analyze off-CPU behavior.

Prerequisites

sudo apt install bpftrace linux-headers-$(uname -r)
# Verify
sudo bpftrace -e 'BEGIN { printf("bpftrace works\n"); }'

Part A: Trace execve() Calls

Goal: Print every process exec with its full argument string.

sudo bpftrace -e '
tracepoint:syscalls:sys_enter_execve
{
    printf("%-6d %-16s %s\n", pid, comm, str(args->filename));
}
'

Expected output (while running commands in another terminal):

1234   bash             /usr/bin/ls
1235   bash             /usr/bin/cat
1236   bash             /bin/ps

Analysis questions: - What is the overhead of this tracepoint? Use perf stat to measure before/after. - How would you filter to only show execve from processes owned by UID 1000?

Part B: Syscall Latency Histogram

Goal: Measure the distribution of read() latency for a target process.

# First, find the PID of interest (e.g., nginx)
PID=$(pgrep nginx | head -1)

sudo bpftrace -e "
tracepoint:syscalls:sys_enter_read
/pid == $PID/
{
    @start[tid] = nsecs;
}

tracepoint:syscalls:sys_exit_read
/pid == $PID && @start[tid]/
{
    @lat_us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}
" &

# Generate load
wrk -t 4 -c 50 -d 30s http://localhost/

# After 30 seconds, kill bpftrace (Ctrl+C) — it prints the histogram

Expected output:

@lat_us:
[0]                  124 |@@@@@@@@                                            |
[1]                 1482 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)               891 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
[4, 8)               234 |@@@@@@@@                                            |
[8, 16)               67 |@@                                                  |
[16, 32)              12 |                                                    |
[32, 64)               3 |                                                    |

What to look for: - Bimodal distribution? Suggests cache hit vs. miss - Long tail (>1ms for reads)? Suggests I/O wait or lock contention - Most reads complete in [1, 2) µs? Normal for cached filesystem reads

Part C: Off-CPU Analysis

Goal: Find what kernel stacks a process is blocking on (waiting for, not burning CPU).

# offcputime shows time spent off-CPU, stacked by kernel call path
sudo /usr/share/bcc/tools/offcputime -p $(pgrep myserver) -K 30

# Or with bpftrace (simplified version)
sudo bpftrace -e '
software:cpu-clock:100
/pid == target_pid/
{
    @stacks[kstack, ustack] = count();
}

END { print(@stacks); }
' -DTARGET_PID=$(pgrep myserver)

Interpreting the output: - Stack dominated by futex_wait → mutex/condition variable contention - Stack dominated by ep_poll / epoll_wait → normal I/O wait (not a problem) - Stack dominated by schedule → preempted, potentially too many threads - Stack in blkdev_issue_flush → disk I/O wait

Part D: Advanced Script — Per-Process Network Bytes

sudo bpftrace -e '
kprobe:tcp_sendmsg
{
    @send_bytes[pid, comm] = sum(arg2);
}

kprobe:tcp_recvmsg
{
    @recv_bytes[pid, comm] = sum(arg2);
}

interval:s:5
{
    printf("\n=== Network I/O (5 second window) ===\n");
    print(@send_bytes);
    print(@recv_bytes);
    clear(@send_bytes);
    clear(@recv_bytes);
}
'

Part E: Measure System Call Overhead

# Count syscalls per second, per type
sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter
{
    @syscalls[args->id] = count();
}

interval:s:1
{
    printf("\n--- syscalls/sec ---\n");
    print(@syscalls);
    clear(@syscalls);
}
'

Analysis questions: 1. Your application calls write() 50,000 times/second for small 64-byte writes. What is the approximate CPU overhead just from the syscall interface? (Each syscall costs ~1µs, so 50ms/s = 5% CPU just in syscall overhead) 2. How would you reduce this? (Batching, io_uring, sendmmsg) 3. What is the difference between kprobe and tracepoint probes? Why prefer tracepoints?

Lab Appendix: Debugging Tool Quick Reference

Scenario	Tool	Key Command
Kernel crash post-mortem	`crash` utility	`crash vmlinux /proc/vmcore`
NULL pointer / bad memory	KASAN	`CONFIG_KASAN=y` + read dmesg
Memory leak (kernel)	kmemleak	`cat /sys/kernel/debug/kmemleak`
Stack overflow detection	KASAN stack	`CONFIG_KASAN_STACK=y`
Deadlock detection	lockdep	`CONFIG_PROVE_LOCKING=y`
Lock contention statistics	lock_stat	`/proc/lock_stat`
Function call trace	ftrace	`/sys/kernel/tracing/`
Latency measurement	ftrace irqsoff	`echo irqsoff > current_tracer`
Dynamic tracing	bpftrace	`bpftrace -e 'kprobe:...'`
Syscall tracing (development)	strace	`strace -e trace=read,write ./app`
Live kernel debugging	KGDB	GDB + `target remote :1234`
Hardware performance counters	perf	`perf stat ./app`