Section 24: Debugging — Overview
Section Purpose and Scope
This section covers systematic debugging of low-level software: kernel panics, crash dump analysis, memory corruption, race conditions, and undefined behavior. It spans user-space and kernel-space debugging, covering the full toolchain from GDB and KGDB through dynamic tracing (ftrace, eBPF, dtrace) to sanitizers and lock debuggers. The goal is not a tutorial on any single tool, but a map of when and why each technique is applicable — building the judgment to choose the right approach when a production system exhibits pathological behavior.
Prerequisites
- Section 03: Kernel Fundamentals (system calls, kernel data structures)
- Section 04: Kernel Architecture (memory layout, interrupt handling)
- Section 06: CPU Architecture (registers, calling conventions, instruction set)
- Section 07: Process Management (signals, /proc, ptrace)
- Section 11: Memory Management (virtual address space layout, stack, heap)
Learning Objectives
- Analyze a kernel panic message and identify the faulting instruction, call stack, and likely cause.
- Set up kdump/kexec and extract a vmcore; navigate it with crash(1).
- Use GDB with vmlinux to decode kernel data structures from a core dump.
- Capture ftrace function_graph traces to understand unexpected kernel call paths.
- Write a kprobe or uprobe to dynamically instrument most kernel or user functions without recompilation.
- Use eBPF (bpftrace / BCC) to answer precise questions about system behavior.
- Configure and interpret ASAN, TSAN, UBSAN, KASAN, and KMSAN reports.
- Interpret lockdep warnings and trace them to the actual deadlock path.
Architecture Overview
Debugging Tool Taxonomy by Target and Technique:
| Technique | User Space | Kernel Space |
|---|---|---|
| Static analysis | clang-tidy, Coverity, CodeChecker, scan-build | sparse, smatch, Coccinelle |
| Dynamic (live) | GDB / LLDB, strace, ltrace, perf record, dtrace, bpftrace, stap | KGDB, ftrace, kprobes, uprobes, tracepoints, eBPF/bpftrace, perf sched |
| Post-mortem | gcore + GDB, Valgrind memcheck, coredumpctl, ASAN reports | crash(1) on vmcore, GDB + vmlinux, decode_stacktrace |
| Sanitizers | ASAN, TSAN, UBSAN, MSAN | KASAN, KMSAN, KCSAN, UBSAN (kernel) |
| Valgrind suite | memcheck, helgrind (races), cachegrind, callgrind | n/a |
| Lock debugging | n/a | lockdep, lock_stat |
| kdump/kexec | n/a | kexec loads a crash kernel on panic; makedumpfile writes a compressed vmcore |
Kernel Panic Anatomy:
┌─────────────────────────────────────────────────────────────────┐
│ [ 123.456789] BUG: kernel NULL pointer dereference, address: 0000000000000010 │
│ ... │
│ [ 123.456792] RIP: 0010:some_function+0x42/0xa0 │
│ [ 123.456793] RSP: 0018:ffffc9000d3c7e80 EFLAGS: 00010286 │
│ ... │
│ [ 123.456800] Call Trace: │
│ [ 123.456801] <TASK> │
│ [ 123.456802] caller_a+0x18/0x40 │
│ [ 123.456803] caller_b+0x5a/0x100 │
│ [ 123.456804] do_syscall_64+0x5b/0x1a0 │
│ [ 123.456805] entry_SYSCALL_64_after_hwframe+0x6e/0xd3 │
│ [ 123.456806] </TASK> │
│ │
│ Parse: RIP line → faddr2line (or addr2line) → source line │
│ Call Trace → each frame is symbol+offset/size (offset into the │
│ function, function length in bytes) │
└─────────────────────────────────────────────────────────────────┘
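Every frame in an oops uses the same symbol+offset/size notation. The offset is the distance into the function and the size is the function's length, both in hex. The minimal Python sketch below splits a frame into those parts. It strips the printk timestamp and the "RIP: 0010:" segment prefix, and rebuilds the func+offset/size argument that the kernel's scripts/faddr2line accepts alongside vmlinux. The names parse_frame and Frame are hypothetical, not part of any kernel tool. The regex is also an assumption: it covers the common x86-64 format, including "? " unreliable-entry prefixes and "[module]" suffixes, and not every architecture's variant.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "symbol+0xOFF/0xSIZE", optionally prefixed by "? " (unreliable stack entry)
# and optionally followed by " [module]".
FRAME_RE = re.compile(
    r"^\s*(?P<unreliable>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


@dataclass
class Frame:
    symbol: str
    offset: int          # bytes into the function
    size: int            # total function length in bytes
    module: Optional[str]
    unreliable: bool

    def faddr2line_arg(self) -> str:
        """func+0xoff/0xsize, the form scripts/faddr2line takes after vmlinux."""
        return f"{self.symbol}+{self.offset:#x}/{self.size:#x}"


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one oops line into a Frame; return None for non-frame lines."""
    line = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line)   # printk timestamp
    line = re.sub(r"^RIP:\s*[0-9a-f]{4}:", "", line)   # "RIP: 0010:" prefix
    m = FRAME_RE.match(line)
    if m is None:
        return None  # "Call Trace:", "<TASK>", register dumps, ...
    return Frame(
        symbol=m["sym"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        module=m["module"],
        unreliable=m["unreliable"] is not None,
    )
```

For the RIP line above, parse_frame returns symbol some_function, offset 0x42 and size 0xa0. Its faddr2line_arg() can then be passed as scripts/faddr2line vmlinux some_function+0x42/0xa0 to resolve the source line.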
Key Concepts
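- Kernel Panic: Fatal, unrecoverable kernel error. Common causes: NULL pointer dereference, stack overflow, BUG()/BUG_ON() assertion failures, hardware errors. Many of these first surface as an oops. An oops escalates to a panic when it occurs in interrupt context, when it kills init, or when panic_on_oops is set. Output includes register state, the call trace, and a Code: line with the bytes around the faulting instruction.
- kdump/kexec: kexec (kexec -p) loads a crash kernel into memory reserved at boot with the crashkernel= parameter. On panic, the primary kernel jumps to the crash kernel. The crash kernel exposes the old kernel's memory as /proc/vmcore, and a capture tool (typically makedumpfile) saves it to disk as a vmcore. The crash(1) utility then analyzes the vmcore offline.
- crash(1): Interactive crash dump analysis tool. Commands: bt (backtrace), ps (process list), vm (virtual memory map), files (open files), net (network state), mod (loaded modules). Requires a vmlinux with debug symbols that exactly matches the crashed kernel.
- KGDB: Kernel GDB stub. Lets a remote GDB session control a running kernel, typically over a serial console. Requires CONFIG_KGDB and, for serial, the kgdboc parameter (for example kgdboc=ttyS0,115200). Mainline has no network transport; KGDB over Ethernet exists only as out-of-tree patches. Useful for development kernels.
- ftrace: In-kernel tracing framework. Tracers: function, function_graph, irqsoff, preemptoff, wakeup. Interface via /sys/kernel/tracing/ (tracefs; older setups use /sys/kernel/debug/tracing/). Function tracing patches the compiler-inserted mcount/fentry call sites at runtime. These sites stay NOPs until tracing is enabled, so no recompile is needed as long as the kernel was built with CONFIG_FUNCTION_TRACER.
- kprobes: Dynamic instrumentation mechanism. Attaches a handler to almost any kernel instruction address by replacing that instruction with a breakpoint (int3 on x86). Functions on the kprobe blacklist (for example those marked NOKPROBE_SYMBOL) cannot be probed. Safe, production-capable. Basis for many BCC/bpftrace one-liners.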
- uprobes: User-space equivalent of kprobes. Attach to user-space function addresses. Enables tracing of language runtimes (Python, JVM, Go) without modifying the binary.
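- Tracepoints: Static instrumentation hooks in kernel source. Cheaper than kprobes: a disabled tracepoint is a NOP guarded by a static key, and enabling it patches in a jump rather than a breakpoint trap. More stable across kernel versions than kprobe attach points, though not formally a guaranteed ABI. Used extensively in scheduler, block I/O, networking.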
- bpftrace: High-level eBPF front-end language. One-liner tracing with DTrace-like syntax. Built on kprobes, uprobes, tracepoints, perf_events. Ideal for interactive investigation.
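- ASAN (AddressSanitizer): Compiler instrumentation (-fsanitize=address) that detects heap buffer overflows, stack buffer overflows, use-after-free, and use-after-return. Typical slowdown is about 2x. Each 8-byte granule of application memory maps to one shadow byte, which records how many of the granule's bytes are addressable.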
- TSAN (ThreadSanitizer): Detects data races using happens-before algorithm. 5-15x slowdown. Tracks memory accesses and synchronization operations per thread.
- UBSAN (UndefinedBehaviorSanitizer): Detects C/C++ undefined behavior: signed overflow, null pointer, misaligned access, division by zero.
- KASAN (Kernel AddressSanitizer): Kernel equivalent of ASAN. Detects heap/stack out-of-bounds and use-after-free in kernel code. Requires CONFIG_KASAN. Used in the syzbot fuzzing infrastructure.
- KMSAN (Kernel MemorySanitizer): Detects use of uninitialized memory in the kernel. Requires CONFIG_KMSAN (merged in Linux 6.1) and a Clang build. Catches info leaks and logic errors.
- lockdep: Runtime lock dependency checker. Tracks acquisition order between lock classes and detects potential deadlock cycles before they occur. Reports "possible circular locking dependency detected" with the full chain.
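- strace/ltrace: System call and library call tracing via ptrace. Essential for diagnosing unexpected system call behavior, file access patterns, and IPC. Low-level, and available on virtually any Linux system with no special kernel configuration. Because ptrace stops the tracee at every traced call, overhead can be large.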
- Valgrind: Dynamic binary instrumentation framework. memcheck detects memory errors (uninitialized reads, heap overflows, leaks). 10-50x slowdown. No recompile needed.
- Coccinelle: Semantic patch tool for C. Finds and transforms patterns across large codebases. Used in Linux kernel for finding common bug patterns.
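The ftrace interface is plain files under tracefs, so a function_graph session can be scripted from any language. The Python sketch below assumes tracefs is mounted at /sys/kernel/tracing and that the script runs as root. The helpers function_graph_plan, apply_plan and read_trace are hypothetical names. The files they touch (tracing_on, current_tracer, set_graph_function, trace) are standard tracefs entries. Separating the plan from the writes keeps the configuration inspectable before it touches a live kernel.

```python
from pathlib import Path

# Assumed mount point; some older systems only expose /sys/kernel/debug/tracing.
TRACEFS = Path("/sys/kernel/tracing")


def function_graph_plan(func: str) -> list[tuple[str, str]]:
    """Ordered (tracefs file, value) writes that graph-trace calls beneath func."""
    return [
        ("tracing_on", "0"),                   # pause recording while configuring
        ("current_tracer", "function_graph"),  # select the tracer
        ("set_graph_function", func),          # only graph func and its callees
        ("tracing_on", "1"),                   # resume recording
    ]


def apply_plan(plan: list[tuple[str, str]], root: Path = TRACEFS) -> None:
    """Perform the writes in order; on a real tracefs this requires root."""
    for name, value in plan:
        (root / name).write_text(value + "\n")


def read_trace(root: Path = TRACEFS) -> str:
    """Snapshot of the ring buffer (reading trace_pipe would stream instead)."""
    return (root / "trace").read_text()
```

For example, apply_plan(function_graph_plan("vfs_read")) followed by read_trace() would show the call graph below vfs_read. Writing "nop" to current_tracer afterwards switches the tracer off.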
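ASAN's shadow memory is easiest to understand as arithmetic. The Python sketch below emulates, in simplified form, the check ASAN compiles in before a 1-, 2-, 4- or 8-byte access that stays inside one 8-byte granule. It assumes the default x86-64 Linux user-space mapping, shadow = (addr >> 3) + 0x7fff8000; other platforms use different offsets. poison_allocation is a hypothetical stand-in for the allocator's bookkeeping. It uses 0xfa (-6 as a signed byte), shown as a heap redzone in ASAN's shadow legend, as the poison marker.

```python
SHADOW_SCALE = 3            # one shadow byte per 2**3 = 8 application bytes
SHADOW_OFFSET = 0x7FFF8000  # assumed: default x86-64 Linux user-space offset


def shadow_addr(addr: int) -> int:
    """Address of the shadow byte describing addr's 8-byte granule."""
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET


def poison_allocation(base: int, size: int, redzone: int, shadow: dict[int, int]) -> None:
    """Hypothetical allocator bookkeeping; base must be 8-byte aligned.
    Marks [base, base+size) addressable and the following redzone poisoned."""
    full, partial = divmod(size, 8)
    for i in range(full):
        shadow[shadow_addr(base + 8 * i)] = 0   # all 8 bytes addressable
    addr = base + 8 * full
    if partial:
        shadow[shadow_addr(addr)] = partial     # only the first `partial` bytes valid
        addr += 8
    while addr < base + size + redzone:
        shadow[shadow_addr(addr)] = -6          # 0xfa as a signed byte: redzone
        addr += 8


def is_poisoned(addr: int, access_size: int, shadow: dict[int, int]) -> bool:
    """Simplified instrumentation check; missing shadow entries count as 0."""
    k = shadow.get(shadow_addr(addr), 0)
    if k == 0:
        return False                            # whole granule addressable
    if k < 0:
        return True                             # redzone or freed memory
    return (addr & 7) + access_size - 1 >= k    # partially addressable granule
```

With a 13-byte allocation, the second granule's shadow byte is 5. A 1-byte read at offset 12 passes, while a read at offset 13 lands past the valid prefix and would be reported as a heap-buffer-overflow.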
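TSAN's happens-before reasoning can be illustrated with vector clocks. The Python sketch below is a deliberately simplified model, not ThreadSanitizer's implementation, which keeps a bounded set of shadow cells per memory word. Each thread carries a vector clock. A lock release publishes the releaser's clock, and an acquire joins it. Two accesses to the same address race when at least one is a write and the later thread has not synchronized with the earlier access. HBDetector and its methods are hypothetical names.

```python
from collections import defaultdict


class HBDetector:
    """Toy vector-clock data race detector (illustrative model only)."""

    def __init__(self, nthreads: int) -> None:
        # clock[t][u]: latest epoch of thread u that thread t has synchronized with
        self.clock = [[0] * nthreads for _ in range(nthreads)]
        for t in range(nthreads):
            self.clock[t][t] = 1
        self.lock_clock: dict[str, list[int]] = {}
        # addr -> list of (thread, that thread's own epoch at the access, is_write)
        self.history: dict[str, list[tuple[int, int, bool]]] = defaultdict(list)
        self.races: list[tuple[str, int, int]] = []

    def acquire(self, t: int, lock: str) -> None:
        published = self.lock_clock.get(lock)
        if published is not None:
            # Join: everything before the last release now happens-before t.
            self.clock[t] = [max(a, b) for a, b in zip(self.clock[t], published)]

    def release(self, t: int, lock: str) -> None:
        self.lock_clock[lock] = list(self.clock[t])  # publish t's knowledge
        self.clock[t][t] += 1                         # later events get a new epoch

    def access(self, t: int, addr: str, is_write: bool) -> None:
        for u, epoch, was_write in self.history[addr]:
            if u == t or not (was_write or is_write):
                continue  # same thread, or read/read: never a race
            if epoch > self.clock[t][u]:
                # t has not synchronized with u since that access: concurrent
                self.races.append((addr, u, t))
        self.history[addr].append((t, self.clock[t][t], is_write))
```

Suppose two threads each write a variable while holding the same lock. The second thread's acquire joins the first thread's published clock, so no race is recorded. Drop the lock calls and the same two writes are reported.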
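lockdep can warn before a deadlock ever happens because it reasons about acquisition order between lock classes, not about actual blocking. The Python sketch below models that idea as a graph problem. Each time a class is acquired while another is held, it records an edge held -> new. A new acquisition is reported when a recorded chain already leads from the new class back to a held one. LockOrderGraph is a hypothetical name, and real lockdep additionally tracks IRQ contexts, read/write semantics and nesting annotations.

```python
from typing import Optional


class LockOrderGraph:
    """Toy model of lockdep's ordering check (not its implementation)."""

    def __init__(self) -> None:
        self.edges: dict[str, set[str]] = {}       # lock class -> classes taken after it
        self.held: dict[int, list[str]] = {}       # task -> lock classes currently held

    def _path(self, src: str, dst: str) -> Optional[list[str]]:
        """Depth-first search for a recorded dependency chain src -> ... -> dst."""
        stack = [(src, [src])]
        seen: set[str] = set()
        while stack:
            node, path = stack.pop()
            if node == dst:
                return path
            if node in seen:
                continue
            seen.add(node)
            for nxt in self.edges.get(node, ()):
                stack.append((nxt, path + [nxt]))
        return None

    def lock(self, task: int, cls: str) -> Optional[list[str]]:
        """Acquire cls for task; return the cycle if the new order would close one."""
        held = self.held.setdefault(task, [])
        for h in held:
            chain = self._path(cls, h)             # is cls already ordered before h?
            if chain is not None:
                return chain + [cls]               # h -> cls would close the loop
        for h in held:
            self.edges.setdefault(h, set()).add(cls)
        held.append(cls)
        return None

    def unlock(self, task: int, cls: str) -> None:
        self.held[task].remove(cls)
```

Task 1 taking A then B, followed later by task 2 taking B then A, yields the chain A, B, A. The two tasks never had to run concurrently for the report to fire, which matches lockdep's "possible" wording.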
Major Historical Milestones
| Year | Event |
|---|---|
| 1988 | GDB 1.0 released |
| 1990s | strace (ptrace-based syscall tracer) first available on Linux |
| 2002 | Valgrind 1.0 — dynamic binary instrumentation for memory debugging |
| 2004 | DTrace released on Solaris — production-safe dynamic tracing |
| 2004 | kprobes merged into mainline Linux (2.6.9) |
| 2008 | ftrace merged into Linux 2.6.27 |
| 2009 | perf tools (perf_events) integrated into the Linux kernel tree (2.6.31) |
| 2012 | AddressSanitizer (ASAN) open-sourced by Google/LLVM |
| 2012 | ThreadSanitizer v2 (TSAN) released |
| 2013 | KASAN work begins; merged mainline Linux 4.0 (2015) |
| 2014 | eBPF extended to general programs (not just socket filters) in Linux 3.18 |
| 2015 | BCC (BPF Compiler Collection) released; started by Brenden Blanco, with many tools contributed by Brendan Gregg |
| 2016 | syzbot (syzkaller) begins automated kernel fuzzing with KASAN |
| 2018 | bpftrace 0.1 released |
| 2019 | uprobes + BPF-based USDT tracing mature |
| 2020 | KCSAN (Kernel Concurrency Sanitizer) merged into Linux 5.8 |
| 2022 | KMSAN merged into Linux 6.1 |
Modern Relevance
Debugging low-level systems remains a core engineering skill despite improved tooling. Production incidents increasingly require kernel-level diagnosis: performance regressions from kernel scheduler changes, security vulnerabilities in kernel drivers, memory leaks in long-running services, and subtle race conditions that only manifest under production load. eBPF has transformed production debugging: custom probes can now be attached to running production kernels without restarts and with low, though nonzero, overhead.
The sanitizer ecosystem (ASAN, TSAN, KASAN, KMSAN) combined with fuzzing (syzkaller, libFuzzer) forms the backbone of continuous kernel security testing. Understanding these tools is necessary for contributing to kernel development or maintaining C/C++ codebases with high reliability requirements.
File Map
24-debugging/
├── 00-overview.md ← this file
├── 01-kernel-panic-analysis.md ← reading oops/panic output, symbols, addr2line
├── 02-kdump-kexec.md ← kdump setup, vmcore capture, makedumpfile
├── 03-crash-tool.md ← crash commands, kernel data structure navigation
├── 04-coredump-analysis.md ← gcore, coredumpctl, GDB user-space analysis
├── 05-gdb-kernel.md ← vmlinux, GDB scripts, kernel module debugging
├── 06-kgdb.md ← KGDB setup, remote debug session
├── 07-dtrace.md ← DTrace on Solaris/macOS/BSD, D language
├── 08-strace-ltrace.md ← ptrace internals, strace output analysis
├── 09-perf.md ← perf stat, perf record, perf report, annotations
├── 10-ftrace.md ← function_graph, tracers, filter syntax
├── 11-kprobes-uprobes.md ← probe mechanics, registration, safety
├── 12-tracepoints.md ← TRACE_EVENT macros, static instrumentation
├── 13-ebpf-debugging.md ← bpftrace one-liners, BCC tools, CO-RE
├── 14-asan-tsan-ubsan.md ← sanitizer internals, shadow memory, race detection
├── 15-kasan-kmsan.md ← kernel sanitizers, syzbot integration
├── 16-lockdep.md ← dependency graph, cycle detection, annotations
├── 17-race-detection.md ← KCSAN, TSAN, Helgrind — race finding techniques
├── 18-memory-debugging.md ← Valgrind suite, DHAT, leak detection
└── 19-static-analysis.md ← Coverity, clang-tidy, sparse, smatch, Coccinelle
Cross-References
- Section 03 (Kernel Fundamentals): System call interface, kernel data structures debugged with crash/GDB
- Section 06 (CPU Architecture): Register conventions, calling conventions needed for stack trace interpretation
- Section 07 (Process Management): ptrace, the mechanism underlying strace, ltrace, and GDB
- Section 11 (Memory Management): Virtual address space layout — essential for interpreting crash addresses
- Section 23 (Observability): eBPF also used for observability; tracepoints serve both debugging and monitoring
- Section 25 (Performance Engineering): perf, ftrace, eBPF tools appear in both debugging and profiling
- Section 26 (Security): Sanitizers and fuzz testing for security vulnerability discovery
- Section 27 (Kernel Exploits): Understanding bugs found by KASAN/KMSAN/fuzzing that lead to exploits