Section 24: Debugging — Overview

Section Purpose and Scope

This section covers systematic debugging of low-level software: kernel panics, crash dump analysis, memory corruption, race conditions, and undefined behavior. It spans user-space and kernel-space debugging, covering the full toolchain from GDB and KGDB through dynamic tracing (ftrace, eBPF, DTrace) to sanitizers and lock debuggers. The goal is not a tutorial on any single tool but a map of when and why each technique applies, so that you can choose the right approach when a production system misbehaves.


Prerequisites

  • Section 03: Kernel Fundamentals (system calls, kernel data structures)
  • Section 04: Kernel Architecture (memory layout, interrupt handling)
  • Section 06: CPU Architecture (registers, calling conventions, instruction set)
  • Section 07: Process Management (signals, /proc, ptrace)
  • Section 11: Memory Management (virtual address space layout, stack, heap)

Learning Objectives

  1. Analyze a kernel panic message and identify the faulting instruction, call stack, and likely cause.
  2. Set up kdump/kexec and extract a vmcore; navigate it with crash(1).
  3. Use GDB with vmlinux to decode kernel data structures from a core dump.
  4. Capture ftrace function_graph traces to understand unexpected kernel call paths.
  5. Write a kprobe or uprobe to dynamically instrument most kernel or user functions without recompilation.
  6. Use eBPF (bpftrace / BCC) to answer precise questions about system behavior.
  7. Configure and interpret ASAN, TSAN, UBSAN, KASAN, and KMSAN reports.
  8. Interpret lockdep warnings and trace them to the actual deadlock path.

Architecture Overview

  Debugging Tool Taxonomy by Target and Technique:

  ┌───────────────────────────────────────────────────────────────────┐
  │                      User Space                                   │
  │                                                                   │
  │  Static Analysis        Dynamic (live)      Post-mortem           │
  │  ┌───────────────┐     ┌──────────────┐    ┌──────────────────┐  │
  │  │ clang-tidy    │     │ GDB / LLDB   │    │ gcore + GDB      │  │
  │  │ Coverity      │     │ strace       │    │ Valgrind memcheck│  │
  │  │ CodeChecker   │     │ ltrace       │    │ coredumpctl       │  │
  │  │ scan-build    │     │ perf record  │    │ ASAN reports      │  │
  │  └───────────────┘     │ dtrace       │    └──────────────────┘  │
  │                        │ bpftrace     │                           │
  │  Sanitizers            │ stap         │    Valgrind Suite         │
  │  ┌───────────────┐     └──────────────┘    ┌──────────────────┐  │
  │  │ ASAN          │                         │ memcheck          │  │
  │  │ TSAN          │                         │ helgrind (races)  │  │
  │  │ UBSAN         │                         │ cachegrind        │  │
  │  │ MSAN          │                         │ callgrind         │  │
  │  └───────────────┘                         └──────────────────┘  │
  └───────────────────────────────────────────────────────────────────┘

  ┌───────────────────────────────────────────────────────────────────┐
  │                      Kernel Space                                 │
  │                                                                   │
  │  Static Analysis        Dynamic (live)      Post-mortem           │
  │  ┌───────────────┐     ┌──────────────┐    ┌──────────────────┐  │
  │  │ sparse        │     │ KGDB         │    │ crash(1) on      │  │
  │  │ smatch        │     │ ftrace        │    │  vmcore          │  │
  │  │ Coccinelle    │     │ kprobes       │    │ GDB + vmlinux    │  │
  │  └───────────────┘     │ uprobes       │    │ decode_stacktrace│  │
  │                        │ tracepoints   │    └──────────────────┘  │
  │  Kernel Sanitizers     │ eBPF/bpftrace │                          │
  │  ┌───────────────┐     │ perf sched    │    kdump/kexec           │
  │  │ KASAN         │     └──────────────┘    ┌──────────────────┐  │
  │  │ KMSAN         │                         │ kexec loads crash│  │
  │  │ KCSAN         │     Lock Debugging       │ kernel on panic  │  │
  │  │ UBSAN(kernel) │     ┌──────────────┐    │ makedumpfile     │  │
  │  └───────────────┘     │ lockdep       │    │ compressed vmcore│  │
  │                        │ lock_stat     │    └──────────────────┘  │
  │                        └──────────────┘                           │
  └───────────────────────────────────────────────────────────────────┘

  Kernel Panic Anatomy:
  ┌─────────────────────────────────────────────────────────────────┐
  │  [   123.456789] BUG: kernel NULL pointer dereference           │
  │  [   123.456790]  at 0000000000000010                           │
  │  [   123.456791] IP: some_function+0x42/0xa0                   │
  │  [   123.456792] RIP: 0010:some_function+0x42/0xa0             │
  │  [   123.456793] RSP: 0018:ffffc9000d3c7e80  EFLAGS: 00010286  │
  │  ...                                                            │
  │  [   123.456800] Call Trace:                                    │
  │  [   123.456801]  <TASK>                                        │
  │  [   123.456802]  caller_a+0x18/0x40                           │
  │  [   123.456803]  caller_b+0x5a/0x100                          │
  │  [   123.456804]  do_syscall_64+0x5b/0x1a0                     │
  │  [   123.456805]  entry_SYSCALL_64_after_hwframe+0x6e/0xd3     │
  │                                                                 │
  │  Parse: IP line → addr2line/faddr2line → source line           │
  │         Call Trace → offset/size encoded in each frame         │
  └─────────────────────────────────────────────────────────────────┘
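The "Parse" step above depends on the `symbol+0xoffset/0xsize` encoding of each frame. As a minimal sketch (not the kernel's own tooling), the Python below splits such a frame into its parts and, given a symbol base address, computes the absolute address that a tool such as addr2line would resolve. The names `parse_frame` and `absolute_address` and the base address used in the usage note are illustrative assumptions; in practice the base would come from a symbol table such as System.map or `nm vmlinux`.

```python
import re
from typing import NamedTuple, Optional

# Matches "symbol+0xOFFSET/0xSIZE" anywhere in a log line, so timestamps
# and prefixes such as "RIP: 0010:" are skipped automatically.
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


class Frame(NamedTuple):
    symbol: str   # function name
    offset: int   # bytes from the start of the function
    size: int     # total size of the function in bytes


def parse_frame(line: str) -> Optional[Frame]:
    """Return the first symbol+offset/size frame in a line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(m["symbol"], int(m["offset"], 16), int(m["size"], 16))


def absolute_address(frame: Frame, symbol_base: int) -> int:
    """Address of the instruction, given the symbol's base address.

    symbol_base is an assumption supplied by the caller (for example,
    looked up in System.map); it is not present in the panic output.
    """
    return symbol_base + frame.offset
```

For the RIP line in the diagram, `parse_frame` yields `some_function` with offset 0x42 and size 0xa0, meaning the fault is 0x42 bytes into a 0xa0-byte function. Lines without a frame, such as `<TASK>` or `Call Trace:`, return `None`.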

Key Concepts

  • Kernel Panic: Fatal, unrecoverable kernel error. Common causes: NULL pointer dereference, stack overflow, BUG()/BUG_ON() assertion failures, hardware errors. Output includes register state, call trace, and memory dump.
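  • kdump/kexec: kexec loads a crash kernel into memory reserved at boot (crashkernel= parameter). On panic, the primary kernel jumps to the crash kernel, which exposes the failed kernel's memory as /proc/vmcore and saves it to disk as a vmcore, often filtered and compressed by makedumpfile. The crash(1) utility then analyzes the vmcore offline.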
  • crash(1): Interactive crash dump analysis tool. Commands: bt (backtrace), ps (process list), vm (virtual memory map), files (open files), net (network state), mod (loaded modules). Requires matching vmlinux with debug symbols.
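  • KGDB: Kernel GDB stub. Allows a remote GDB session to a running kernel, typically over a serial console (kgdboc); the network transport (kgdboe) is maintained out of tree. Requires CONFIG_KGDB and the kgdboc= parameter. Best suited to development kernels.
  • ftrace: In-kernel tracing framework. Tracers: function, function_graph, irqsoff, preemptoff, wakeup. Interface via /sys/kernel/tracing/. Function tracing relies on compiler-inserted mcount/fentry call sites (CONFIG_FUNCTION_TRACER) that are patched to no-ops at boot and enabled at runtime (CONFIG_DYNAMIC_FTRACE), so starting a trace needs no rebuild or reboot.
  • kprobes: Dynamic instrumentation mechanism. Attaches a handler to almost any kernel instruction address by replacing the instruction with a breakpoint (int3 on x86); a small set of blacklisted functions, such as the kprobes code itself, cannot be probed. Usable in production when handlers are kept short. Basis for many BCC/bpftrace one-liners.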
  • uprobes: User-space equivalent of kprobes. Attach to user-space function addresses. Enables tracing of language runtimes (Python, JVM, Go) without modifying the binary.
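  • Tracepoints: Static instrumentation hooks in kernel source. Cheaper than kprobes: there is no breakpoint trap, and a disabled tracepoint costs roughly a no-op that is patched into a jump only when the tracepoint is enabled (static keys). More stable across kernel versions than kprobe attach points, though not a formally guaranteed ABI. Used extensively in scheduler, block I/O, networking.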
  • bpftrace: High-level eBPF front-end language. One-liner tracing with DTrace-like syntax. Built on kprobes, uprobes, tracepoints, perf_events. Ideal for interactive investigation.
  • ASAN (AddressSanitizer): Compiler instrumentation detecting heap buffer overflows, stack buffer overflows, use-after-free, and (in an optional mode) use-after-return. 2x slowdown typical. Shadow memory maps every 8 bytes of application memory to 1 shadow byte.
  • TSAN (ThreadSanitizer): Detects data races using happens-before algorithm. 5-15x slowdown. Tracks memory accesses and synchronization operations per thread.
  • UBSAN (UndefinedBehaviorSanitizer): Detects C/C++ undefined behavior: signed overflow, null pointer, misaligned access, division by zero.
  • KASAN (Kernel AddressSanitizer): Kernel equivalent of ASAN. Detects heap/stack out-of-bounds and use-after-free in kernel code. Requires CONFIG_KASAN. Used in syzbot fuzzing infrastructure.
  • KMSAN (Kernel MemorySanitizer): Detects use of uninitialized memory in the kernel. Requires CONFIG_KMSAN and a Clang-built kernel (merged in Linux 6.1). Catches info leaks and logic errors.
  • lockdep: Runtime lock dependency checker (CONFIG_PROVE_LOCKING). Tracks lock acquisition order between lock classes and detects potential deadlock cycles before they occur. Reports "possible circular locking dependency detected" with the full chain.
  • strace/ltrace: System call and library call tracing via ptrace. Essential for diagnosing unexpected system call behavior, file access patterns, and IPC. Low-level but always available.
  • Valgrind: Dynamic binary instrumentation framework. memcheck detects memory errors (uninitialized reads, heap overflows, leaks). 10-50x slowdown. No recompile needed.
  • Coccinelle: Semantic patch tool for C. Finds and transforms patterns across large codebases. Used in Linux kernel for finding common bug patterns.
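The lockdep entry above describes detecting deadlock cycles from acquisition order alone. The following is a toy sketch of that idea in Python, not lockdep's actual implementation: it records "B was taken while A was held" as a directed edge A → B and refuses any new edge that would close a cycle. The class name `LockOrderGraph` and its `record` method are hypothetical, and real lockdep additionally tracks IRQ context, read/write lock semantics, and lock subclasses.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class LockOrderGraph:
    """Toy model of lock-order tracking between lock classes."""

    def __init__(self) -> None:
        # edges[a] = set of locks that have been acquired while a was held
        self.edges: Dict[str, Set[str]] = defaultdict(set)

    def _find_path(self, src: str, dst: str) -> Optional[List[str]]:
        """Depth-first search for a recorded ordering chain src -> ... -> dst."""
        stack = [(src, [src])]
        seen: Set[str] = set()
        while stack:
            node, path = stack.pop()
            if node == dst:
                return path
            if node in seen:
                continue
            seen.add(node)
            for nxt in self.edges.get(node, ()):
                stack.append((nxt, path + [nxt]))
        return None

    def record(self, held: str, new: str) -> Optional[List[str]]:
        """Record that `new` is acquired while `held` is held.

        Returns None if the ordering is consistent with everything seen
        so far. Otherwise returns the chain of locks that the new edge
        would close into a cycle, and does not record the edge.
        """
        chain = self._find_path(new, held)
        if chain is not None:
            return chain + [new]
        self.edges[held].add(new)
        return None
```

Recording A → B and then B → C succeeds; a later attempt to take A while holding C returns the chain A, B, C, A. No deadlock has to happen for the report to fire: a single observation of each ordering is enough, which is why such a checker can flag a potential deadlock on a code path that never actually hung.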

Major Historical Milestones

Year Event
1986 GDB first released as part of the GNU Project
1990s strace (ptrace-based syscall tracer) first available on Linux
2002 Valgrind 1.0: dynamic binary instrumentation for memory debugging
2004 DTrace released on Solaris — production-safe dynamic tracing
2004 kprobes merged into mainline Linux kernel (2.6.9)
2008 ftrace merged into Linux 2.6.27
2009 perf tools integrated into Linux kernel tree (2.6.31)
2012 AddressSanitizer (ASAN) open-sourced by Google/LLVM
2012 ThreadSanitizer v2 (TSAN) released
2013 KASAN work begins; merged mainline Linux 4.0 (2015)
2014 eBPF extended to general programs (not just socket filters) in Linux 3.18
2015 BCC (BPF Compiler Collection) released under the IO Visor project
2016 syzbot (syzkaller) begins automated kernel fuzzing with KASAN
2018 bpftrace 0.1 released
2019 uprobes + BPF-based USDT tracing mature
2020 KCSAN (Kernel Concurrency Sanitizer) merged into Linux 5.8
2022 KMSAN merged into Linux 6.1

Modern Relevance

Debugging low-level systems remains a core engineering skill despite improved tooling. Production incidents increasingly require kernel-level diagnosis: performance regressions from kernel scheduler changes, security vulnerabilities in kernel drivers, memory leaks in long-running services, and subtle race conditions that only manifest under production load. eBPF has transformed production debugging: it is now practical to attach custom probes to a running production kernel without a restart, usually at low (though not zero) overhead that scales with how often the probed event fires.

The sanitizer ecosystem (ASAN, TSAN, KASAN, KMSAN) combined with fuzzing (syzkaller, libFuzzer) forms the backbone of continuous kernel security testing. Understanding these tools is necessary for contributing to kernel development or maintaining C/C++ codebases with high reliability requirements.


File Map

24-debugging/
├── 00-overview.md                  ← this file
├── 01-kernel-panic-analysis.md     ← reading oops/panic output, symbols, addr2line
├── 02-kdump-kexec.md               ← kdump setup, vmcore capture, makedumpfile
├── 03-crash-tool.md                ← crash commands, kernel data structure navigation
├── 04-coredump-analysis.md         ← gcore, coredumpctl, GDB user-space analysis
├── 05-gdb-kernel.md                ← vmlinux, GDB scripts, kernel module debugging
├── 06-kgdb.md                      ← KGDB setup, remote debug session
├── 07-dtrace.md                    ← DTrace on Solaris/macOS/BSD, D language
├── 08-strace-ltrace.md             ← ptrace internals, strace output analysis
├── 09-perf.md                      ← perf stat, perf record, perf report, annotations
├── 10-ftrace.md                    ← function_graph, tracers, filter syntax
├── 11-kprobes-uprobes.md           ← probe mechanics, registration, safety
├── 12-tracepoints.md               ← TRACE_EVENT macros, static instrumentation
├── 13-ebpf-debugging.md            ← bpftrace one-liners, BCC tools, CO-RE
├── 14-asan-tsan-ubsan.md           ← sanitizer internals, shadow memory, race detection
├── 15-kasan-kmsan.md               ← kernel sanitizers, syzbot integration
├── 16-lockdep.md                   ← dependency graph, cycle detection, annotations
├── 17-race-detection.md            ← KCSAN, TSAN, Helgrind — race finding techniques
├── 18-memory-debugging.md          ← Valgrind suite, DHAT, leak detection
└── 19-static-analysis.md           ← Coverity, clang-tidy, sparse, smatch, Coccinelle

Cross-References

  • Section 03 (Kernel Fundamentals): System call interface, kernel data structures debugged with crash/GDB
  • Section 06 (CPU Architecture): Register conventions, calling conventions needed for stack trace interpretation
  • Section 07 (Process Management): ptrace, the mechanism underlying strace, ltrace, and GDB
  • Section 11 (Memory Management): Virtual address space layout — essential for interpreting crash addresses
  • Section 23 (Observability): eBPF also used for observability; tracepoints serve both debugging and monitoring
  • Section 25 (Performance Engineering): perf, ftrace, eBPF tools appear in both debugging and profiling
  • Section 26 (Security): Sanitizers and fuzz testing for security vulnerability discovery
  • Section 27 (Kernel Exploits): Understanding bugs found by KASAN/KMSAN/fuzzing that lead to exploits