Kernel Engineering Learning Roadmap

A complete month-by-month guide to becoming a production kernel engineer. This roadmap assumes you have solid C programming skills and basic operating systems familiarity. Each phase has explicit success criteria so you know when you are ready to advance.


Overview

| Phase | Months | Focus | Outcome |
|---|---|---|---|
| Foundations | 1–3 | OS theory, C toolchain, xv6 | Can build and modify a teaching OS |
| Kernel Basics | 4–6 | Linux internals, first module, process source | Can write and load a kernel module |
| Intermediate | 7–12 | Memory, drivers, debugging | Can write a character device driver and debug crashes |
| Advanced | 13–24 | Scheduler, networking, filesystems, upstream contribution | Have at least one upstream patch accepted |
| Expert | 25–36 | Subsystem depth, mailing list, architecture decisions | Recognized subsystem contributor |

Phase 1: Foundations (Months 1–3)

Core Texts

| Book | ISBN | Coverage | Priority |
|---|---|---|---|
| Operating Systems: Three Easy Pieces (OSTEP), Arpaci-Dusseau | Free online / 978-1985086593 | Virtualization, Concurrency, Persistence | Essential |
| Computer Systems: A Programmer's Perspective (CS:APP), Bryant & O'Hallaron | 978-0134092669 | Memory hierarchy, assembly, linking, I/O | Essential |
| The C Programming Language, Kernighan & Ritchie | 978-0131103627 | C fundamentals and standard library | Reference |

Month 1: OS Theory and C Toolchain

Topics:
- OSTEP Parts I and II (Virtualization and Concurrency, 30 chapters)
- CS:APP Chapters 1–3 (machine representation, assembly, arithmetic)
- GCC flags: -Wall -Wextra -O2 -g -fsanitize=address
- Make, GDB basics: breakpoints, watchpoints, backtraces
- Valgrind: memcheck, helgrind

Lab Exercises:
1. Write a shell in C (fork/exec/wait, pipe, I/O redirection)
2. Write a thread-safe bounded queue using pthreads (mutex + condition variable)
3. Use GDB to debug a deliberately corrupted linked list
4. Observe virtual memory layout with /proc/self/maps
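The core of the shell exercise is the fork/exec/wait cycle. The following is a minimal sketch of that cycle, not a full shell: `run_and_wait` is a hypothetical helper name, and pipes and I/O redirection are left out.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run the program argv[0] with the NULL-terminated argument vector argv.
 * Returns the child's exit status, or -1 if fork/waitpid fails or the
 * child was terminated by a signal.
 */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image; execvp searches PATH. */
        execvp(argv[0], argv);
        /* Only reached if exec failed; 127 mirrors the shell convention. */
        _exit(127);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A shell's main loop would parse a line into `argv` and call a helper like this; redirection and pipes are set up in the child between `fork()` and `execvp()`.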

Success Criteria:
- Can explain virtual address translation from first principles
- Can write a multi-threaded producer-consumer with no data races (verified with helgrind)
- Can drive a GDB session without consulting documentation for basic commands

Month 2: Build and Modify xv6

Resource: MIT xv6-riscv — https://github.com/mit-pdos/xv6-riscv (RISC-V version preferred; x86 version also fine for reference)

Topics:
- xv6 boot sequence: entry.S → start.c → main.c
- Process lifecycle: allocproc, fork, exec, exit, wait
- Context switching: swtch.S, scheduler
- File system: inode layer, logging, buffer cache
- System call path: user space → ecall → syscall dispatch → implementation

Lab Exercises (MIT 6.S081 problem sets, all publicly available):
1. Lab 1: Unix utilities (implement sleep, pingpong, primes)
2. Lab 2: System calls (add trace, sysinfo syscalls)
3. Lab 3: Page tables (implement pgaccess, print page table)
4. Lab 4: Traps and backtraces
5. Lab 5: Copy-on-write fork

Success Criteria:
- Can walk through xv6 boot in GDB line by line
- Can add a new syscall to xv6 from scratch without documentation
- Can explain COW fork implementation at the page-table level
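Explaining page tables and COW fork at the page-table level starts with how a virtual address is split into table indices. The sketch below assumes the RISC-V Sv39 layout used by xv6-riscv (three 9-bit indices above a 12-bit page offset); the function names are illustrative, and xv6 expresses the same idea with its own macros.

```c
#include <stdint.h>

#define PAGE_SHIFT 12      /* 4 KiB pages: low 12 bits are the byte offset */
#define INDEX_BITS 9       /* each level indexes a 512-entry table */
#define INDEX_MASK 0x1FFu

/* Page-table index for a given level: 2 is the root, 0 is the leaf level. */
unsigned sv39_index(uint64_t va, int level)
{
    return (unsigned)((va >> (PAGE_SHIFT + INDEX_BITS * level)) & INDEX_MASK);
}

/* Byte offset within the 4 KiB page. */
unsigned sv39_offset(uint64_t va)
{
    return (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
}
```

A page-table walk uses `sv39_index(va, 2)`, then level 1, then level 0 to reach the leaf PTE. COW fork works on that leaf entry: it clears the write permission in both parent and child, and the write fault handler later copies the page.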

Month 3: Linux Source Introduction and Build System

Key Linux Source Paths (kernel 6.x):

| Path | What lives here |
|---|---|
| arch/x86/ | x86-specific boot, interrupt, paging code |
| kernel/ | Core: scheduler, signals, timers, locking |
| mm/ | Memory management: page allocator, slab, virtual memory |
| fs/ | VFS layer and all filesystem implementations |
| drivers/ | All device drivers |
| net/ | Networking stack |
| include/linux/ | Core kernel headers |
| Documentation/ | Kernel documentation (always read this first) |

Topics:
- Build the kernel from source: make defconfig, make menuconfig, make -j$(nproc)
- Boot custom kernel in QEMU: qemu-system-x86_64 -kernel bzImage -append "root=/dev/sda console=ttyS0"
- Read Documentation/process/coding-style.rst and Documentation/process/submitting-patches.rst
- Study kernel/printk/printk.c to understand how printk works end to end
- Read include/linux/list.h: circular doubly-linked lists used everywhere
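The list.h design embeds the link node inside the containing structure and recovers the container with pointer arithmetic. The following userspace sketch models that idea; the `_sketch` names and `struct task` are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <stddef.h>

/* Link node embedded in any structure that wants to sit on a list. */
struct list_node {
    struct list_node *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty circular list is a head that points at itself. */
void list_init_sketch(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
void list_add_tail_sketch(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink node and leave it pointing at itself. */
void list_del_sketch(struct list_node *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = node;
}

/* Hypothetical payload type carrying an embedded link. */
struct task {
    int pid;
    struct list_node link;
};

/* Walk the list and sum the pid field of every entry. */
int sum_pids(struct list_node *head)
{
    int sum = 0;
    for (struct list_node *p = head->next; p != head; p = p->next)
        sum += container_of_sketch(p, struct task, link)->pid;
    return sum;
}
```

Because the node lives inside the payload, one list implementation serves every structure type without allocation per link, which is why the pattern appears throughout the kernel.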

Lab Exercises:
1. Build kernel 6.x with debug symbols, boot in QEMU, attach GDB
2. Add a single pr_info() line to an existing kernel function, verify in dmesg
3. Read and annotate kernel/fork.c:copy_process(), tracing every resource being duplicated

Success Criteria:
- Custom-built kernel boots in QEMU within 15 minutes from a clean clone
- Can find any kernel function definition using cscope or ctags without IDE assistance
- Can explain the difference between kmalloc, vmalloc, and alloc_pages


Phase 2: Kernel Basics (Months 4–6)

Core Texts

| Book | ISBN | Coverage |
|---|---|---|
| Linux Kernel Development (3rd ed.), Robert Love | 978-0672329463 | Best single-volume intro to kernel internals |
| Understanding the Linux Kernel, Bovet & Cesati | 978-0596005658 | Deep reference, covers 2.6 but concepts hold |

Month 4: Linux Kernel Development Book + First Module

Topics (Love, all chapters):
- Process descriptor (task_struct), PID/TID, namespaces
- Kernel stack (16 KB on x86_64 since Linux 3.15; 8 KB on older kernels), thread_info
- Interrupt handling: top halves, bottom halves, softirqs, tasklets, workqueues
- Kernel synchronization primitives: spinlock, mutex, RW semaphore, RCU
- Timer subsystem: jiffies, schedule_timeout, hrtimer

First Kernel Module — Complete Skeleton:

// hello.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("First kernel module");

static int __init hello_init(void)
{
    pr_info("hello: module loaded\n");
    return 0;
}

static void __exit hello_exit(void)
{
    pr_info("hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);

# Makefile (recipe lines must begin with a tab, not spaces)
obj-m += hello.o
KDIR := /lib/modules/$(shell uname -r)/build

all:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean

Commands: sudo insmod hello.ko && dmesg | tail -5 && sudo rmmod hello

Lab Exercises:
1. Write a module that creates a kthread and performs periodic work
2. Write a module that registers a timer callback and logs every 1 second
3. Write a module that demonstrates spinlock vs. mutex performance difference using ktime

Success Criteria:
- Can compile and load a module that uses interrupt-safe spinlocks
- Understands why you cannot sleep inside a spinlock-protected region
- Can explain RCU read-side critical sections without locks

Month 5: Process Management Source Deep Dive

Key Source Files:

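| File | What to Study |
|---|---|
| kernel/fork.c | copy_process(), kernel_clone() (formerly do_fork()), COW setup |
| kernel/exit.c | do_exit(), zombie state, reparenting |
| fs/exec.c | do_execve(), binary format handlers |
| fs/binfmt_elf.c | ELF loading, load_elf_binary() |
| kernel/signal.c | Signal generation and delivery, get_signal() (the arch-specific caller lives in arch/x86/kernel/signal.c) |
| arch/x86/kernel/process_64.c | Context switch, __switch_to() |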

Topics:
- Understand every field of task_struct (include/linux/sched.h) and annotate the struct
- Trace a fork() call from userspace through glibc → syscall → kernel_clone() → copy_process()
- Study namespace implementation: include/linux/nsproxy.h, kernel/nsproxy.c
- Cgroups v2: kernel/cgroup/cgroup.c

Lab Exercises:
1. Write a module that walks the task list and prints all process names and PIDs
2. Write a module that intercepts task_newtask tracepoint and logs new process creation
3. Use ftrace to trace kernel_clone (named _do_fork or do_fork on older kernels) and count fork calls per second under load

Success Criteria:
- Can draw the complete lifecycle of a Linux process (states, transitions, zombie handling)
- Can explain how namespaces isolate PIDs, mount points, and network stacks

Month 6: System Calls and VFS

Key Source Files:

| File | What to Study |
|---|---|
| fs/read_write.c | read(), write() syscall implementations |
| fs/namei.c | Path resolution, filename_lookup() |
| fs/inode.c | Inode lifecycle, inode cache |
| include/linux/fs.h | file_operations, inode_operations, super_operations |

Lab Exercises:
1. Implement a read-only proc filesystem entry that exposes a counter
2. Trace open() from userspace to VFS using ftrace function_graph
3. Measure syscall overhead: use perf stat on a tight getpid() loop

Success Criteria:
- Can explain VFS layer: dentry, inode, file, superblock relationships
- Can trace any syscall through kernel source without assistance


Phase 3: Intermediate (Months 7–12)

Core Texts

| Book | ISBN | Coverage |
|---|---|---|
| Linux Device Drivers (3rd ed.), Corbet, Rubini, Kroah-Hartman | Free online / 978-0596005900 | Definitive driver reference (some APIs outdated; cross-check with kernel docs) |
| Professional Linux Kernel Architecture, Mauerer | 978-0470343432 | Comprehensive internals reference |

Months 7–8: Memory Management Deep Dive

Key Source Files:

| File | What to Study |
|---|---|
| mm/page_alloc.c | Buddy allocator, zone watermarks, page allocation path |
| mm/slub.c | SLUB allocator (preferred slab implementation) |
| mm/vmalloc.c | vmalloc: virtually contiguous, physically scattered |
| mm/mmap.c | mmap() implementation, VMA creation |
| mm/memory.c | Page fault handler, handle_mm_fault() |
| mm/vmscan.c | Page reclaim and swap-out logic (kswapd, direct reclaim) |
| mm/swap.c | LRU list management |
| mm/oom_kill.c | OOM killer: how the victim is selected |
| include/linux/mm_types.h | vm_area_struct, mm_struct, page descriptor |

Key Papers:
- Gorman, M. "Understanding the Linux Virtual Memory Manager" (free PDF): still the best detailed reference

Lab Exercises:
1. Write a module that allocates pages from each memory zone (GFP_DMA, GFP_DMA32, GFP_KERNEL) and reports physical addresses
2. Use /proc/buddyinfo and /proc/slabinfo to visualize allocator state under memory pressure
3. Instrument handle_mm_fault() with kprobes and count major vs. minor faults per process
4. Write a userspace program that triggers OOM and observe the oom_kill.c path in ftrace

Success Criteria:
- Can explain buddy allocator fragmentation and compaction
- Can read /proc/meminfo and explain every field
- Can explain the difference between anonymous and file-backed mappings
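Buddy allocator fragmentation is easier to explain with the address arithmetic in hand. The sketch below is a userspace model of that arithmetic only, working on page frame numbers (PFNs); the function names are illustrative and no free lists are modelled.

```c
#include <stdint.h>

/* Smallest order such that 2^order pages covers nr_pages (capped at 63). */
unsigned order_for_pages(uint64_t nr_pages)
{
    unsigned order = 0;
    while (order < 63 && ((uint64_t)1 << order) < nr_pages)
        order++;
    return order;
}

/* PFN of the buddy of the order-sized block starting at pfn. */
uint64_t buddy_pfn(uint64_t pfn, unsigned order)
{
    return pfn ^ ((uint64_t)1 << order);   /* flip the bit that separates the pair */
}

/* Start PFN of the order+1 block formed when a block merges with its buddy. */
uint64_t merged_pfn(uint64_t pfn, unsigned order)
{
    return pfn & ~((uint64_t)1 << order);
}
```

A free block can only coalesce with the one block at `buddy_pfn()`. If that buddy holds even a single allocated page, both halves stay stuck at the lower order, which is the fragmentation that compaction tries to undo by migrating movable pages.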

Months 9–10: Device Driver Development

Lab Exercises (progressive complexity):

| Lab | Skills |
|---|---|
| Character device with read/write/ioctl | cdev, file_operations, copy_to/from_user |
| Platform device driver (devicetree) | platform_driver, of_match_table, probe/remove |
| PCI device driver | pci_driver, BAR mapping, MSI interrupts |
| DMA buffer management | dma_alloc_coherent, scatter-gather, IOMMU |

Debugging with KGDB:

# Kernel boot params for KGDB over serial
console=ttyS0,115200 kgdboc=ttyS0,115200 kgdbwait

# GDB connection
gdb vmlinux
(gdb) target remote /dev/ttyS1
(gdb) lx-ps    # requires CONFIG_GDB_SCRIPTS=y and the vmlinux-gdb.py helpers

Kernel Address Sanitizer (KASAN) Configuration:

CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_STACK=y

Success Criteria:
- Have a working character device driver that passes ftest (basic file operation tests)
- Can use KGDB to set a breakpoint inside a driver and inspect inode and file structures

Months 11–12: Debugging Mastery

Tools Reference:

| Tool | Best For | Key Command |
|---|---|---|
| KGDB / QEMU gdbstub | Interactive kernel debugging | target remote :1234 (QEMU started with -s -S) |
| KASAN | Memory safety bugs | Read report in dmesg |
| KFENCE | Lightweight production sampling | CONFIG_KFENCE=y |
| lockdep | Lock ordering violations | Enabled with CONFIG_PROVE_LOCKING=y |
| ftrace | Function tracing, latency | echo function > /sys/kernel/tracing/current_tracer |
| perf | CPU performance, sampling | perf record -ag -- sleep 10 && perf report |
| bpftrace | Dynamic tracing | bpftrace -e 'kprobe:do_sys_open { ... }' |
| kprobes | Runtime instrumentation | register_kprobe() in module |

Success Criteria:
- Have debugged at least three self-induced kernel panics using KGDB
- Can read a kernel oops and identify the faulting instruction, register state, and call stack
- KASAN report is fully interpretable without documentation


Phase 4: Advanced (Months 13–24)

Months 13–15: CFS Scheduler Source

Key Source Files:

| File | What to Study |
|---|---|
| kernel/sched/core.c | Main scheduler, schedule(), pick_next_task() |
| kernel/sched/fair.c | CFS implementation (EEVDF-based since Linux 6.6), vruntime, rb-tree, load balancing |
| kernel/sched/rt.c | SCHED_FIFO and SCHED_RR |
| kernel/sched/deadline.c | EDF/CBS for SCHED_DEADLINE |
| kernel/sched/topology.c | NUMA topology, scheduling domains |

Key Concepts to Implement in Experiments:
- vruntime normalization (weight-based fairness)
- Group scheduling and cgroup CPU bandwidth
- Load balancing across NUMA nodes (observe with perf sched)

Success Criteria:
- Can explain why a high-priority task doesn't get 100% CPU under CFS
- Can trace pick_next_task_fair() through the rb-tree
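Weight-based fairness can be checked with a few lines of arithmetic. The sketch below is a simplified userspace model: it assumes the nice-0 weight of 1024 and uses plain integer division, whereas the kernel uses precomputed inverse weights in fixed point. The function names are illustrative.

```c
#include <stdint.h>

#define NICE_0_WEIGHT 1024u   /* load weight of a nice-0 task */

/*
 * Virtual runtime charged for delta_exec_ns of real CPU time.
 * Heavier (higher-priority) tasks accumulate vruntime more slowly.
 * weight must be nonzero.
 */
uint64_t vruntime_delta(uint64_t delta_exec_ns, uint32_t weight)
{
    return delta_exec_ns * NICE_0_WEIGHT / weight;
}

/* Long-run CPU fraction of a task among runnable tasks of total_weight. */
double cpu_share(uint32_t weight, uint64_t total_weight)
{
    return (double)weight / (double)total_weight;
}
```

With the kernel's weight of 88761 for nice -20 competing against one nice-0 task, `cpu_share(88761, 88761 + 1024)` is just under 0.99. The high-priority task gets a larger proportion, never the whole CPU, because fairness is proportional rather than strict-priority.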

Months 16–18: Networking Stack Deep Dive

Key Source Files:

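| Path | What lives there |
|---|---|
| net/core/sock.c | Socket layer, sk_buff lifecycle |
| net/ipv4/tcp.c, net/ipv4/tcp_input.c | TCP socket interface; receive-side state machine, tcp_rcv_established() |
| net/ipv4/ip_output.c | IP transmit path: route lookup on send, fragmentation |
| net/core/dev.c | Network device layer, NAPI polling |
| net/packet/af_packet.c | Packet sockets (AF_PACKET), used for raw link-layer access |
| drivers/net/ | NIC drivers (e1000, virtio-net) |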

Lab Exercises:
1. Trace a send() call from socket layer to NIC driver using ftrace function_graph
2. Instrument TCP window scaling with kprobes and log every RTT change
3. Write a simple XDP program that drops ICMP packets
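The decision logic of the ICMP-dropping exercise can be prototyped in userspace before writing any BPF. The sketch below is such a model, not an XDP program: `icmp_filter`, the `VERDICT_*` values, and the `SK_*` constants are hypothetical names, and only untagged Ethernet carrying IPv4 is handled.

```c
#include <stddef.h>
#include <stdint.h>

enum verdict { VERDICT_PASS, VERDICT_DROP };

#define SK_ETH_HEADER_LEN   14      /* dst MAC + src MAC + EtherType */
#define SK_ETHERTYPE_IPV4   0x0800
#define SK_IPV4_MIN_HEADER  20
#define SK_IPV4_PROTO_OFF   9       /* protocol byte within the IPv4 header */
#define SK_PROTO_ICMP       1

/* Decide whether a raw Ethernet frame is an IPv4 ICMP packet. */
enum verdict icmp_filter(const uint8_t *pkt, size_t len)
{
    /* Bounds check first: never read a header the frame cannot contain. */
    if (len < SK_ETH_HEADER_LEN + SK_IPV4_MIN_HEADER)
        return VERDICT_PASS;

    /* EtherType is big-endian at bytes 12..13. */
    uint16_t ethertype = (uint16_t)((pkt[12] << 8) | pkt[13]);
    if (ethertype != SK_ETHERTYPE_IPV4)
        return VERDICT_PASS;

    if (pkt[SK_ETH_HEADER_LEN + SK_IPV4_PROTO_OFF] == SK_PROTO_ICMP)
        return VERDICT_DROP;

    return VERDICT_PASS;
}
```

In a real XDP program the same structure applies: the length check becomes a comparison of header pointers against `data_end`, which the BPF verifier requires before any packet access, and the verdicts map to `XDP_PASS` and `XDP_DROP`.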

Months 19–21: Write a Simple Filesystem

Project: Implement a minimal FUSE filesystem in C
- On-disk format: superblock, inode table, block bitmap, data blocks
- Operations: getattr, readdir, read, write, create, mkdir, unlink
- Persistence: serialize to a file (disk image)
- Testing: ls, cp, cat, find, fsstress
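Of the on-disk structures listed above, the block bitmap is the simplest to get working first. The sketch below shows one possible in-memory form, assuming a tiny hypothetical image of 64 blocks; the type and function names are illustrative and are not part of FUSE.

```c
#include <stdint.h>
#include <string.h>

#define FS_BLOCKS 64   /* hypothetical image size, in blocks */

/* One bit per data block: 1 = allocated, 0 = free. */
struct block_bitmap {
    uint8_t bits[FS_BLOCKS / 8];
};

/* Mark every block free. */
void bitmap_init(struct block_bitmap *bm)
{
    memset(bm->bits, 0, sizeof bm->bits);
}

/* Return 1 if block blk is allocated, 0 if it is free. */
int bitmap_test(const struct block_bitmap *bm, unsigned blk)
{
    return (bm->bits[blk / 8] >> (blk % 8)) & 1;
}

/* Allocate the lowest-numbered free block; returns its number or -1 if full. */
int bitmap_alloc(struct block_bitmap *bm)
{
    for (unsigned blk = 0; blk < FS_BLOCKS; blk++) {
        if (!bitmap_test(bm, blk)) {
            bm->bits[blk / 8] |= (uint8_t)(1u << (blk % 8));
            return (int)blk;
        }
    }
    return -1;
}

/* Mark block blk free again. */
void bitmap_free(struct block_bitmap *bm, unsigned blk)
{
    bm->bits[blk / 8] &= (uint8_t)~(1u << (blk % 8));
}
```

Persisting this structure is a matter of writing `bits` at a fixed offset in the disk image; `create` and `write` would call `bitmap_alloc()`, and `unlink` would call `bitmap_free()` for each block the inode owned.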

Key References:
- fs/ext2/: reference implementation for on-disk format ideas
- FUSE documentation: libfuse GitHub repository

Months 22–24: First Upstream Kernel Contribution

Recommended Entry Points:

| Area | Why Good for Beginners | Subsystem Maintainer List |
|---|---|---|
| drivers/staging/ | Lower bar, TODOs in code | Greg KH |
| Documentation fixes | Always welcome | Jonathan Corbet |
| Typo/comment fixes | Learn the submission process with little technical risk | Any maintainer |
| checkpatch.pl warnings | Mechanical but teaches style | Subsystem-specific |

Patch Submission Workflow:

git format-patch -1 HEAD          # generate patch
./scripts/checkpatch.pl *.patch   # lint
./scripts/get_maintainer.pl *.patch  # find recipients
git send-email --to=<maintainer-address> --cc=<list-address> *.patch  # addresses come from get_maintainer.pl

Success Criteria:
- At least one patch accepted to drivers/staging/ or documentation
- Received substantive review feedback on at least three patches


Phase 5: Expert (Months 25–36)

Subsystem Deep Work

Pick one subsystem for sustained contribution. Options with maintainer stability:

| Subsystem | Entry Complexity | Key Mailing List |
|---|---|---|
| Memory Management (MM) | Very High | linux-mm@kvack.org |
| Scheduler | Very High | linux-kernel@vger.kernel.org |
| Networking | High | netdev@vger.kernel.org |
| Filesystems | High | linux-fsdevel@vger.kernel.org |
| USB | Medium | linux-usb@vger.kernel.org |
| Block layer | Medium | linux-block@vger.kernel.org |

Kernel Mailing List (LKML) Participation

Daily Workflow:
1. Subscribe to the subsystem list (not full LKML, which is too high volume)
2. Read patch series in Lore: https://lore.kernel.org/
3. Review patches before the maintainer does and send Reviewed-by: tags
4. Attend Kernel Recipes or Linux Plumbers conference

Key Papers for Expert Phase

| Paper | Year | Why Read Now |
|---|---|---|
| Molnar "CFS Scheduler Design Document" | 2007 | Primary CFS design doc |
| McKenney "What is RCU, Fundamentally?" LWN | 2007 | RCU from its author |
| Kerrisk "The SO_REUSEPORT socket option" LWN | 2013 | Deep networking API |
| Ts'o et al. "ext4 filesystem design" | 2010 | Production FS design decisions |

Success Criteria for Expert Phase

  • Maintainer of at least one small driver or subsystem component
  • Have submitted patches that introduced new functionality (not just fixes)
  • Can review patches in your subsystem and give technically correct feedback
  • Invited to present at a kernel conference or co-author an LWN article

Appendix: Essential Tools Reference

| Tool | Install | Primary Use |
|---|---|---|
| cscope | apt install cscope | Navigate kernel source cross-references |
| ctags | apt install exuberant-ctags | Jump-to-definition in editors |
| sparse | apt install sparse | Static analysis: make C=1 |
| smatch | Build from source | Deep semantic analysis |
| coccinelle | apt install coccinelle | Semantic patches |
| pahole | apt install dwarves | Show struct sizes and padding |
| bpftrace | See bpftrace.io | Dynamic kernel tracing |
| trace-cmd | apt install trace-cmd | ftrace front-end |

Appendix: Milestone Checklist

  • [ ] Phase 1 complete: xv6 compiles, boots, and you have added a syscall
  • [ ] Phase 2 complete: hello world module loads; task-list walker module works
  • [ ] Phase 3 complete: character device driver with ioctl; KGDB session with breakpoints inside your own driver
  • [ ] Phase 4 complete: upstream patch accepted; have read scheduler, networking, and MM source
  • [ ] Phase 5 complete: sustained subsystem contribution; Reviewed-by: tags sent regularly

The total investment is approximately 2–4 hours per weekday plus 6–8 hours on weekends, sustained over three years. The bottleneck is not reading speed — it is hands-on debugging time. Every concept must be confirmed in a running system before moving forward.