Section 03: Kernel Fundamentals — Overview
Section Purpose and Scope
This section is the bridge between the conceptual foundations of Section 00 and the architectural choices of Section 04. Where Section 00 explains why a kernel exists, this section explains what a kernel concretely is and how it operates at the implementation level.
The scope covers the kernel's roles and responsibilities, the precise mechanics of the kernel/user boundary, privilege levels on real hardware, the system call interface as actually implemented in Linux, the core kernel data structures that appear throughout all subsystems, how kernel memory is managed, how the kernel initializes from nothing at boot, and how kernels are versioned and configured. This section is heavy with Linux-specific detail because Linux is the reference implementation studied throughout this archive.
Prerequisites
- Section 00 (Foundations): user/kernel space, privilege rings, interrupts, system calls
- Section 01 (Computer History): hardware evolution context
- Section 02 (OS History): Unix/Linux lineage
Learning Objectives
After completing this section you will be able to:
- Describe with precision the five fundamental roles of a kernel
- Explain the kernel/user boundary at the assembly level on x86-64 and ARM64
- Read and understand the Linux system call table and trace a syscall end-to-end in kernel source
- Identify and explain the purpose of core kernel data structures: task_struct, mm_struct, inode, file, socket
- Explain how kernel memory is allocated (kmalloc, vmalloc, slab allocator, page allocator)
- Describe the Linux kernel initialization sequence from start_kernel() to PID 1
- Interpret a Linux kernel version number and explain the configuration system (Kconfig/menuconfig)
Architecture Overview
KERNEL ROLES AND STRUCTURE
┌────────────────────────────────────────────────────────────────┐
│                          KERNEL SPACE                          │
│                                                                │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │                   SYSTEM CALL INTERFACE                    │ │
│ │   read()  write()  open()  mmap()  clone()  ioctl()  ...   │ │
│ └───────────────────────────┬────────────────────────────────┘ │
│                             │                                  │
│ ┌─────────────┬─────────────┼─────────────────┬──────────────┐ │
│ │ PROCESS     │ MEMORY      │ FILESYSTEM      │ NETWORK      │ │
│ │ MANAGER     │ MANAGER     │ (VFS layer)     │ STACK        │ │
│ │             │             │                 │              │ │
│ │ task_struct │ mm_struct   │ inode / dentry  │ sk_buff      │ │
│ │ run queues  │ page tables │ superblock      │ net_device   │ │
│ │ scheduler   │ slab/buddy  │ file_operations │ socket       │ │
│ └──────┬──────┴──────┬──────┴────────┬────────┴───────┬──────┘ │
│        │             │               │                │        │
│ ┌──────▼─────────────▼───────────────▼────────────────▼──────┐ │
│ │                  DEVICE DRIVER SUBSYSTEM                   │ │
│ │       Block drivers │ Char drivers │ Network drivers       │ │
│ └───────────────────────────┬────────────────────────────────┘ │
│                             │                                  │
│ ┌───────────────────────────▼────────────────────────────────┐ │
│ │             ARCHITECTURE-SPECIFIC CODE (arch/)             │ │
│ │           x86-64 │ ARM64 │ RISC-V │ POWER │ s390           │ │
│ └────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
SYSTEM CALL FLOW (x86-64):
User Space                        Kernel Space
──────────                        ────────────
libc wrapper (read)
│ sets rax=0 (syscall #)
│ sets args in rdi,rsi,rdx
│
▼ SYSCALL instruction
───────────────────────────►      entry_SYSCALL_64()
                                  │ saves registers
                                  │ switches to kernel stack
                                  ▼
                                  do_syscall_64()
                                  │ looks up sys_call_table[rax]
                                  ▼
                                  sys_read()
                                  │ validates args
                                  │ calls VFS read path
                                  ▼
                                  SYSRET instruction
◄───────────────────────────      restores registers
│ returns to user space
▼
return value in rax
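The dispatch step in the flow above (look up a handler by syscall number, call it, return its result in rax) can be modelled in a few lines of user-space Python. This is a toy sketch, not kernel code: the names `toy_sys_read`, `toy_sys_getpid`, `TOY_SYSCALL_TABLE`, and `toy_do_syscall` are hypothetical, and the handlers only imitate the convention of returning a negative errno value on failure. The numbers 0 (read) and 39 (getpid) are the x86-64 syscall numbers.

```python
import errno


def toy_sys_read(fd, buf, count):
    """Pretend handler: validate the descriptor, then 'read' up to count bytes."""
    if fd < 0:
        return -errno.EBADF          # failures are reported as negative errno values
    return min(count, len(buf))      # number of bytes "read"


def toy_sys_getpid():
    """Pretend handler returning a fixed, made-up PID."""
    return 4242


# Indexed by syscall number, like the table consulted with the value in rax.
TOY_SYSCALL_TABLE = {
    0: toy_sys_read,      # read is syscall 0 on x86-64
    39: toy_sys_getpid,   # getpid is syscall 39 on x86-64
}


def toy_do_syscall(nr, *args):
    """Dispatch on the syscall number; unknown numbers yield -ENOSYS."""
    handler = TOY_SYSCALL_TABLE.get(nr)
    if handler is None:
        return -errno.ENOSYS
    return handler(*args)
```

Calling `toy_do_syscall(0, 3, bytearray(16), 8)` walks the same lookup-then-call path as the diagram; a number missing from the table takes the ENOSYS branch, which is also what user space observes for an unimplemented system call.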
Key Concepts
- Kernel Role 1 — Process Management: Create, schedule, and terminate processes; maintain the process table; enforce isolation between processes.
- Kernel Role 2 — Memory Management: Allocate and free physical memory pages; manage virtual address spaces; handle page faults; enforce memory protection.
- Kernel Role 3 — File System Abstraction: Present a unified namespace for files, devices, pipes, and sockets via the Virtual File System (VFS) layer.
- Kernel Role 4 — Device Management: Provide a uniform API to hardware devices via driver abstractions; handle interrupt routing.
- Kernel Role 5 — Network Stack: Implement protocol stacks (TCP/IP, UDP, UNIX sockets) and route packets between processes and the network.
- task_struct: The Process Control Block (PCB) in Linux. Approximately 700 fields describing every aspect of a process/thread: state, PID, credentials, memory map, open files, signal handlers, scheduler state.
- mm_struct: Describes a process's virtual memory layout: the set of VMAs (historically a linked list, a maple tree since Linux 6.1), the page table pointer, and the code/data/stack regions.
- Slab Allocator: A cache-oriented kernel memory allocator that maintains pools of fixed-size objects to reduce allocation latency and fragmentation. Basis of SLUB (the default Linux allocator).
- Buddy Allocator: The underlying page-level allocator. Manages free memory in power-of-2 sized blocks; merges adjacent ("buddy") free blocks to reduce fragmentation.
- Kconfig: The Linux kernel configuration system. Thousands of options control which subsystems, drivers, and features are compiled in. make menuconfig provides an interactive TUI.
- Kernel Version Numbering: major.minor.patch[-rcN][-distro-suffix]. Since 2011, minor increments every ~10 weeks on a fixed release schedule regardless of feature count.
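To make the version scheme concrete, here is a minimal sketch of a parser for strings shaped like major.minor.patch[-rcN][-distro-suffix]. The function name `parse_kernel_version` and the regular expression are assumptions made for illustration; release strings reported by real distributions vary, and anything that does not fit the pattern is rejected rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class KernelVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    rc: Optional[int]        # release-candidate number, if any
    suffix: Optional[str]    # distribution or local suffix, if any


# major.minor[.patch][-rcN][-suffix]; the patch component defaults to 0.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?(?:-(.+))?$")


def parse_kernel_version(release: str) -> KernelVersion:
    """Split a kernel release string into its components."""
    match = _VERSION_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognised kernel release string: {release!r}")
    major, minor, patch, rc, suffix = match.groups()
    return KernelVersion(
        int(major),
        int(minor),
        int(patch or 0),
        int(rc) if rc is not None else None,
        suffix,
    )
```

On a Linux system the input would typically come from `platform.release()` or `uname -r`; because the result is a tuple, two parsed versions can be compared on their first three fields.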
Core Kernel Data Structures
task_struct (process descriptor)
├── pid_t pid, tgid
├── struct mm_struct *mm ← virtual memory
├── struct fs_struct *fs ← filesystem context (CWD, root)
├── struct files_struct *files ← open file descriptors
├── struct signal_struct *signal ← signal state shared by the thread group (handlers live in sighand)
├── struct cred *cred ← UID/GID/capabilities
├── struct sched_entity se ← CFS scheduler state
├── struct list_head tasks ← linked list of all tasks
└── ...~700 more fields
mm_struct (virtual memory descriptor)
├── struct vm_area_struct *mmap ← list of VMAs (replaced by the mm_mt maple tree in Linux 6.1)
├── pgd_t *pgd ← page global directory (top-level page table)
├── unsigned long start_code, end_code
├── unsigned long start_stack
└── atomic_t mm_users, mm_count
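Each VMA tracked by an mm_struct is visible from user space as one line of /proc/<pid>/maps, which gives a convenient way to look at the structure without reading kernel memory. The sketch below parses one such line; the `Vma` tuple and `parse_maps_line` are hypothetical helper names, and only the address range, permissions, and path are kept.

```python
from typing import NamedTuple


class Vma(NamedTuple):
    start: int    # first address of the mapping
    end: int      # one past the last address
    perms: str    # e.g. "r-xp": read, no write, execute, private
    path: str     # backing file, or "" for anonymous mappings


def parse_maps_line(line: str) -> Vma:
    """Parse one line in the /proc/<pid>/maps format:
    start-end perms offset dev inode [pathname]
    """
    fields = line.split(None, 5)
    start_text, end_text = fields[0].split("-")
    # The pathname column is absent for anonymous mappings.
    path = fields[5].strip() if len(fields) > 5 else ""
    return Vma(int(start_text, 16), int(end_text, 16), fields[1], path)
```

On Linux, applying `parse_maps_line` to every line of `/proc/self/maps` lists the VMAs of the running interpreter, including its stack, heap, and shared libraries.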
inode (file metadata)
├── unsigned long i_ino ← inode number
├── umode_t i_mode ← permissions + type
├── const struct inode_operations *i_op
├── struct super_block *i_sb ← filesystem this inode belongs to
└── struct address_space i_data ← page cache
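Several inode fields surface in user space through the stat family of system calls, so they can be inspected without kernel tooling. The sketch below is an illustration under that assumption: `describe_inode` and `demo` are hypothetical names, and the mapping noted in the comments (st_ino to i_ino, st_mode to i_mode) is the conceptual correspondence, not a claim about how a particular filesystem fills them in.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Return a few inode-derived attributes of path as a dict."""
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode):
        kind = "regular"
    elif stat.S_ISDIR(st.st_mode):
        kind = "directory"
    else:
        kind = "other"
    return {
        "ino": st.st_ino,                        # corresponds to i_ino
        "type": kind,                            # file-type bits of i_mode
        "perms": oct(stat.S_IMODE(st.st_mode)),  # permission bits of i_mode
        "nlink": st.st_nlink,                    # number of hard links
    }


def demo():
    """Create a throwaway file and describe it and its parent directory."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "example.txt")
        with open(path, "w") as handle:
            handle.write("hello")
        os.chmod(path, 0o640)
        return describe_inode(path), describe_inode(directory)
```

The file and its directory report different types from the same st_mode field, which is the "permissions + type" packing noted for i_mode above.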
Major Historical Milestones
| Year | Milestone |
|---|---|
| 1969 | First Unix kernel, written in PDP-7 assembly (C did not yet exist): the minimal viable kernel |
| 1973 | Unix rewritten in C; kernel portability becomes possible |
| 1979 | Unix V7 released; 3BSD adds demand-paged virtual memory to Unix on the VAX |
| 1987 | Mach 2.0: message-passing microkernel; influence on XNU and L4 |
| 1991 | Linux 0.01: 10,239 lines of C; single architecture (i386) |
| 1996 | Linux 2.0: SMP support; multiple architectures |
| 2001 | Linux 2.4: loadable modules mature; new VM (Rik van Riel) |
| 2003 | Linux 2.6: O(1) scheduler, NPTL, 4K stacks, preemptible kernel |
| 2007 | Linux 2.6.23: CFS (Completely Fair Scheduler) replaces O(1); SLUB becomes the default slab allocator |
| 2011 | Linux 3.0: renumbering; KVM, cgroups, namespaces mature |
| 2019 | Linux 5.1: io_uring merged — async kernel I/O interface |
| 2021 | Linux 5.14: core scheduling for SMT security |
| 2022 | Linux 6.1: Rust support added to the kernel build system |
| 2024 | Linux 6.8+: Rust drivers merged into mainline tree |
Modern Relevance and Production Use Cases
Kernel debugging and observability: Understanding task_struct and the kernel's data structures is a prerequisite for using perf, bpftrace, crash dump analysis, and /proc and /sys inspection. Many production incidents on Linux eventually come down to reading kernel state.
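As a small example of reading kernel state, /proc/<pid>/status exposes a text rendering of fields that originate in task_struct, such as Pid, Tgid, and State. The helper below is a sketch; `parse_proc_status` is a hypothetical name and the sample text in the usage note is invented for illustration.

```python
def parse_proc_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:                      # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields
```

For instance, `parse_proc_status("Name:\tbash\nState:\tS (sleeping)\nTgid:\t1234\nPid:\t1234\n")` yields a dict whose "Tgid" and "Pid" entries correspond to the tgid and pid members shown in the task_struct outline. On Linux the real input is the content of `/proc/self/status`.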
Security and hardening: Kernel exploits (Section 27) almost universally target kernel data structures — overwriting cred fields to escalate privileges, corrupting task_struct to escape namespaces, or abusing file operations to achieve arbitrary kernel write.
Container internals: Docker, Kubernetes pods, and OCI containers are implemented via kernel namespaces and cgroups, both represented as fields and structures within task_struct and related objects.
Kernel development: The Kconfig system, coding style (Documentation/process/coding-style.rst), and data structure conventions are the entry point for kernel contribution. Linux receives ~10,000 patches per release cycle.
Performance tuning: Knowing that the slab allocator and buddy allocator underlie every kernel allocation explains why /proc/slabinfo and slabtop are useful tools, and why huge pages and THP exist.
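The power-of-two bookkeeping behind the buddy allocator is simple enough to sketch. Assuming 4 KiB pages (an assumption; page size is architecture and configuration dependent), the functions below compute the allocation order for a request and locate a block's buddy by flipping one bit of its page frame number. The names `order_for`, `buddy_of`, and `merged_start` are hypothetical and chosen for this illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def order_for(nbytes):
    """Smallest order such that 2**order pages hold nbytes."""
    pages = max(1, -(-nbytes // PAGE_SIZE))   # ceiling division
    return (pages - 1).bit_length()


def buddy_of(pfn, order):
    """Page frame number of the buddy of the block starting at pfn."""
    return pfn ^ (1 << order)


def merged_start(pfn, order):
    """Start of the order+1 block formed when a block merges with its buddy."""
    return pfn & ~(1 << order)
```

A request for five pages is rounded up to order 3 (eight pages), which is the internal fragmentation that slab caches are layered on top of the page allocator to avoid for small objects. The blocks at page frames 8 and 12 are order-2 buddies, and merging them produces an order-3 block starting at frame 8.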
File Map
03-kernel-fundamentals/
├── 00-overview.md ← This file
├── 01-kernel-definition.md ← Precise definition, minimal kernel properties
├── 02-kernel-roles.md ← Five roles with concrete examples
├── 03-kernel-user-boundary.md ← Privilege transition mechanics, assembly walkthrough
├── 04-privilege-levels.md ← x86 rings, ARM EL0-EL3, RISC-V modes in detail
├── 05-system-call-interface.md ← syscall table, dispatch, vDSO, audit
├── 06-task-struct.md ← PCB anatomy, field-by-field tour
├── 07-kernel-memory.md ← Buddy allocator, SLUB, vmalloc, kmalloc
├── 08-kernel-initialization.md ← start_kernel() to PID 1, full sequence
├── 09-kernel-versioning.md ← Version scheme, LTS policy, distribution patches
├── 10-kernel-configuration.md ← Kconfig, menuconfig, defconfig, modular builds
Cross-References
- Section 00 (Foundations): The conceptual basis for everything in this section
- Section 04 (Kernel Architecture): How different kernels organize these same roles differently
- Section 05 (Boot Process): kernel-initialization.md is the bridge to the full boot sequence
- Section 07 (Process Management): task_struct expanded into full process lifecycle
- Section 11 (Memory Management): The allocators described here in full depth
- Section 26 (Security): How kernel data structures are hardened
Recommended Depth of Study
Essential: Files 01–05. Every systems engineer should understand kernel roles and the system call path precisely.
Deep dive recommended: Files 06–07 for kernel developers, performance engineers, and security researchers. Files 08–10 for embedded developers and kernel builders.
Hands-on: Build a kernel from source (make menuconfig, make -j$(nproc), boot in QEMU). Read the output of /proc/slabinfo and /proc/kallsyms on a running system.
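When reading /proc/slabinfo, it helps to rank caches by the memory held in active objects. The sketch below parses the version 2.1 text format; `parse_slabinfo` and `SAMPLE_SLABINFO` are hypothetical names, and the two sample rows are invented numbers, not output from a real machine.

```python
SAMPLE_SLABINFO = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-64          1000   1280     64   64    1 : tunables    0    0    0 : slabdata     20     20      0
dentry              5000   5040    192   21    1 : tunables    0    0    0 : slabdata    240    240      0
"""


def parse_slabinfo(text):
    """Return (name, active_bytes, active_objs, num_objs) tuples, largest first."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines, the version banner, and the column header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name = fields[0]
        active_objs, num_objs, objsize = (int(value) for value in fields[1:4])
        rows.append((name, active_objs * objsize, active_objs, num_objs))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

On a running system the input would be the content of `/proc/slabinfo`, which usually requires root to read; `parse_slabinfo(SAMPLE_SLABINFO)` shows the expected shape of the result without needing elevated privileges.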
Estimated study time: 15–20 hours, including source code reading.