What Is a Kernel?

Technical Overview

The kernel is the central component of an operating system: a privileged software layer that sits directly above the hardware and provides a controlled, portable interface through which all other software accesses machine resources. It is not the operating system itself — the OS is the kernel plus the user-space programs, libraries, and services that make a system usable — but it is the irreplaceable core around which everything else is built.

At its most fundamental level, the kernel does two things: it manages hardware resources (CPU time, memory, I/O devices) and it provides abstractions of those resources to programs running in user space. A process does not talk directly to a physical memory chip; it talks to the kernel's virtual memory subsystem, which maps virtual addresses to physical pages and enforces isolation between address spaces. A process does not write bytes directly to a disk platter; it calls write(2), and the kernel's VFS layer routes that call through a filesystem driver to a block device driver to the actual storage hardware.
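
The write(2) path described above can be exercised from user space with a few lines of C. This is a minimal sketch, not a reference implementation: the helper name write_then_read and the scratch path passed to it are made up for illustration. The program never learns which filesystem or storage device sits behind the file descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `msg` to `path`, read it back into `out`
 * (capacity `cap`), and remove the file again.
 * Returns the number of bytes read back, or -1 on any error. */
ssize_t write_then_read(const char *path, const char *msg,
                        char *out, size_t cap)
{
    assert(path != NULL && msg != NULL && out != NULL && cap > 0);

    /* open(2): the kernel resolves the path through the VFS and returns
     * a file descriptor, a small integer naming a kernel object. */
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* write(2): user space hands over a buffer; the filesystem and block
     * layers decide where the bytes actually land. */
    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* Rewind and read the same bytes back through the same descriptor. */
    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, out, cap - 1);
    if (n >= 0)
        out[n] = '\0';

    close(fd);
    unlink(path);   /* remove the scratch file */
    return n;
}
```

Calling write_then_read("/tmp/kernel_notes_demo.txt", "hello kernel\n", buf, sizeof buf) behaves the same whether /tmp is tmpfs, ext4, or a network mount; that indifference is the abstraction at work.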

Prerequisites

Basic understanding of what a CPU, RAM, and storage are
Familiarity with the concept of a program and a process
Awareness that software runs at different privilege levels
Some exposure to the C programming language (kernel code is primarily C)

Core Content

Kernel Responsibilities

The Linux kernel (used throughout this archive as the reference implementation) organizes its responsibilities into several major subsystems:

Process Management The kernel creates, schedules, and destroys processes. It maintains a struct task_struct for every process and kernel thread — on a busy production server, this list can contain thousands of entries. The scheduler (kernel/sched/core.c) decides which process runs on which CPU core at any given microsecond, balancing fairness, latency, and throughput.
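
Seen from user space, the process lifecycle reduces to a handful of system calls. The sketch below assumes a POSIX system; spawn_and_reap is a hypothetical helper name. On Linux, the fork() call is the point at which the kernel allocates a new task_struct, and waitpid() is where the parent collects the exit status so the kernel can release it.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` and return the
 * exit status the parent collects via waitpid(2), or -1 on error. */
int spawn_and_reap(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();          /* kernel duplicates the calling task */
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);             /* child: terminate immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)   /* parent: reap the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Between fork() and waitpid(), the scheduler alone decides when and on which core the child runs; the parent has no say beyond hints such as priorities and CPU affinity.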

Memory Management The MM subsystem (mm/) manages physical memory (page frames tracked in struct page), virtual address spaces (described by struct mm_struct and a tree of struct vm_area_struct), page tables, the page cache, the swap subsystem, and memory allocators: the buddy allocator for pages, the slab layer for kernel objects (SLUB today; the older SLOB and SLAB implementations were removed in 6.4 and 6.8 respectively), and kmalloc, which is built on the slab layer, for general-purpose kernel allocations.
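
From user space, the visible edge of the MM subsystem is mmap(2). The following sketch assumes Linux or another system that provides MAP_ANONYMOUS; touch_anonymous_pages is a hypothetical helper. It asks the kernel for fresh virtual memory and touches one byte per page, which is when the kernel normally has to back the mapping with real page frames.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `pages` anonymous pages, touch the first byte of
 * each one, then unmap. Returns the number of pages touched, or -1 on error. */
long touch_anonymous_pages(size_t pages)
{
    assert(pages > 0);

    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    size_t step = (size_t)page_size;
    size_t len = pages * step;

    /* The kernel reserves virtual address space here; physical page
     * frames are typically assigned lazily, when a page is first used. */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    long touched = 0;
    for (size_t i = 0; i < pages; i++) {
        /* Anonymous mappings start zero-filled. */
        if (p[i * step] != 0) {
            munmap(p, len);
            return -1;
        }
        p[i * step] = 1;         /* store forces a private page frame */
        touched++;
    }

    munmap(p, len);
    return touched;
}
```

The address mmap() returns is a virtual address inside a new vm_area_struct; which physical frames end up behind it is entirely the kernel's decision.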

Device Management Drivers (drivers/) allow the kernel to speak the protocol of each piece of hardware. The kernel provides a unified driver model (drivers/base/) so that a USB storage device, a PCI NIC, and an I2C temperature sensor are all registered, enumerated, and power-managed through the same infrastructure.

Filesystem Support The Virtual Filesystem Switch (VFS, fs/) provides a single set of system calls (open, read, write, stat, mmap) regardless of whether the underlying storage is ext4, XFS, Btrfs, tmpfs, NFS, or a FUSE-mounted object store. Each real filesystem registers its own struct file_operations and struct inode_operations implementing those abstract operations.
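
One way to see the VFS at work is to run an identical open/read/close sequence against files that have nothing in common underneath. In this sketch, read_head is a hypothetical helper, and the example paths mentioned afterwards (/proc/self/stat and /dev/zero) are assumed to exist, as they do on a typical Linux system.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read up to cap-1 bytes from the start of `path`
 * into `buf` and NUL-terminate. Returns bytes read, or -1 on error. */
ssize_t read_head(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL && cap > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The same read(2) is dispatched through whichever file_operations
     * table the file's filesystem or driver registered. */
    ssize_t n = read(fd, buf, cap - 1);
    if (n >= 0)
        buf[n] = '\0';

    close(fd);
    return n;
}
```

read_head("/proc/self/stat", ...) returns text that procfs generates on demand, while read_head("/dev/zero", ...) is served by a character device driver. Neither involves a disk, yet the calling code is identical to what it would be for a file on ext4 or XFS.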

Networking The networking stack (net/) implements the protocol suite from Ethernet frames up through TCP/IP to the socket API exposed to applications. It is one of the most complex subsystems: the TCP implementation alone runs to tens of thousands of lines.
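
The socket API mentioned above is the only part of the stack an application sees. The sketch below sends one UDP datagram to itself over the loopback interface; udp_loopback_echo is a hypothetical helper, and it assumes the loopback interface is up. Everything between sendto() and recv() (routing, IP and UDP header construction, queueing) happens inside the kernel.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send `msg` from one UDP socket to another over
 * 127.0.0.1 and copy what arrives into `out`.
 * Returns bytes received, or -1 on error. */
ssize_t udp_loopback_echo(const char *msg, char *out, size_t cap)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    ssize_t n = -1;

    assert(msg != NULL && out != NULL && cap > 0);

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                     /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;

    /* Ask the kernel which port it assigned to the receiving socket. */
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto done;

    if (sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;

    n = recv(rx, out, cap - 1, 0);
    if (n >= 0)
        out[n] = '\0';

done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```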

Security The Linux Security Module (LSM) framework (security/) allows security policies to be enforced at kernel hook points. SELinux and AppArmor plug in here; seccomp is a separate mechanism that filters system calls at entry rather than an LSM. The kernel itself enforces the fundamental access controls: a process can only open files its credentials permit, can generally only signal processes running under the same user ID unless it holds CAP_KILL, and cannot read another process's memory unless the ptrace access checks allow it.

The Kernel vs. the Operating System

This distinction matters because it affects how you think about problems:

Layer	Examples
Kernel	Linux, Windows NT kernel, XNU (macOS), FreeBSD kernel
System libraries	glibc, musl, ntdll.dll, libSystem.dylib
System daemons	systemd, launchd, svchost.exe
Shell	bash, zsh, cmd.exe, PowerShell
Applications	Firefox, PostgreSQL, OpenSSH

"Linux" in common usage refers to the entire GNU/Linux ecosystem. Strictly, "Linux" is only the kernel. The distinction matters when you debug a production issue: a glibc bug is not a kernel bug, even though glibc sits between your application and the kernel.

The Kernel as Resource Manager and Abstraction Layer

These two roles are intertwined but conceptually separate:

As a resource manager, the kernel is an accountant. It tracks who owns which physical pages, which CPU cycles are allocated to which process, which file descriptors are open. It enforces limits (cgroups, ulimits, capability checks). It arbitrates contention (the scheduler, the block I/O elevator, network queueing disciplines).

As an abstraction layer, the kernel is a translator. It makes every disk look like a stream of bytes, every network card look like a socket, every CPU look like a sequential instruction machine regardless of how many cores or NUMA nodes exist. This is what enables the same PostgreSQL source code to build and run unmodified on a Raspberry Pi and a 256-core AMD EPYC server, and the same binary to run on very different machines of one architecture.
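
The resource-manager role is easy to observe by asking the kernel to enforce a limit and then running into it. The sketch below is one way to do that under stated assumptions: errno_at_fd_limit is a hypothetical helper, and /dev/null is assumed to exist. It lowers the soft RLIMIT_NOFILE limit, opens files until the kernel refuses, and restores the original limit.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: temporarily lower the soft RLIMIT_NOFILE to `limit`,
 * open /dev/null until the kernel refuses, and return the errno of that
 * refusal (0 if it never refused, -1 on setup failure). */
int errno_at_fd_limit(rlim_t limit)
{
    enum { MAX_TRIES = 64 };
    int fds[MAX_TRIES];
    int opened = 0, refusal = 0;
    struct rlimit saved, lowered;

    assert(limit > 0 && limit < (rlim_t)MAX_TRIES);

    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;
    lowered = saved;
    lowered.rlim_cur = limit;            /* soft limit only; hard limit kept */
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return -1;

    while (opened < MAX_TRIES) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            refusal = errno;             /* the kernel said no */
            break;
        }
        fds[opened++] = fd;
    }

    while (opened > 0)
        close(fds[--opened]);
    setrlimit(RLIMIT_NOFILE, &saved);    /* restore the original limit */
    return refusal;
}
```

The refusal arrives as EMFILE from open(2): the accounting and the enforcement both live in the kernel, and the process only sees the verdict.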

ASCII Layered Diagram

+----------------------------------------------------------+
|                     APPLICATIONS                         |
|        (Firefox, PostgreSQL, sshd, bash, ...)            |
+----------------------------------------------------------+
                            |
                    C library calls (open(),
                    read(), write(), mmap(), ...)
                            |
+----------------------------------------------------------+
|                  SYSTEM LIBRARIES                        |
|    (glibc / musl: wraps syscalls, provides C runtime)    |
+----------------------------------------------------------+
                            |
              Traps into kernel via SYSCALL /
              SYSENTER instruction
                            |
+----------------------------------------------------------+
|                       KERNEL                             |
|  +----------------+  +----------+  +------------------+  |
|  | Process/Sched  |  |  Memory  |  | VFS / Filesystem |  |
|  +----------------+  +----------+  +------------------+  |
|  +----------------+  +----------+  +------------------+  |
|  |   Networking   |  | Security |  | Device Drivers   |  |
|  +----------------+  +----------+  +------------------+  |
+----------------------------------------------------------+
                            |
              Architecture-specific code (arch/)
              reads/writes hardware registers
                            |
+----------------------------------------------------------+
|                      HARDWARE                            |
|   CPU cores    Physical RAM    NIC    Disk    USB ...    |
+----------------------------------------------------------+

Monolithic vs. Microkernel: A Preview

Linux is a monolithic kernel: all the subsystems above live in a single address space running in the CPU's most privileged mode (ring 0 on x86). A call from the scheduler to the memory allocator is just a function call, with no context switch or message passing involved.

In a microkernel (Mach, L4, seL4, QNX), only the absolute minimum runs in ring 0: address space management, inter-process communication, and thread scheduling. File systems, device drivers, and networking run as user-space servers. The overhead of IPC between components is higher, but a crashing driver cannot corrupt the kernel.

In practice, several widely deployed operating systems are hybrids. macOS uses XNU, which combines a Mach microkernel with a large BSD kernel component in the same address space. Windows NT has a microkernel-inspired design, but most executive services run in ring 0 for performance.
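
The cost difference between the two designs comes from replacing function calls with messages between processes. The sketch below imitates only the shape of that interaction on an ordinary Linux system; it is not how Mach, L4, or QNX implement IPC. The helper remote_strlen and its trivial "server" are hypothetical. A request that would be a single function call inside a monolithic kernel becomes a send, a context switch, a receive, and the same again for the reply.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask a separate "server" process for the length of
 * `text` over a socketpair. Returns the server's answer, or -1 on error. */
long remote_strlen(const char *text)
{
    int sv[2];
    long answer = -1;

    assert(text != NULL);
    size_t len = strlen(text);
    assert(len > 0 && len < 512);

    /* SOCK_SEQPACKET keeps message boundaries, like a request/reply IPC. */
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {                      /* server: its own address space */
        char req[512];
        close(sv[0]);
        ssize_t n = recv(sv[1], req, sizeof req, 0);
        long reply = (n < 0) ? -1 : (long)n;
        send(sv[1], &reply, sizeof reply, 0);
        _exit(0);
    }

    /* Client: one request message out, one reply message back. */
    close(sv[1]);
    if (send(sv[0], text, len, 0) != (ssize_t)len ||
        recv(sv[0], &answer, sizeof answer, 0) != (ssize_t)sizeof answer)
        answer = -1;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return answer;
}
```

If the server process crashes here, the client gets an error instead of a corrupted address space, which is the isolation benefit the microkernel design pays its IPC overhead for.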

Kernel Size and Complexity Evolution

The growth of the Linux kernel is a concrete measure of how much hardware and feature complexity it has had to absorb:

Year	Version	Lines of Code	Notable additions
1991	0.01	~10,000	x86 only, no networking, no modules
1994	1.0	~176,000	TCP/IP networking
1999	2.2	~1.8M	Improved SMP scalability, many new drivers
2003	2.6	~5.9M	NPTL threads, device model rewrite
2011	3.0	~14.6M	Btrfs, KVM, cgroups
2015	4.0	~19.5M	Live patching, eBPF
2020	5.10	~27.8M	io_uring, BPF CO-RE
2023	6.6	~32M+	EEVDF scheduler, Rust infrastructure

This growth is driven overwhelmingly by drivers. The core kernel (scheduler, MM, VFS, networking) has grown proportionally much less. Every new GPU generation, every new WiFi chipset, every new storage protocol adds tens of thousands of lines.

The Kernel/Userspace Contract

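The most important guarantee Linux makes is that the kernel ABI toward user space is kept stable. This is Linus Torvalds' well-known "we do not break user space" rule, and Documentation/process/stable-api-nonsense.rst makes the same point in passing while explaining why in-kernel interfaces get no such guarantee. The system call interface, the behavior of /proc, the documented parts of /sys, and the behavior of signals are not changed in ways that would break existing binaries, and a change that does break a real user-space program is treated as a regression to be fixed or reverted.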

This is why a statically linked x86-64 binary compiled in 2003 still runs on a 6.6 kernel. The contract is expressed in the exported UAPI headers (include/uapi/ in the source tree, installed under /usr/include/linux and /usr/include/asm).

The kernel makes no such promise to kernel modules or between internal subsystems. Internal APIs change between versions. This is why out-of-tree drivers frequently break on kernel updates and have to be ported to each new release.
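
The user-space side of this contract can be checked directly: a system call number taken from the UAPI headers reaches the same kernel service as the C library wrapper. The sketch below assumes Linux with glibc or musl; raw_getppid_matches_wrapper is a hypothetical helper name.

```c
#define _GNU_SOURCE              /* for the syscall(2) declaration */
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the raw system call and the libc
 * wrapper report the same parent PID, else 0. */
int raw_getppid_matches_wrapper(void)
{
    /* SYS_getppid expands to the architecture's system call number,
     * which is part of the stable ABI. */
    long raw = syscall(SYS_getppid);
    pid_t wrapped = getppid();

    assert(raw >= 0);            /* getppid cannot fail */
    return raw == (long)wrapped;
}
```

Because that number never changes for a given architecture, a binary that hard-codes it keeps working across kernel upgrades. A kernel module that calls an internal function gets no equivalent promise.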

Historical Context

The concept of a privileged operating nucleus dates to the late 1950s. The IBM 709 and later the Compatible Time-Sharing System (CTSS, 1961) at MIT demonstrated that a central supervisor could multiplex hardware among users. The Multics project (1964–1969) formalized the idea of hierarchical rings of protection and an integrated file system. Unix (1969, Bell Labs) distilled Multics' ideas into something small enough to rewrite in C in 1973, which made it one of the first portable kernels.

The term "kernel" became standard with BSD Unix in the late 1970s. Dijkstra's THE system (1968) established layered operating system design, and Brinch Hansen's nucleus for the RC 4000 (1969) anticipated the microkernel concept. The monolithic vs. microkernel debate erupted publicly in 1992 in the Usenet comp.os.minix thread between Andrew Tanenbaum and Linus Torvalds. It was never formally settled: monolithic Linux, hybrid designs such as XNU and Windows NT, and microkernels such as QNX and seL4 are all in production use today.

Production Examples

Google's use of the Linux kernel: Google runs a modified Linux kernel across its entire fleet. Their kernel team maintains patches for features like cgroup v2 improvements, TCP modifications (BBR congestion control, which they contributed upstream), and custom scheduler tuning. The kernel is a first-class production concern.

Android kernel fragmentation: Android uses a Linux kernel, but OEMs maintain their own device-specific forks. The kernel running on a Samsung Galaxy S23 may be based on Linux 5.15 LTS with hundreds of Samsung-specific patches. The Generic Kernel Image (GKI) project at Google attempts to standardize the kernel core while moving OEM code into loadable modules, which is a practical application of the kernel/driver boundary.

AWS Nitro: Amazon's Nitro system moves device emulation (networking, EBS) out of the hypervisor and into dedicated hardware controllers. From the guest kernel's perspective, it talks to ordinary PCI devices: EBS volumes appear as NVMe devices and networking as an Elastic Network Adapter (ENA). The Nitro hypervisor itself is a lightweight KVM-based system. The kernel abstraction enables this hardware/software boundary to be moved.

Debugging Notes

When a kernel bug manifests, the primary artifact is a kernel oops or panic message in the kernel log. Learning to read one is essential:

BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
RIP: 0010:some_driver_function+0x48/0x120

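RIP (instruction pointer) tells you exactly where in kernel code the fault occurred; the 0010: prefix is the code segment selector.
The function name + offset/size allows addr2line, gdb vmlinux, or the kernel's scripts/faddr2line to find the source line, provided you have a vmlinux with debug info for that exact build.
CONFIG_KALLSYMS=y is required for the kernel to print symbol names rather than raw addresses in production kernels.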
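
When triaging many reports it helps to split the location string mechanically. The sketch below parses the part of the RIP line after the segment prefix into its three components; struct oops_location and parse_oops_location are hypothetical names introduced for illustration, not kernel or tool APIs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one "symbol+0xOFF/0xSIZE" location. */
struct oops_location {
    char symbol[128];
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long size;     /* total size of the function in bytes  */
};

/* Parse "symbol+0xOFF/0xSIZE". Returns 0 on success, -1 on malformed
 * input or an offset that lies outside the function. */
int parse_oops_location(const char *text, struct oops_location *loc)
{
    assert(text != NULL && loc != NULL);

    /* %127[^+] reads the symbol name up to, but not including, the '+';
     * %lx accepts hexadecimal with an optional 0x prefix. */
    if (sscanf(text, "%127[^+]+%lx/%lx",
               loc->symbol, &loc->offset, &loc->size) != 3)
        return -1;

    return (loc->offset < loc->size) ? 0 : -1;
}
```

For the sample above, the result is symbol some_driver_function, offset 0x48, size 0x120: the faulting instruction sits 0x48 bytes into a function that is 0x120 bytes long. Adding the offset to the symbol's address in vmlinux gives the address to hand to addr2line.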

Key debugging tools: dmesg, /proc/kmsg, ftrace (/sys/kernel/debug/tracing/), perf, kprobes, eBPF.

Security Implications

The kernel is the ultimate trust boundary. A process running in user space is constrained. Code running in ring 0 (kernel space) is not. Therefore:

Every kernel vulnerability is a potential full system compromise.
Privilege escalation attacks (CVE-2016-5195 "Dirty COW", CVE-2022-0847 "Dirty Pipe") exploit kernel bugs to gain ring 0 execution or write to files the attacker shouldn't be able to write.
Defense layers: SMEP (prevents kernel from executing user-space pages), SMAP (prevents kernel from accessing user-space data without explicit stac/clac), KASLR (randomizes kernel load address), CFI (Control Flow Integrity), and seccomp (limits which syscalls a process may invoke).

Kernel attack surface grows with kernel size. This is a fundamental argument for microkernels in high-security environments (seL4's formally verified kernel is roughly 9,000 lines of C).

Performance Implications

The kernel is in the critical path of almost every I/O operation. Performance-critical systems spend significant effort reducing kernel involvement:

DPDK (Data Plane Development Kit): moves NIC polling from the kernel into user space, eliminating interrupt overhead and the kernel networking stack for packet processing. Used by telcos and cloud providers for line-rate packet forwarding.
io_uring (Linux 5.1+, io_uring_setup(2)): submits I/O operations in batches via a shared ring buffer, dramatically reducing syscall overhead for high-IOPS workloads.
vDSO (virtual Dynamic Shared Object): maps a small amount of kernel-provided code and data (current time, etc.) into user space so that clock_gettime(CLOCK_REALTIME) usually doesn't require a syscall trap at all.
CPU time in kernel: perf stat reports user and sys time alongside task-clock. As a rough rule of thumb rather than a hard limit, a well-tuned database server should spend only a small share of its CPU time (on the order of 5% or less) in kernel mode during sequential reads.
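
The vDSO effect can be measured by timing the same request through both paths. This is a rough sketch assuming 64-bit Linux; time_clock_calls is a hypothetical helper, and the numbers it produces depend heavily on the CPU, the clock source, and any virtualization layer, so treat them as indicative only.

```c
#define _GNU_SOURCE              /* for the syscall(2) declaration */
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: nanoseconds elapsed while reading CLOCK_REALTIME
 * `iters` times.
 *   use_raw_syscall == 0: libc wrapper (normally served by the vDSO)
 *   use_raw_syscall != 0: forced trap into the kernel via syscall(2)
 * Returns -1 if any call fails. */
long long time_clock_calls(int iters, int use_raw_syscall)
{
    struct timespec start, end, scratch;

    assert(iters > 0);

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;

    for (int i = 0; i < iters; i++) {
        int rc = use_raw_syscall
            ? (int)syscall(SYS_clock_gettime, CLOCK_REALTIME, &scratch)
            : clock_gettime(CLOCK_REALTIME, &scratch);
        if (rc != 0)
            return -1;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    return (end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```

Comparing time_clock_calls(1000000, 0) with time_clock_calls(1000000, 1) on the same machine gives a per-call estimate of what the user/kernel transition costs. Running the raw-syscall variant under strace shows one clock_gettime entry per iteration, while the wrapper variant typically shows none.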

Failure Modes and Real Incidents

OOM killer terminations (various incidents): Under memory pressure, the kernel's Out-Of-Memory killer selects and kills a process. Overcommit settings (/proc/sys/vm/overcommit_memory) that do not match the workload let processes allocate more memory than the machine can back, so the OOM killer fires later and, from the operator's point of view, unexpectedly, sometimes killing critical daemons. This has caused database process terminations at scale.

CrowdStrike / Windows BSOD (July 19, 2024): A faulty content update for the CrowdStrike Falcon sensor, whose driver runs in ring 0 on Windows, caused an out-of-bounds memory read in the driver during boot, triggering a BSOD (the Windows equivalent of a kernel panic). Microsoft estimated that 8.5 million machines were affected, many of them stuck in a boot loop until the bad file was removed manually. This is a direct consequence of driver code running in ring 0: one invalid memory access crashes the entire system.

Dirty COW (CVE-2016-5195): A race condition in the kernel's copy-on-write memory subsystem allowed an unprivileged user to write to read-only memory-mapped files, including /etc/passwd. Exploited in the wild before a patch was available. The bug was introduced in 2.6.22 (2007) and affected every release up to 4.8.2; it was fixed in 4.8.3 and the corresponding stable updates in October 2016.
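
The mechanism Dirty COW subverted is easy to observe when it works as intended. The sketch below is not the exploit and contains no race; it only shows the guarantee that was broken. The helper private_write_leaves_file_intact and the scratch path passed to it are hypothetical. A store into a MAP_PRIVATE mapping must land in a private copy of the page and never reach the underlying file.

```c
#define _DEFAULT_SOURCE          /* for pread(2) on glibc */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a small file, map it MAP_PRIVATE, overwrite
 * the mapping, and report whether the file kept its original contents.
 * Returns 1 if the file is unchanged, 0 if it changed, -1 on error. */
int private_write_leaves_file_intact(const char *path)
{
    static const char original[] = "root:x:0:0";
    char check[sizeof original];
    char *map;
    int result = -1;

    assert(path != NULL);

    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, original, sizeof original) != (ssize_t)sizeof original)
        goto out;

    map = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE,
               MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* The first store triggers a copy-on-write fault: the kernel gives this
     * process its own copy of the page and leaves the page cache alone. */
    memcpy(map, "hack", 4);

    /* Re-read the file through the descriptor, bypassing the mapping. */
    if (pread(fd, check, sizeof check, 0) == (ssize_t)sizeof check)
        result = (memcmp(check, original, sizeof original) == 0);

    munmap(map, sizeof original);
out:
    close(fd);
    unlink(path);
    return result;
}
```

On a patched kernel the function returns 1. Dirty COW was a window in which a racing thread could make the kernel apply such a write to the shared page instead of the private copy.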

Modern Usage

Kernel development in 2024 is dominated by:

eBPF: programs that run in a kernel-verified virtual machine, allowing safe kernel extension without writing kernel modules. Used for observability (bcc, bpftrace), networking (XDP, Cilium), and security (Falco, BPF LSM). Note that seccomp filters use the older classic BPF, not eBPF.
Rust in the kernel: Linux 6.1 merged the first Rust infrastructure (rust/). Rust modules can be written for subsystems like device drivers, reducing the class of memory safety bugs endemic to C.
io_uring: reshaping how high-performance I/O is written, enabling fully async, batched I/O with minimal syscall overhead.
CXL (Compute Express Link): new hardware for memory pooling between CPUs and accelerators is forcing kernel memory management to evolve significantly.

Future Directions

Rust-first kernel components: long-term, safety-critical subsystems may be rewritten in Rust. The rust-for-linux project has upstream support. First real drivers (NVMe, network) are being submitted.
eBPF as a kernel extension mechanism: BPF programs can now implement entire schedulers (sched_ext, merged in 6.12) and TCP congestion control algorithms, and there have been proposals to extend this to filesystem operations. The kernel may evolve toward a smaller, more stable core with policy implemented in BPF.
Confidential computing: Intel TDX and AMD SEV-SNP require new kernel abstractions for encrypted memory and attestation, with significant MM subsystem changes.
Exokernel revival: DPDK, SPDK, and RDMA applications are effectively building exokernel-style systems where applications manage hardware directly. This trend will continue.

Exercises

Run uname -r on a Linux machine and look up the source of that kernel version at https://elixir.bootlin.com. Navigate to init/main.c and find start_kernel(). List the first 10 function calls made in start_kernel() and briefly describe what each does.
Use strace -c ls /tmp to count the system calls made by ls. Which syscall is called most frequently? What does that tell you about what ls does?
Read /proc/meminfo and /proc/slabinfo on a running Linux system (reading /proc/slabinfo usually requires root). Identify which kernel slab cache is consuming the most memory. Research what objects that cache holds.
Find the definitions of struct task_struct and struct mm_struct in the Linux kernel source (include/linux/sched.h and include/linux/mm_types.h). Count the number of fields in each. What does the size of these structures tell you about kernel complexity?
Write a minimal C program that makes a raw system call using the syscall(2) wrapper (e.g., syscall(SYS_getpid)). Compile it and run it under strace. Confirm the syscall number used matches the kernel's arch/x86/entry/syscalls/syscall_64.tbl.

References

Linus Torvalds, comp.os.minix post announcing Linux, August 25, 1991
Andrew S. Tanenbaum and Herbert Bos, Modern Operating Systems, 4th ed., Pearson, 2014
Robert Love, Linux Kernel Development, 3rd ed., Addison-Wesley, 2010
Linux kernel source: init/main.c, include/linux/sched.h, mm/, fs/, net/, security/
Kernel documentation: Documentation/process/stable-api-nonsense.rst
Linus Torvalds on ABI stability: https://lkml.org/lkml/2012/12/23/75
LWN.net — the authoritative source for Linux kernel development coverage: https://lwn.net
Linux kernel cross-reference: https://elixir.bootlin.com