Systems Knowledge Archive
Vision Statement
This archive is a comprehensive, deeply cross-referenced technical knowledge base covering the full vertical stack of modern computing systems — from transistor physics and CPU microarchitecture at the bottom, through operating system internals, to distributed systems, cloud infrastructure, and AI accelerator pipelines at the top. It is built on the conviction that a practitioner who understands the entire stack — not just their slice of it — writes software that is faster, safer, more reliable, and easier to reason about under failure.
The archive is not a tutorial series. It is a reference corpus: organized for deep reading, structured for dependency-aware navigation, and written to be technically precise. Every claim is grounded in how real systems actually work. Where theory diverges from production reality, production reality is documented.
How to Use This Archive
Reading Modes
Linear reading — follow the numbered sections 00 through 50. This is the most complete path and is recommended for anyone starting without a strong background. Sections are numbered to respect a natural dependency order: later sections assume knowledge from earlier ones.
Audience-targeted reading — use the quickstart paths below to skip directly to what matters for your role. Each path selects a subset of sections, ordered for that audience's needs.
Cross-reference reading — use TOPIC-HIERARCHY.md to find a specific concept, then follow internal cross-references ([see: section/file]) to related material. Use DEPENDENCY-GRAPH.md to understand what you need to know before reading a given topic.
Project-driven reading — sections 46-labs and 47-projects contain hands-on exercises. Each exercise lists the prerequisite sections. Start a project, get stuck, find the relevant section, go deep.
File Naming Convention
Within each section directory, files follow this naming pattern:
```
NN-section-name/
    overview.md          # section summary and reading guide
    01-topic-name.md     # individual deep-dive files
    02-topic-name.md
    ...
    exercises.md         # hands-on exercises for this section
    references.md        # papers, books, code repositories
```
Cross-references between files use the format [see: 03-kernel-fundamentals/02-syscall-interface.md].
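As a sketch of how this convention could be checked mechanically, the Python function below validates a section/file path such as the one above. The two-digit prefix, the 00–50 section range, and the three fixed file names come from the convention; the lowercase-hyphenated name rule and the function name are assumptions made for illustration.

```python
import re

# Assumed rules, derived from the naming convention above:
#   section directory: two-digit prefix + lowercase hyphenated name, e.g. "03-kernel-fundamentals"
#   file: overview.md, exercises.md, references.md, or NN-topic-name.md
SECTION_RE = re.compile(r"^(\d{2})-[a-z0-9]+(?:-[a-z0-9]+)*$")
TOPIC_RE = re.compile(r"^\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")
FIXED_FILES = {"overview.md", "exercises.md", "references.md"}


def is_valid_archive_path(path: str) -> bool:
    """Return True if a 'section/file.md' path follows the naming convention."""
    parts = path.split("/")
    if len(parts) != 2:
        return False
    section, filename = parts
    match = SECTION_RE.match(section)
    # Sections are numbered 00 through 50.
    if match is None or int(match.group(1)) > 50:
        return False
    return filename in FIXED_FILES or TOPIC_RE.match(filename) is not None


if __name__ == "__main__":
    print(is_valid_archive_path("03-kernel-fundamentals/02-syscall-interface.md"))  # True
    print(is_valid_archive_path("03-kernel-fundamentals/overview.md"))  # True
    print(is_valid_archive_path("3-kernel/syscalls.md"))  # False
```

The check is deliberately strict: a path such as 47-projects/mini-shell.md, cited in the quickstart paths below, has no numeric prefix and is rejected, so a real checker would need to decide whether lab and project files are exempt.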
Directory Structure
All 51 sections, numbered 00–50:
| Section | Name | Description |
|---|---|---|
| 00 | foundations | Boolean logic, binary arithmetic, data representation, abstraction layers |
| 01 | computer-history | History from Babbage to Turing to UNIX to modern cloud |
| 02 | operating-system-history | Evolution of OS design: batch → time-sharing → microkernel → exokernel → unikernel |
| 03 | kernel-fundamentals | What a kernel is, system call interface, privilege levels, kernel vs. userspace |
| 04 | kernel-architecture | Monolithic, microkernel, hybrid, exokernel architectures with real examples |
| 05 | boot-process | BIOS/UEFI, bootloaders, early kernel init, initramfs, systemd |
| 06 | cpu-architecture | ISAs, pipelines, superscalar execution, branch prediction, caches, NUMA |
| 07 | process-management | Process creation, exec, fork, COW, process lifecycle, namespaces |
| 08 | threading-models | Kernel threads, user threads, M:N threading, POSIX threads, goroutines |
| 09 | scheduling | CFS, real-time schedulers, multicore scheduling, work stealing |
| 10 | synchronization | Locks, spinlocks, mutexes, semaphores, RCU, lock-free data structures |
| 11 | memory-management | Virtual memory, paging, page tables, TLB, page faults, slab allocator, OOM |
| 12 | storage-systems | HDDs, SSDs, NVMe, RAID, storage I/O path, io_uring |
| 13 | filesystems | VFS, ext4, XFS, btrfs, ZFS, tmpfs, overlayfs, FUSE |
| 14 | device-drivers | Driver model, interrupts, DMA, character and block drivers |
| 15 | networking | Network stack layers, socket API, packet lifecycle, kernel networking |
| 16 | tcp-ip-internals | TCP state machine, congestion control, BBR, QUIC, kernel TCP stack |
| 17 | distributed-systems | Consistency models, consensus (Raft/Paxos), distributed clocks, CRDTs |
| 18 | database-internals | B-trees, LSM trees, WAL, MVCC, query execution, buffer pool management |
| 19 | virtualization | Type 1/2 hypervisors, KVM, QEMU, hardware assist (VT-x, AMD-V), paravirt |
| 20 | containers | cgroups, namespaces, overlayfs, OCI runtime, runc, containerd |
| 21 | cloud-infrastructure | AWS/GCP/Azure internals, VPCs, object storage, managed services |
| 22 | kubernetes-internals | etcd, API server, scheduler, kubelet, CNI, CSI, CRI |
| 23 | observability | Metrics, logs, traces, eBPF-based observability, OpenTelemetry |
| 24 | debugging | gdb, lldb, perf, strace, core dumps, kernel debugging, crash analysis |
| 25 | performance-engineering | Profiling, flamegraphs, benchmarking methodology, latency analysis |
| 26 | security | Threat modeling, exploit mitigations, ASLR, SMEP/SMAP, seccomp, AppArmor |
| 27 | kernel-exploits | Privilege escalation, heap sprays, ROP chains, kernel CVE case studies |
| 28 | reliability-engineering | SLOs, error budgets, chaos engineering, postmortem culture |
| 29 | runtime-systems | GC algorithms, JVM internals, V8, CPython internals, memory safety runtimes |
| 30 | compilers-and-linkers | Compiler pipeline, LLVM, linking, ELF format, dynamic linking, JIT |
| 31 | gpu-systems | GPU architecture, CUDA, OpenCL, GPU memory hierarchy, PCIe bandwidth |
| 32 | ai-infrastructure | Training clusters, parameter servers, NCCL, model serving, inference optimization |
| 33 | hardware-architecture | CPUs (x86, ARM), FPGAs, ASICs, system buses, memory controllers |
| 34 | embedded-systems | Bare-metal programming, RTOS, HAL, firmware, peripheral interfaces |
| 35 | real-time-systems | Hard vs. soft RT, PREEMPT_RT, priority inversion, deadline scheduling |
| 36 | mobile-operating-systems | Android/iOS architecture, Binder IPC, power management, app sandbox |
| 37 | browser-and-sandbox-architecture | Browser process model, V8, Blink, sandboxing, site isolation |
| 38 | system-design | Scalability patterns, CAP theorem tradeoffs, design interview patterns |
| 39 | large-scale-case-studies | Google Bigtable, Amazon Dynamo, Facebook TAO, Kafka, Spanner, Cassandra |
| 40 | failure-history | Famous outages: AWS US-EAST-1, Cloudflare BGP, Facebook October 2021, Therac-25 |
| 41 | modern-kernel-challenges | eBPF, io_uring, rust-in-kernel, kernel security hardening, Landlock |
| 42 | future-of-operating-systems | Unikernels, library OSes, capability-based security, hardware enclaves |
| 43 | formal-verification | TLA+, Coq, seL4 proof, model checking, why proofs matter in real systems |
| 44 | rust-and-memory-safety | Ownership model, borrow checker, unsafe Rust, Rust in Linux, memory safety in systems |
| 45 | learning-roadmaps | Curated learning paths by role and experience level |
| 46 | labs | Hands-on exercises with step-by-step instructions |
| 47 | projects | Larger build projects: mini kernel, mini database, mini container runtime |
| 48 | research-papers | Annotated bibliography of foundational and recent papers |
| 49 | glossary | Definitions of all technical terms used across the archive |
| 50 | acronyms | Expanded form and context for every acronym used |
Quickstart Paths by Audience
Student (computer science undergraduate)
Start here to build foundational understanding before any specialization.
- 00-foundations: data representation, abstraction
- 01-computer-history: context for why systems are designed as they are
- 06-cpu-architecture: how the hardware you're programming actually works
- 03-kernel-fundamentals: what the OS does and why
- 07-process-management: how your program runs
- 11-memory-management: where your variables live
- 15-networking: how programs talk to each other
- 49-glossary + 50-acronyms: always open in a second tab
First project: 47-projects/mini-shell.md
Site Reliability Engineer (SRE)
SREs need to understand systems deeply enough to debug them under pressure.
- 03-kernel-fundamentals: know what the kernel is doing when things break
- 07-process-management: processes, signals, and what kills them
- 15-networking + 16-tcp-ip-internals: many production failures surface as network failures
- 23-observability: metrics, logs, and traces are your primary tools
- 24-debugging: strace, perf, core dumps
- 25-performance-engineering: understand latency before optimizing it
- 28-reliability-engineering: SLOs, postmortems, chaos
- 40-failure-history: learn from famous outages
- 20-containers + 22-kubernetes-internals: your runtime environment
- 39-large-scale-case-studies: how real systems at scale are designed
First project: 47-projects/observability-stack.md
Kernel Engineer
Kernel engineers need the deepest possible understanding of the OS internals.
- 00-foundations → 06-cpu-architecture → 33-hardware-architecture
- 03-kernel-fundamentals → 04-kernel-architecture
- 05-boot-process
- 07-process-management → 08-threading-models → 09-scheduling
- 10-synchronization: critical for any kernel work
- 11-memory-management: virtual memory, paging, allocators
- 12-storage-systems → 14-device-drivers
- 15-networking (kernel networking path)
- 41-modern-kernel-challenges: eBPF, io_uring, Rust in kernel
- 44-rust-and-memory-safety
- 27-kernel-exploits: know what you're defending against
- 43-formal-verification: seL4 and why proofs matter
First project: 47-projects/loadable-kernel-module.md
Distributed Systems Engineer
- 03-kernel-fundamentals + 15-networking + 16-tcp-ip-internals: foundations
- 10-synchronization: understand locks before you understand distributed locks
- 17-distributed-systems: consistency, consensus, clocks
- 18-database-internals: storage layer for distributed data
- 39-large-scale-case-studies: Dynamo, Bigtable, Spanner, Kafka
- 21-cloud-infrastructure + 22-kubernetes-internals
- 23-observability + 28-reliability-engineering
- 38-system-design: patterns and tradeoffs
- 40-failure-history: real failure modes at scale
First project: 47-projects/distributed-kv-store.md
Security Researcher
- 00-foundations → 06-cpu-architecture → 33-hardware-architecture
- 03-kernel-fundamentals → 04-kernel-architecture
- 11-memory-management: understand what exploits manipulate
- 26-security: mitigations and threat models
- 27-kernel-exploits: privilege escalation, heap exploitation
- 30-compilers-and-linkers: understand ELF, linking, code layout
- 37-browser-and-sandbox-architecture: sandboxes and escapes
- 44-rust-and-memory-safety: memory safety by construction
- 43-formal-verification: formal proof of security properties
- 41-modern-kernel-challenges: modern attack surface
First project: 47-projects/exploit-lab.md
AI Infrastructure Engineer
- 06-cpu-architecture + 33-hardware-architecture: understand your hardware
- 31-gpu-systems: the compute substrate for AI
- 11-memory-management: memory bandwidth is often the bottleneck
- 15-networking + 16-tcp-ip-internals: high-speed interconnects (RDMA, InfiniBand)
- 32-ai-infrastructure: training clusters, parameter servers, NCCL
- 20-containers + 22-kubernetes-internals: your deployment platform
- 25-performance-engineering: profiling, roofline model, kernel fusion
- 17-distributed-systems: distributed training is distributed systems
- 29-runtime-systems: Python runtime, PyTorch internals, CUDA graphs
- 21-cloud-infrastructure: cloud GPU instances, object storage for checkpoints
First project: 47-projects/distributed-training-pipeline.md
How Files Are Cross-Referenced
Every file in the archive uses a consistent cross-reference syntax:
- Forward reference: [see: 11-memory-management/03-page-tables.md] (a topic covered later)
- Back reference: [prerequisite: 06-cpu-architecture/02-cache-hierarchy.md] (required prior reading)
- Lateral reference: [related: 25-performance-engineering/04-memory-profiling.md] (same depth, different domain)
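Because the three tag forms are regular, they can be extracted mechanically. The following is a minimal Python sketch, assuming each tag fits on one line and the target path contains no whitespace or closing bracket; the names XREF_RE, KIND_NAMES, and extract_xrefs are illustrative and not part of any existing archive tooling.

```python
import re
from typing import List, Tuple

# Matches the three cross-reference tags, e.g.
# "[see: 11-memory-management/03-page-tables.md]".
XREF_RE = re.compile(r"\[(see|prerequisite|related):\s*([^\]\s]+)\]")

# Map each tag keyword to the reference kind it denotes.
KIND_NAMES = {"see": "forward", "prerequisite": "back", "related": "lateral"}


def extract_xrefs(text: str) -> List[Tuple[str, str]]:
    """Return (kind, target_path) pairs in order of appearance in text."""
    return [(KIND_NAMES[tag], target) for tag, target in XREF_RE.findall(text)]


if __name__ == "__main__":
    sample = (
        "Read [prerequisite: 06-cpu-architecture/02-cache-hierarchy.md] first, "
        "then [see: 11-memory-management/03-page-tables.md]."
    )
    for kind, target in extract_xrefs(sample):
        print(kind, target)
```

Running this over every file yields the edge list that a dependency graph can be built from.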
At the top of each file, a Prerequisites section lists what to read first. At the bottom, a Further reading section lists where to go next.
The DEPENDENCY-GRAPH.md file in this directory captures all these relationships in a queryable form.
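The format of DEPENDENCY-GRAPH.md is not specified here, so the sketch below assumes the prerequisite edges have already been loaded into a Python dict mapping each file to the files listed in its Prerequisites section. A depth-first topological sort then produces a reading order in which every prerequisite appears before the files that depend on it, and a prerequisite cycle is reported as an error. The function name and the example edges are hypothetical.

```python
from typing import Dict, List, Set


def reading_order(graph: Dict[str, List[str]], target: str) -> List[str]:
    """Return target and its transitive prerequisites, prerequisites first.

    graph maps a file path to the paths it lists as prerequisites
    (an assumed in-memory form; DEPENDENCY-GRAPH.md's real format may differ).
    """
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(node: str) -> None:
        if node in done:
            return
        if node in in_progress:
            raise ValueError(f"prerequisite cycle involving {node}")
        in_progress.add(node)
        for prereq in graph.get(node, []):
            visit(prereq)  # emit every prerequisite before this node
        in_progress.remove(node)
        done.add(node)
        order.append(node)

    visit(target)
    return order


if __name__ == "__main__":
    # Hypothetical edges, made up for illustration only.
    graph = {
        "11-memory-management/03-page-tables.md": [
            "06-cpu-architecture/02-cache-hierarchy.md",
            "03-kernel-fundamentals/02-syscall-interface.md",
        ],
        "03-kernel-fundamentals/02-syscall-interface.md": [
            "06-cpu-architecture/02-cache-hierarchy.md",
        ],
    }
    for path in reading_order(graph, "11-memory-management/03-page-tables.md"):
        print(path)
```

With these example edges, the cache-hierarchy file is printed first, then the syscall-interface file, then the page-tables file. A depth-first traversal is used here because it visits only the target's own prerequisites rather than sorting the whole archive.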
Contribution Philosophy
This archive follows a set of guiding principles for all content:
Precision over simplicity. Simplified explanations that introduce misconceptions are actively harmful. Complexity should be acknowledged and unpacked, not hidden.
Production reality over idealized models. Where the textbook model diverges from how Linux, FreeBSD, or a real cloud provider actually works, document the real behavior. Cite kernel source code, not just papers.
Depth-first, not breadth-first. One section that goes 10 levels deep is more valuable than 10 sections that stay at the surface. Coverage gaps are acceptable; shallow coverage is not.
Cross-reference aggressively. No concept exists in isolation. Every file should link to the 3–5 most relevant files elsewhere in the archive.
Cite primary sources. Papers, kernel commits, RFC numbers, CVE IDs, postmortem URLs. Assertions without sources are proposals, not knowledge.
Avoid recency bias. A 1974 paper can be more important than a 2024 blog post. Include historical context. Document why designs evolved.