Systems Knowledge Archive

Vision Statement

This archive is a comprehensive, deeply cross-referenced technical knowledge base covering the full vertical stack of modern computing systems — from transistor physics and CPU microarchitecture at the bottom, through operating system internals, to distributed systems, cloud infrastructure, and AI accelerator pipelines at the top. It is built on the conviction that a practitioner who understands the entire stack — not just their slice of it — writes software that is faster, safer, more reliable, and easier to reason about under failure.

The archive is not a tutorial series. It is a reference corpus: organized for deep reading, structured for dependency-aware navigation, and written to be technically precise. Every claim is grounded in how real systems actually work. Where theory diverges from production reality, production reality is documented.

How to Use This Archive

Reading Modes

Linear reading — follow the numbered sections 00 through 50. This is the most complete path and is recommended for anyone starting without a strong systems background. Sections are numbered to respect a natural dependency order: later sections assume knowledge from earlier ones.

Audience-targeted reading — use the quickstart paths below to skip directly to what matters for your role. Each path selects a subset of sections, ordered for that audience's needs.

Cross-reference reading — use TOPIC-HIERARCHY.md to find a specific concept, then follow internal cross-references ([see: section/file]) to related material. Use DEPENDENCY-GRAPH.md to understand what you need to know before reading a given topic.

Project-driven reading — sections 46-labs and 47-projects contain hands-on exercises. Each exercise lists the prerequisite sections. Start a project, get stuck, find the relevant section, go deep.

File Naming Convention

Within each section directory, files follow this naming pattern:

NN-section-name/
  overview.md          — section summary and reading guide
  01-topic-name.md     — individual deep-dive files
  02-topic-name.md
  ...
  exercises.md         — hands-on exercises for this section
  references.md        — papers, books, code repositories

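Cross-references between files name the target as section-directory/file, for example [see: 03-kernel-fundamentals/02-syscall-interface.md]. The prerequisite and related variants of this syntax are described under How Files Are Cross-Referenced.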
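The naming pattern above can be checked mechanically. The following is a minimal sketch, assuming section and topic names are lowercase words joined by hyphens; the regular expressions and the function name `is_valid_archive_path` are illustrative assumptions, not part of the archive's tooling.

```python
import re

# Assumed shape of a section directory: two digits, a hyphen, then a
# lowercase hyphenated name (for example "03-kernel-fundamentals").
SECTION_RE = r"\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*"

# Files named by the convention: numbered deep dives plus the three
# fixed names overview.md, exercises.md, and references.md.
FILE_RE = r"(?:\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*|overview|exercises|references)\.md"

PATH_RE = re.compile(rf"{SECTION_RE}/{FILE_RE}")


def is_valid_archive_path(path: str) -> bool:
    """Return True if path looks like NN-section-name/<file>.md."""
    return PATH_RE.fullmatch(path) is not None
```

Under this strict reading, `is_valid_archive_path("03-kernel-fundamentals/02-syscall-interface.md")` is true, while a file without a numeric prefix, such as `47-projects/mini-shell.md`, is rejected. If unnumbered project files are intended, `FILE_RE` would need to be loosened.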

Directory Structure

All 51 sections, numbered 00–50:

Section	Name	Description
`00`	foundations	Boolean logic, binary arithmetic, data representation, abstraction layers
`01`	computer-history	History from Babbage to Turing to UNIX to modern cloud
`02`	operating-system-history	Evolution of OS design: batch → time-sharing → microkernel → exokernel → unikernel
`03`	kernel-fundamentals	What a kernel is, system call interface, privilege levels, kernel vs. userspace
`04`	kernel-architecture	Monolithic, microkernel, hybrid, exokernel architectures with real examples
`05`	boot-process	BIOS/UEFI, bootloaders, early kernel init, initramfs, systemd
`06`	cpu-architecture	ISAs, pipelines, superscalar execution, branch prediction, caches, NUMA
`07`	process-management	Process creation, exec, fork, COW, process lifecycle, namespaces
`08`	threading-models	Kernel threads, user threads, M:N threading, POSIX threads, goroutines
`09`	scheduling	CFS, real-time schedulers, multicore scheduling, work stealing
`10`	synchronization	Locks, spinlocks, mutexes, semaphores, RCU, lock-free data structures
`11`	memory-management	Virtual memory, paging, page tables, TLB, page faults, slab allocator, OOM
`12`	storage-systems	HDDs, SSDs, NVMe, RAID, storage I/O path, io_uring
`13`	filesystems	VFS, ext4, XFS, btrfs, ZFS, tmpfs, overlayfs, FUSE
`14`	device-drivers	Driver model, interrupts, DMA, character and block drivers
`15`	networking	Network stack layers, socket API, packet lifecycle, kernel networking
`16`	tcp-ip-internals	TCP state machine, congestion control, BBR, QUIC, kernel TCP stack
`17`	distributed-systems	Consistency models, consensus (Raft/Paxos), distributed clocks, CRDTs
`18`	database-internals	B-trees, LSM trees, WAL, MVCC, query execution, buffer pool management
`19`	virtualization	Type 1/2 hypervisors, KVM, QEMU, hardware assist (VT-x, AMD-V), paravirt
`20`	containers	cgroups, namespaces, overlayfs, OCI runtime, runc, containerd
`21`	cloud-infrastructure	AWS/GCP/Azure internals, VPCs, object storage, managed services
`22`	kubernetes-internals	etcd, API server, scheduler, kubelet, CNI, CSI, CRI
`23`	observability	Metrics, logs, traces, eBPF-based observability, OpenTelemetry
`24`	debugging	gdb, lldb, perf, strace, core dumps, kernel debugging, crash analysis
`25`	performance-engineering	Profiling, flamegraphs, benchmarking methodology, latency analysis
`26`	security	Threat modeling, exploit mitigations, ASLR, SMEP/SMAP, seccomp, AppArmor
`27`	kernel-exploits	Privilege escalation, heap sprays, ROP chains, kernel CVE case studies
`28`	reliability-engineering	SLOs, error budgets, chaos engineering, postmortem culture
`29`	runtime-systems	GC algorithms, JVM internals, V8, CPython internals, memory safety runtimes
`30`	compilers-and-linkers	Compiler pipeline, LLVM, linking, ELF format, dynamic linking, JIT
`31`	gpu-systems	GPU architecture, CUDA, OpenCL, GPU memory hierarchy, PCIe bandwidth
`32`	ai-infrastructure	Training clusters, parameter servers, NCCL, model serving, inference optimization
`33`	hardware-architecture	CPUs (x86, ARM), FPGAs, ASICs, system buses, memory controllers
`34`	embedded-systems	Bare-metal programming, RTOS, HAL, firmware, peripheral interfaces
`35`	real-time-systems	Hard vs. soft RT, PREEMPT_RT, priority inversion, deadline scheduling
`36`	mobile-operating-systems	Android/iOS architecture, Binder IPC, power management, app sandbox
`37`	browser-and-sandbox-architecture	Browser process model, V8, Blink, sandboxing, site isolation
`38`	system-design	Scalability patterns, CAP theorem tradeoffs, design interview patterns
`39`	large-scale-case-studies	Google Bigtable, Amazon Dynamo, Facebook TAO, Kafka, Spanner, Cassandra
`40`	failure-history	Famous outages and failures: AWS US-EAST-1, Cloudflare BGP, Facebook October 2021, Therac-25
`41`	modern-kernel-challenges	eBPF, io_uring, Rust in the kernel, kernel security hardening, Landlock
`42`	future-of-operating-systems	Unikernels, library OSes, capability-based security, hardware enclaves
`43`	formal-verification	TLA+, Coq, seL4 proof, model checking, why proofs matter in real systems
`44`	rust-and-memory-safety	Ownership model, borrow checker, unsafe Rust, Rust in Linux, memory safety in systems
`45`	learning-roadmaps	Curated learning paths by role and experience level
`46`	labs	Hands-on exercises with step-by-step instructions
`47`	projects	Larger build projects: mini kernel, mini database, mini container runtime
`48`	research-papers	Annotated bibliography of foundational and recent papers
`49`	glossary	Definitions of all technical terms used across the archive
`50`	acronyms	Expanded form and context for every acronym used

Quickstart Paths by Audience

Student (computer science undergraduate)

Start here to build foundational understanding before any specialization.

00-foundations — data representation, abstraction
01-computer-history — context for why systems are designed as they are
06-cpu-architecture — how the hardware you're programming actually works
03-kernel-fundamentals — what the OS does and why
07-process-management — how your program runs
11-memory-management — where your variables live
15-networking — how programs talk to each other
49-glossary + 50-acronyms — always open in a second tab

First project: 47-projects/mini-shell.md

Site Reliability Engineer (SRE)

SREs need to understand systems deeply enough to debug them under pressure.

03-kernel-fundamentals — know what the kernel is doing when things break
07-process-management — processes, signals, and what kills them
15-networking + 16-tcp-ip-internals — many production failures involve the network
23-observability — metrics, logs, traces: your primary tools
24-debugging — strace, perf, core dumps
25-performance-engineering — understand latency before optimizing it
28-reliability-engineering — SLOs, postmortems, chaos
40-failure-history — learn from famous outages
20-containers + 22-kubernetes-internals — your runtime environment
39-large-scale-case-studies — how real systems at scale are designed

First project: 47-projects/observability-stack.md

Kernel Engineer

Kernel engineers need the deepest possible understanding of OS internals.

00-foundations → 06-cpu-architecture → 33-hardware-architecture
03-kernel-fundamentals → 04-kernel-architecture
05-boot-process
07-process-management → 08-threading-models → 09-scheduling
10-synchronization — critical for any kernel work
11-memory-management — virtual memory, paging, allocators
12-storage-systems → 14-device-drivers
15-networking (kernel networking path)
41-modern-kernel-challenges — eBPF, io_uring, Rust in kernel
44-rust-and-memory-safety
27-kernel-exploits — know what you're defending against
43-formal-verification — seL4 and why proofs matter

First project: 47-projects/loadable-kernel-module.md

Distributed Systems Engineer

03-kernel-fundamentals + 15-networking + 16-tcp-ip-internals — foundations
10-synchronization — understand locks before you understand distributed locks
17-distributed-systems — consistency, consensus, clocks
18-database-internals — storage layer for distributed data
39-large-scale-case-studies — Dynamo, Bigtable, Spanner, Kafka
21-cloud-infrastructure + 22-kubernetes-internals
23-observability + 28-reliability-engineering
38-system-design — patterns and tradeoffs
40-failure-history — real failure modes at scale

First project: 47-projects/distributed-kv-store.md

Security Researcher

00-foundations → 06-cpu-architecture → 33-hardware-architecture
03-kernel-fundamentals → 04-kernel-architecture
11-memory-management — understand what exploits manipulate
26-security — mitigations and threat models
27-kernel-exploits — privilege escalation, heap exploitation
30-compilers-and-linkers — understand ELF, linking, code layout
37-browser-and-sandbox-architecture — sandboxes and escapes
44-rust-and-memory-safety — memory safety by construction
43-formal-verification — formal proof of security properties
41-modern-kernel-challenges — modern attack surface

First project: 47-projects/exploit-lab.md

AI Infrastructure Engineer

06-cpu-architecture + 33-hardware-architecture — understand your hardware
31-gpu-systems — the compute substrate for AI
11-memory-management — memory bandwidth is often the bottleneck
15-networking + 16-tcp-ip-internals — high-speed interconnects (RDMA, InfiniBand)
32-ai-infrastructure — training clusters, parameter servers, NCCL
20-containers + 22-kubernetes-internals — your deployment platform
25-performance-engineering — profiling, roofline model, kernel fusion
17-distributed-systems — distributed training is distributed systems
29-runtime-systems — Python runtime, PyTorch internals, CUDA graphs
21-cloud-infrastructure — cloud GPU instances, object storage for checkpoints

First project: 47-projects/distributed-training-pipeline.md

How Files Are Cross-Referenced

Every file in the archive uses a consistent cross-reference syntax:

Forward reference: [see: 11-memory-management/03-page-tables.md] — a topic covered later
Back reference: [prerequisite: 06-cpu-architecture/02-cache-hierarchy.md] — required prior reading
Lateral reference: [related: 25-performance-engineering/04-memory-profiling.md] — same depth, different domain

At the top of each file, a Prerequisites section lists what to read first. At the bottom, a Further reading section lists where to go next.
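The three reference kinds above share one bracketed shape, so a single regular expression can pull them out of a file. This is a rough sketch rather than the archive's actual tooling: the `CrossRef` type and `extract_cross_refs` function are hypothetical names, and the pattern assumes targets contain no whitespace.

```python
import re
from typing import NamedTuple


class CrossRef(NamedTuple):
    kind: str    # "see", "prerequisite", or "related"
    target: str  # e.g. "11-memory-management/03-page-tables.md"


# Matches [see: path], [prerequisite: path], and [related: path].
XREF_RE = re.compile(r"\[(see|prerequisite|related):\s*([^\]\s]+)\s*\]")


def extract_cross_refs(text: str) -> list[CrossRef]:
    """Return every cross-reference in text, in order of appearance."""
    return [CrossRef(m.group(1), m.group(2)) for m in XREF_RE.finditer(text)]
```

Filtering the result by `kind == "prerequisite"` gives the raw material for a file's Prerequisites section; the other two kinds could feed Further reading.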

The DEPENDENCY-GRAPH.md file in this directory captures all these relationships in a queryable form.
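The on-disk format of DEPENDENCY-GRAPH.md is not specified here, so the sketch below assumes the prerequisite relationships have already been loaded into a plain dict that maps each file to the set of files it requires. That dict shape and the `reading_order` function are assumptions for illustration. Given such a mapping, a topological sort answers the question "what do I need to read before this file, and in what order?"

```python
from graphlib import TopologicalSorter


def reading_order(prereqs: dict[str, set[str]], target: str) -> list[str]:
    """Return target and its transitive prerequisites, prerequisites first."""
    # Walk backwards from target so unrelated files are left out.
    needed: dict[str, set[str]] = {}
    stack = [target]
    while stack:
        node = stack.pop()
        if node in needed:
            continue
        needed[node] = set(prereqs.get(node, ()))
        stack.extend(needed[node])
    # static_order() yields every node after all of its predecessors and
    # raises graphlib.CycleError if the prerequisites are circular.
    return list(TopologicalSorter(needed).static_order())
```

For example, if a hypothetical graph says the page-tables file requires the cache-hierarchy file, which in turn requires `00-foundations/overview.md`, then `reading_order` for the page-tables file lists those three paths with the foundations overview first. A circular prerequisite chain surfaces as an exception instead of an endless reading list.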

Contribution Philosophy

This archive follows a set of guiding principles for all content:

Precision over simplicity. Simplified explanations that introduce misconceptions are actively harmful. Complexity should be acknowledged and unpacked, not hidden.

Production reality over idealized models. Where the textbook model diverges from how Linux, FreeBSD, or a real cloud provider actually works, document the real behavior. Cite kernel source code, not just papers.

Depth-first, not breadth-first. One section that goes 10 levels deep is more valuable than 10 sections that stay at the surface. Coverage gaps are acceptable; shallow coverage is not.

Cross-reference aggressively. No concept exists in isolation. Every file should link to the 3–5 most relevant files elsewhere in the archive.

Cite primary sources. Papers, kernel commits, RFC numbers, CVE IDs, postmortem URLs. Assertions without sources are proposals, not knowledge.

Avoid recency bias. A 1974 paper can be more important than a 2024 blog post. Include historical context. Document why designs evolved.