Systems Knowledge Archive

Vision Statement

This archive is a comprehensive, deeply cross-referenced technical knowledge base covering the full vertical stack of modern computing systems — from transistor physics and CPU microarchitecture at the bottom, through operating system internals, to distributed systems, cloud infrastructure, and AI accelerator pipelines at the top. It is built on the conviction that a practitioner who understands the entire stack — not just their slice of it — writes software that is faster, safer, more reliable, and easier to reason about under failure.

The archive is not a tutorial series. It is a reference corpus: organized for deep reading, structured for dependency-aware navigation, and written to be technically precise. Every claim is grounded in how real systems actually work. Where theory diverges from production reality, production reality is documented.


How to Use This Archive

Reading Modes

Linear reading — follow the numbered sections 00 through 50. This is the most complete path and is recommended for anyone starting without a strong background. Sections are numbered to respect a natural dependency order: later sections assume knowledge from earlier ones.

Audience-targeted reading — use the quickstart paths below to skip directly to what matters for your role. Each path selects a subset of sections, ordered for that audience's needs.

Cross-reference reading — use TOPIC-HIERARCHY.md to find a specific concept, then follow internal cross-references ([see: section/file]) to related material. Use DEPENDENCY-GRAPH.md to understand what you need to know before reading a given topic.

Project-driven reading — sections 46-labs and 47-projects contain hands-on exercises. Each exercise lists the prerequisite sections. Start a project, get stuck, find the relevant section, go deep.

File Naming Convention

Within each section directory, files follow this naming pattern:

NN-section-name/
  overview.md          — section summary and reading guide
  01-topic-name.md     — individual deep-dive files
  02-topic-name.md
  ...
  exercises.md         — hands-on exercises for this section
  references.md        — papers, books, code repositories

Cross-references between files use the format [see: 03-kernel-fundamentals/02-syscall-interface.md].
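
The convention above lends itself to a mechanical check. The following Python sketch validates a section-relative path against it. The regular expressions and function names are assumptions inferred from the listing, not an official archive tool. Note that the project files cited in the quickstart paths, such as 47-projects/mini-shell.md, carry no numeric prefix, so this strict reading of the convention would flag them.

```python
import re

# Assumed patterns, inferred from the naming listing above (not an official spec).
SECTION_DIR = re.compile(r"(\d{2})-[a-z0-9]+(?:-[a-z0-9]+)*")
TOPIC_FILE = re.compile(r"\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md")

# Files that appear in every section directory without a numeric prefix.
FIXED_FILES = {"overview.md", "exercises.md", "references.md"}


def is_valid_section_dir(name: str) -> bool:
    """True if name looks like NN-section-name with NN in 00..50."""
    m = SECTION_DIR.fullmatch(name)
    return bool(m) and 0 <= int(m.group(1)) <= 50


def is_valid_file(name: str) -> bool:
    """True for the fixed per-section files or an NN-topic-name.md file."""
    return name in FIXED_FILES or bool(TOPIC_FILE.fullmatch(name))


def is_valid_path(path: str) -> bool:
    """True if path has the form NN-section-name/<valid file>."""
    parts = path.split("/")
    return (
        len(parts) == 2
        and is_valid_section_dir(parts[0])
        and is_valid_file(parts[1])
    )
```

For example, is_valid_path("03-kernel-fundamentals/02-syscall-interface.md") is accepted, while a path under a hypothetical 51-extra directory is rejected because the section number falls outside 00 to 50.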


Directory Structure

All 51 sections, numbered 00–50:

| Section | Name | Description |
|---|---|---|
| 00 | foundations | Boolean logic, binary arithmetic, data representation, abstraction layers |
| 01 | computer-history | History from Babbage to Turing to UNIX to modern cloud |
| 02 | operating-system-history | Evolution of OS design: batch → time-sharing → microkernel → exokernel → unikernel |
| 03 | kernel-fundamentals | What a kernel is, system call interface, privilege levels, kernel vs. userspace |
| 04 | kernel-architecture | Monolithic, microkernel, hybrid, exokernel architectures with real examples |
| 05 | boot-process | BIOS/UEFI, bootloaders, early kernel init, initramfs, systemd |
| 06 | cpu-architecture | ISAs, pipelines, superscalar execution, branch prediction, caches, NUMA |
| 07 | process-management | Process creation, exec, fork, COW, process lifecycle, namespaces |
| 08 | threading-models | Kernel threads, user threads, M:N threading, POSIX threads, goroutines |
| 09 | scheduling | CFS, real-time schedulers, multicore scheduling, work stealing |
| 10 | synchronization | Locks, spinlocks, mutexes, semaphores, RCU, lock-free data structures |
| 11 | memory-management | Virtual memory, paging, page tables, TLB, page faults, slab allocator, OOM |
| 12 | storage-systems | HDDs, SSDs, NVMe, RAID, storage I/O path, io_uring |
| 13 | filesystems | VFS, ext4, XFS, btrfs, ZFS, tmpfs, overlayfs, FUSE |
| 14 | device-drivers | Driver model, interrupts, DMA, character and block drivers |
| 15 | networking | Network stack layers, socket API, packet lifecycle, kernel networking |
| 16 | tcp-ip-internals | TCP state machine, congestion control, BBR, QUIC, kernel TCP stack |
| 17 | distributed-systems | Consistency models, consensus (Raft/Paxos), distributed clocks, CRDTs |
| 18 | database-internals | B-trees, LSM trees, WAL, MVCC, query execution, buffer pool management |
| 19 | virtualization | Type 1/2 hypervisors, KVM, QEMU, hardware assist (VT-x, AMD-V), paravirt |
| 20 | containers | cgroups, namespaces, overlayfs, OCI runtime, runc, containerd |
| 21 | cloud-infrastructure | AWS/GCP/Azure internals, VPCs, object storage, managed services |
| 22 | kubernetes-internals | etcd, API server, scheduler, kubelet, CNI, CSI, CRI |
| 23 | observability | Metrics, logs, traces, eBPF-based observability, OpenTelemetry |
| 24 | debugging | gdb, lldb, perf, strace, core dumps, kernel debugging, crash analysis |
| 25 | performance-engineering | Profiling, flamegraphs, benchmarking methodology, latency analysis |
| 26 | security | Threat modeling, exploit mitigations, ASLR, SMEP/SMAP, seccomp, AppArmor |
| 27 | kernel-exploits | Privilege escalation, heap sprays, ROP chains, kernel CVE case studies |
| 28 | reliability-engineering | SLOs, error budgets, chaos engineering, postmortem culture |
| 29 | runtime-systems | GC algorithms, JVM internals, V8, CPython internals, memory safety runtimes |
| 30 | compilers-and-linkers | Compiler pipeline, LLVM, linking, ELF format, dynamic linking, JIT |
| 31 | gpu-systems | GPU architecture, CUDA, OpenCL, GPU memory hierarchy, PCIe bandwidth |
| 32 | ai-infrastructure | Training clusters, parameter servers, NCCL, model serving, inference optimization |
| 33 | hardware-architecture | CPUs (x86, ARM), FPGAs, ASICs, system buses, memory controllers |
| 34 | embedded-systems | Bare-metal programming, RTOS, HAL, firmware, peripheral interfaces |
| 35 | real-time-systems | Hard vs. soft RT, PREEMPT_RT, priority inversion, deadline scheduling |
| 36 | mobile-operating-systems | Android/iOS architecture, Binder IPC, power management, app sandbox |
| 37 | browser-and-sandbox-architecture | Browser process model, V8, Blink, sandboxing, site isolation |
| 38 | system-design | Scalability patterns, CAP theorem tradeoffs, design interview patterns |
| 39 | large-scale-case-studies | Google Bigtable, Amazon Dynamo, Facebook TAO, Kafka, Spanner, Cassandra |
| 40 | failure-history | Famous outages: AWS US-EAST-1, Cloudflare BGP, Facebook October 2021, Therac-25 |
| 41 | modern-kernel-challenges | eBPF, io_uring, rust-in-kernel, kernel security hardening, Landlock |
| 42 | future-of-operating-systems | Unikernels, library OSes, capability-based security, hardware enclaves |
| 43 | formal-verification | TLA+, Coq, seL4 proof, model checking, why proofs matter in real systems |
| 44 | rust-and-memory-safety | Ownership model, borrow checker, unsafe Rust, Rust in Linux, memory safety in systems |
| 45 | learning-roadmaps | Curated learning paths by role and experience level |
| 46 | labs | Hands-on exercises with step-by-step instructions |
| 47 | projects | Larger build projects: mini kernel, mini database, mini container runtime |
| 48 | research-papers | Annotated bibliography of foundational and recent papers |
| 49 | glossary | Definitions of all technical terms used across the archive |
| 50 | acronyms | Expanded form and context for every acronym used |

Quickstart Paths by Audience

Student (computer science undergraduate)

Start here to build foundational understanding before any specialization.

  1. 00-foundations — data representation, abstraction
  2. 01-computer-history — context for why systems are designed as they are
  3. 06-cpu-architecture — how the hardware you're programming actually works
  4. 03-kernel-fundamentals — what the OS does and why
  5. 07-process-management — how your program runs
  6. 11-memory-management — where your variables live
  7. 15-networking — how programs talk to each other
  8. 49-glossary + 50-acronyms — always open in a second tab

First project: 47-projects/mini-shell.md

Site Reliability Engineer (SRE)

SREs need to understand systems deeply enough to debug them under pressure.

  1. 03-kernel-fundamentals — know what the kernel is doing when things break
  2. 07-process-management — processes, signals, and what kills them
  3. 15-networking + 16-tcp-ip-internals — most production failures are network failures
  4. 23-observability — metrics, logs, traces: your primary tools
  5. 24-debugging — strace, perf, core dumps
  6. 25-performance-engineering — understand latency before optimizing it
  7. 28-reliability-engineering — SLOs, postmortems, chaos
  8. 40-failure-history — learn from famous outages
  9. 20-containers + 22-kubernetes-internals — your runtime environment
  10. 39-large-scale-case-studies — how real systems at scale are designed

First project: 47-projects/observability-stack.md

Kernel Engineer

Kernel engineers need the deepest possible understanding of OS internals.

  1. 00-foundations + 06-cpu-architecture + 33-hardware-architecture
  2. 03-kernel-fundamentals + 04-kernel-architecture
  3. 05-boot-process
  4. 07-process-management + 08-threading-models + 09-scheduling
  5. 10-synchronization — critical for any kernel work
  6. 11-memory-management — virtual memory, paging, allocators
  7. 12-storage-systems + 14-device-drivers
  8. 15-networking (kernel networking path)
  9. 41-modern-kernel-challenges — eBPF, io_uring, Rust in kernel
  10. 44-rust-and-memory-safety
  11. 27-kernel-exploits — know what you're defending against
  12. 43-formal-verification — seL4 and why proofs matter

First project: 47-projects/loadable-kernel-module.md

Distributed Systems Engineer

  1. 03-kernel-fundamentals + 15-networking + 16-tcp-ip-internals — foundations
  2. 10-synchronization — understand locks before you understand distributed locks
  3. 17-distributed-systems — consistency, consensus, clocks
  4. 18-database-internals — storage layer for distributed data
  5. 39-large-scale-case-studies — Dynamo, Bigtable, Spanner, Kafka
  6. 21-cloud-infrastructure + 22-kubernetes-internals
  7. 23-observability + 28-reliability-engineering
  8. 38-system-design — patterns and tradeoffs
  9. 40-failure-history — real failure modes at scale

First project: 47-projects/distributed-kv-store.md

Security Researcher

  1. 00-foundations + 06-cpu-architecture + 33-hardware-architecture
  2. 03-kernel-fundamentals + 04-kernel-architecture
  3. 11-memory-management — understand what exploits manipulate
  4. 26-security — mitigations and threat models
  5. 27-kernel-exploits — privilege escalation, heap exploitation
  6. 30-compilers-and-linkers — understand ELF, linking, code layout
  7. 37-browser-and-sandbox-architecture — sandboxes and escapes
  8. 44-rust-and-memory-safety — memory safety by construction
  9. 43-formal-verification — formal proof of security properties
  10. 41-modern-kernel-challenges — modern attack surface

First project: 47-projects/exploit-lab.md

AI Infrastructure Engineer

  1. 06-cpu-architecture + 33-hardware-architecture — understand your hardware
  2. 31-gpu-systems — the compute substrate for AI
  3. 11-memory-management — memory bandwidth is the bottleneck
  4. 15-networking + 16-tcp-ip-internals — high-speed interconnects (RDMA, InfiniBand)
  5. 32-ai-infrastructure — training clusters, parameter servers, NCCL
  6. 20-containers + 22-kubernetes-internals — your deployment platform
  7. 25-performance-engineering — profiling, roofline model, kernel fusion
  8. 17-distributed-systems — distributed training is distributed systems
  9. 29-runtime-systems — Python runtime, PyTorch internals, CUDA graphs
  10. 21-cloud-infrastructure — cloud GPU instances, object storage for checkpoints

First project: 47-projects/distributed-training-pipeline.md


How Files Are Cross-Referenced

Every file in the archive uses a consistent cross-reference syntax:

  • Forward reference: [see: 11-memory-management/03-page-tables.md] — a topic covered later
  • Back reference: [prerequisite: 06-cpu-architecture/02-cache-hierarchy.md] — required prior reading
  • Lateral reference: [related: 25-performance-engineering/04-memory-profiling.md] — same depth, different domain

At the top of each file, a Prerequisites section lists what to read first. At the bottom, a Further reading section lists where to go next.
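
Because the three reference kinds share one bracketed shape, they can be pulled out of a file with a single regular expression. The sketch below is one possible way to do that in Python. The CrossRef type, the pattern, and the function name are illustrative assumptions, and the pattern assumes that targets contain no whitespace or closing brackets.

```python
import re
from typing import NamedTuple


class CrossRef(NamedTuple):
    kind: str    # "see", "prerequisite", or "related"
    target: str  # path such as "11-memory-management/03-page-tables.md"


# Assumed grammar: [<kind>: <target>] with an optional space after the colon.
CROSS_REF = re.compile(r"\[(see|prerequisite|related):\s*([^\]\s]+)\]")


def extract_cross_refs(text: str) -> list[CrossRef]:
    """Return every cross-reference found in text, in document order."""
    return [CrossRef(m.group(1), m.group(2)) for m in CROSS_REF.finditer(text)]


sample = (
    "Paging is covered in [see: 11-memory-management/03-page-tables.md]; "
    "first read [prerequisite: 06-cpu-architecture/02-cache-hierarchy.md]."
)
refs = extract_cross_refs(sample)
# refs[0] is CrossRef(kind="see", target="11-memory-management/03-page-tables.md")
# refs[1] is CrossRef(kind="prerequisite", target="06-cpu-architecture/02-cache-hierarchy.md")
```

Counting the results per file would also give a simple way to check how many cross-references a file carries.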

The DEPENDENCY-GRAPH.md file in this directory captures all these relationships in a queryable form.
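
The on-disk format of DEPENDENCY-GRAPH.md is not described here, so the following Python sketch assumes the relationships have already been loaded into a dictionary that maps each topic to the set of its prerequisites. Under that assumption, a prerequisites-first reading order for one target can be derived with a topological sort. The function name and the example edges are hypothetical.

```python
from graphlib import TopologicalSorter


def reading_order(prereqs: dict[str, set[str]], target: str) -> list[str]:
    """Return target plus its transitive prerequisites, prerequisites first."""
    # Collect the target and everything it transitively depends on.
    needed: set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node in needed:
            continue
        needed.add(node)
        stack.extend(prereqs.get(node, ()))

    # Sort only the relevant subgraph; predecessors are emitted before dependents.
    subgraph = {node: prereqs.get(node, set()) for node in needed}
    return list(TopologicalSorter(subgraph).static_order())


# Hypothetical edges, for illustration only.
example = {
    "11-memory-management": {"06-cpu-architecture", "03-kernel-fundamentals"},
    "06-cpu-architecture": {"00-foundations"},
    "03-kernel-fundamentals": {"00-foundations"},
    "15-networking": {"03-kernel-fundamentals"},
}
order = reading_order(example, "11-memory-management")
```

In this example the order starts with 00-foundations and ends with 11-memory-management, and 15-networking is left out because the target does not depend on it. If the prerequisite data contained a cycle, static_order() would raise graphlib.CycleError, which is a useful signal that the graph needs fixing.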


Contribution Philosophy

This archive follows a set of guiding principles for all content:

Precision over simplicity. Simplified explanations that introduce misconceptions are actively harmful. Complexity should be acknowledged and unpacked, not hidden.

Production reality over idealized models. Where the textbook model diverges from how Linux, FreeBSD, or a real cloud provider actually works, document the real behavior. Cite kernel source code, not just papers.

Depth-first, not breadth-first. One section that goes 10 levels deep is more valuable than 10 sections that stay at the surface. Coverage gaps are acceptable; shallow coverage is not.

Cross-reference aggressively. No concept exists in isolation. Every file should link to the 3–5 most relevant files elsewhere in the archive.

Cite primary sources. Papers, kernel commits, RFC numbers, CVE IDs, postmortem URLs. Assertions without sources are proposals, not knowledge.

Avoid recency bias. A 1974 paper can be more important than a 2024 blog post. Include historical context. Document why designs evolved.