Section 47: Projects — Overview
Purpose and Scope
This section catalogs substantive project-based learning exercises that take systems understanding to a depth that labs and reading cannot reach alone. Where labs are focused exercises measured in hours, projects are sustained efforts measured in weeks or months, producing artifacts that demonstrate comprehensive understanding of a system domain. The project catalog spans six complexity tiers and twelve domains: toy kernels, toy filesystems, toy schedulers, toy hypervisors, toy distributed systems, eBPF tool collections, performance engineering, security research, networking, database storage, compiler backends, and cloud infrastructure. The Project Catalog in this overview details the first eight of these domains.
The project philosophy is that building a toy version of a production system is the most efficient path to understanding that production system. A developer who has written a 2,000-line toy kernel understands interrupt handling, context switching, and virtual memory in a way that no amount of reading can replicate. The toy is not the goal; the mental model the toy builds is.
Prerequisites
- Completion of the relevant lab series from Section 46
- Programming proficiency in C, Rust, or Go (depending on project track)
- Section-specific prerequisites noted per project
Learning Objectives
Upon completing this section, the reader will be able to:
- Select an appropriate project for their current skill level and learning objectives
- Scope a project to a size that is completable in their available time while remaining educationally valuable
- Apply the toy-first methodology: build the simplest correct version first, then extend
- Recognize which production code corresponds to each component they built in their toy
- Use their toy project as a platform for further experimentation and research
Architecture Overview
PROJECT COMPLEXITY TIERS
===========================
TIER 1 (1-2 weeks)            TIER 2 (2-4 weeks)
──────────────────            ──────────────────
Hello-world kernel            Toy scheduler (CFS-like)
Simple FUSE filesystem        Toy memory allocator
Vector clock library          Toy lock implementation
eBPF tracer tool              TCP state machine

TIER 3 (4-8 weeks)            TIER 4 (8-16 weeks)
──────────────────            ───────────────────
Toy x86 kernel                Toy ext2 filesystem
User-space thread lib         Raft implementation
Simple JIT compiler           Toy hypervisor (KVM-based)
Performance analysis tool     Toy B-tree database

TIER 5 (16-32 weeks)          TIER 6 (6+ months)
────────────────────          ──────────────────
Toy OS with VFS               Production-quality tool
Full Raft + KV store          OSS contribution project
Toy compiler + linker         Research prototype
Network stack in user space   Publishable artifact
PROJECT COMPONENT MAPPING
============================
Toy kernel          Production Linux
┌─────────────┐     ┌────────────────────────────────┐
│ boot stub   │ ──> │ arch/x86/boot/                 │
│ IDT setup   │ ──> │ arch/x86/kernel/idt.c, traps.c │
│ GDT/TSS     │ ──> │ arch/x86/include/asm/desc.h    │
│ page tables │ ──> │ arch/x86/mm/                   │
│ scheduler   │ ──> │ kernel/sched/                  │
│ syscall     │ ──> │ arch/x86/entry/                │
└─────────────┘     └────────────────────────────────┘
Key Concepts
- Toy-first principle: a minimal correct implementation is more educational than a partial implementation of a complex design; scope ruthlessly, favoring completeness over features
- Vertical slice: implement one complete path end-to-end before adding breadth; in a kernel, implement one system call completely before adding others
- Test-driven development for systems: write tests before implementation; systems software has invariants that can be expressed as assertions, and those assertions become regression tests
- Reading production source: after building a toy, read the corresponding production code (Linux, Postgres, etcd); your mental model provides the scaffold for understanding the complexity
- Progressive enhancement: start with the simplest algorithm (round-robin, first-fit allocation, naive B-tree splits) and replace with more sophisticated approaches as understanding deepens
- Measurement-driven development: for performance projects, establish baselines before optimization; measure after each change; never optimize based on intuition alone
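
As an illustration of the progressive-enhancement and invariants-as-assertions ideas above, here is a minimal sketch of a first-fit allocator over a fixed arena. Python is used only for brevity; the class name `FirstFitArena` and its block layout are hypothetical, not taken from any production allocator.

```python
class FirstFitArena:
    """Toy first-fit allocator; blocks are [offset, length, is_free], sorted by offset."""

    def __init__(self, size):
        self.size = size
        self.blocks = [[0, size, True]]

    def alloc(self, n):
        """Return the offset of the first free block that fits, or None."""
        if n <= 0:
            raise ValueError("allocation size must be positive")
        for i, (off, length, free) in enumerate(self.blocks):
            if free and length >= n:
                self.blocks[i] = [off, n, False]
                if length > n:
                    # Split: the remainder becomes a new free block.
                    self.blocks.insert(i + 1, [off + n, length - n, True])
                return off
        return None

    def free(self, off):
        for blk in self.blocks:
            if blk[0] == off and not blk[2]:
                blk[2] = True
                self._coalesce()
                return
        raise ValueError(f"invalid free at offset {off}")

    def _coalesce(self):
        """Merge runs of adjacent free blocks."""
        merged = [self.blocks[0]]
        for blk in self.blocks[1:]:
            prev = merged[-1]
            if prev[2] and blk[2]:
                prev[1] += blk[1]
            else:
                merged.append(blk)
        self.blocks = merged

    def check_invariants(self):
        """Assertions that double as regression tests."""
        expected, prev_free = 0, False
        for off, length, free in self.blocks:
            assert off == expected, "blocks must tile the arena without gaps"
            assert length > 0, "no zero-length blocks"
            assert not (prev_free and free), "adjacent free blocks must be merged"
            expected, prev_free = off + length, free
        assert expected == self.size, "blocks must cover the whole arena"
```

Calling `check_invariants()` after every `alloc` and `free` in a test run keeps the same checks useful when first-fit is later replaced by a more sophisticated policy.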
Project Catalog
Domain 1: Toy Kernel Projects
Project KERN-01: Bare Metal Hello World (Tier 1)
- Boot an x86-64 machine (in QEMU) to a serial "Hello, World!" without any OS
- Skills: Boot process, GDT, minimal BIOS/UEFI, VGA text mode or UART
- Time: 1-2 weeks | Language: C + assembly
Project KERN-02: Interrupt and Timer Kernel (Tier 2)
- Extend KERN-01: IDT setup, PIC initialization, timer interrupt at 100 Hz, keyboard interrupt
- Skills: Interrupt descriptor table, PIC, APIC basics
- Time: 2-3 weeks | Language: C + assembly
- Prerequisites: KERN-01
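
The 100 Hz timer in KERN-02 comes down to one piece of arithmetic if the legacy PIT is used as the timer source (an assumption; the project text does not name the timer device). The sketch below computes the reload value; the function name `pit_divisor` is hypothetical, and Python is used only to show the calculation.

```python
PIT_INPUT_HZ = 1_193_182  # nominal input clock of the legacy 8253/8254 PIT

def pit_divisor(target_hz):
    """Return (divisor, low_byte, high_byte, actual_hz) for PIT channel 0."""
    divisor = round(PIT_INPUT_HZ / target_hz)
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError("target frequency out of range for a 16-bit divisor")
    return divisor, divisor & 0xFF, divisor >> 8, PIT_INPUT_HZ / divisor

# In the kernel itself the equivalent sequence would be roughly:
#   outb(0x43, 0x36)   channel 0, low/high byte access, mode 3 (square wave)
#   outb(0x40, low)    then outb(0x40, high)
divisor, low, high, actual = pit_divisor(100)
print(f"divisor={divisor} low={low:#04x} high={high:#04x} actual={actual:.4f} Hz")
```

The achieved rate is slightly below 100 Hz because the divisor must be an integer, which is worth remembering when the tick count is later used for timekeeping.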
Project KERN-03: Virtual Memory Kernel (Tier 3)
- Extend KERN-02: 4-level paging, virtual address space, physical page allocator (buddy system)
- Skills: Page table manipulation, TLB, physical memory management
- Time: 4-6 weeks | Language: C
- Prerequisites: KERN-02
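
For the 4-level paging work in KERN-03, it helps to see how a virtual address decomposes into table indices. The sketch below assumes 48-bit addresses and 4 KiB pages; the helper names `split_vaddr` and `is_canonical` are hypothetical, and Python is used only to show the bit arithmetic.

```python
def split_vaddr(vaddr):
    """Split an x86-64 virtual address into 4-level paging indices (4 KiB pages)."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,   # bits 47..39
        "pdpt": (vaddr >> 30) & 0x1FF,   # bits 38..30
        "pd": (vaddr >> 21) & 0x1FF,     # bits 29..21
        "pt": (vaddr >> 12) & 0x1FF,     # bits 20..12
        "offset": vaddr & 0xFFF,         # bits 11..0
    }

def is_canonical(vaddr):
    """With 48-bit addressing, bits 63..47 must all be copies of bit 47."""
    top = vaddr >> 47
    return top == 0 or top == 0x1FFFF

print(split_vaddr(0xFFFF800000000000))
```

Each index selects one of 512 entries in its table, which is why every level of the hierarchy fits in a single 4 KiB page of 8-byte entries.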
Project KERN-04: Process Kernel (Tier 3-4)
- Extend KERN-03: process structure, fork/exec (minimal), round-robin scheduler, system calls
- Skills: TSS, context switch, process lifecycle, system call convention
- Time: 6-10 weeks | Language: C
- Prerequisites: KERN-03
Project KERN-05: Networked OS Kernel (Tier 5)
- Extend KERN-04: VFS layer, ext2 read-only, basic TCP/IP stack (ARP, IP, TCP), shell
- Skills: Full OS architecture integration
- Time: 6+ months | Language: C
- Prerequisites: KERN-04, FS-01, NET-01
- Inspiration: xv6, ToaruOS, OSDev wiki
Domain 2: Toy Filesystem Projects
Project FS-01: Log-Structured Filesystem (Tier 3)
- Implement an LFS-inspired log-structured filesystem over a file-backed block device
- Features: Sequential writes, segment cleaning, inode map, checkpoint
- Skills: Log-structured design, GC, crash recovery
- Time: 4-6 weeks | Language: C or Rust
Project FS-02: B-tree Filesystem Index (Tier 4)
- Implement a B+ tree index for directory lookups; support insertion, deletion, search, range scan
- Skills: B-tree algorithms, page management, write-ahead logging for consistency
- Time: 6-8 weeks | Language: C, Rust, or Go
Project FS-03: Copy-on-Write Filesystem (Tier 5)
- Implement a Btrfs-inspired CoW filesystem with snapshot support and deduplication
- Skills: CoW semantics, reference counting, snapshot management
- Time: 12+ weeks
Domain 3: Toy Scheduler Projects
Project SCHED-01: Completely Fair Scheduler Clone (Tier 3)
- Implement a CFS-like scheduler using a red-black tree; virtual runtime; niceness
- Skills: Red-black tree, scheduling policy, vruntime calculation
- Time: 3-4 weeks | Language: C
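
The vruntime calculation in SCHED-01 can be sketched before any tree code is written. The simulation below is a simplified model, not the kernel's implementation: it substitutes a binary heap for the red-black tree, uses a fixed time slice, and includes only three entries from the Linux `sched_prio_to_weight` table. The function name `simulate` is hypothetical, and Python is used only for brevity.

```python
import heapq

NICE_0_WEIGHT = 1024
# Three entries from the Linux sched_prio_to_weight table (nice -> load weight).
NICE_TO_WEIGHT = {-5: 3121, 0: 1024, 5: 335}

def simulate(nice_by_task, ticks, slice_ns=1_000_000):
    """Always run the task with the smallest vruntime; return real runtime per task."""
    runtime = {name: 0 for name in nice_by_task}
    queue = [(0.0, name) for name in sorted(nice_by_task)]
    heapq.heapify(queue)
    for _ in range(ticks):
        vruntime, name = heapq.heappop(queue)       # leftmost node in a real rbtree
        weight = NICE_TO_WEIGHT[nice_by_task[name]]
        # Virtual time advances more slowly for heavier (lower nice) tasks.
        vruntime += slice_ns * NICE_0_WEIGHT / weight
        runtime[name] += slice_ns
        heapq.heappush(queue, (vruntime, name))
    return runtime

shares = simulate({"editor": 0, "batch": 5}, ticks=10_000)
print(shares["editor"] / shares["batch"])
```

Over a long enough run, the ratio of real runtimes approaches the ratio of the two weights, which is the fairness property the project is meant to reproduce.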
Project SCHED-02: Work-Stealing Scheduler (Tier 4)
- Implement a multi-queue work-stealing scheduler for user-space threads
- Skills: Deque-based work stealing, NUMA awareness, load balancing
- Time: 4-6 weeks | Language: C or Rust
Project SCHED-03: Deadline Scheduler (Tier 4)
- Implement EDF (Earliest Deadline First) scheduler for real-time tasks
- Skills: Real-time scheduling theory, CBS (Constant Bandwidth Server), admission control
- Time: 4-6 weeks
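
For SCHED-03, admission control and job selection are small enough to sketch up front. The example assumes the simplest textbook model: independent periodic tasks on one CPU with deadlines equal to periods, where EDF is feasible when total utilization does not exceed 1. The function names are hypothetical, CBS is not modeled, and Python is used only for brevity.

```python
from fractions import Fraction

def edf_admit(tasks, candidate):
    """tasks and candidate are (wcet, period) integer pairs in the same time unit.

    Admit the candidate only if total utilization stays at or below 1.
    """
    total = sum(Fraction(wcet, period) for wcet, period in tasks + [candidate])
    return total <= 1

def edf_pick(ready):
    """ready is a list of (absolute_deadline, name); run the earliest deadline."""
    return min(ready)

running = [(1, 4), (2, 5)]             # utilization 0.25 + 0.40 = 0.65
print(edf_admit(running, (3, 10)))     # adds 0.30
print(edf_admit(running, (2, 5)))      # adds 0.40
print(edf_pick([(30, "logger"), (12, "control-loop")]))
```

Exact rational arithmetic avoids rejecting a task set that sits exactly at the utilization bound because of floating-point rounding.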
Domain 4: Toy Hypervisor Projects
Project HV-01: KVM-Based Minimal VMM (Tier 4)
- Use Linux KVM API to launch a minimal guest VM; implement MMIO for serial output
- Skills: KVM ioctls, VMCS concepts (KVM itself manages the VMCS), guest memory mapping, MMIO emulation
- Time: 4-6 weeks | Language: C
- References: kvmtool source, QEMU KVM backend
Project HV-02: Type-1 Hypervisor Prototype (Tier 5)
- Implement a bare-metal hypervisor using Intel VT-x (VMXON/VMLAUNCH); support two guests
- Skills: VMX instructions, VMCS fields, VM-exit handling, EPT
- Time: 3+ months | Language: C + assembly
Domain 5: Toy Distributed System Projects
Project DIST-01: Raft Key-Value Store (Tier 4)
- Implement Raft consensus; build a linearizable key-value store on top
- Skills: Leader election, log replication, log compaction, snapshot, membership changes
- Time: 6-10 weeks | Language: Go or Rust
- Test suite: MIT 6.824 (now numbered 6.5840) lab tests are freely available and comprehensive
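
The leader-election skill in DIST-01 hinges on the vote-granting rule. The sketch below follows the RequestVote receiver rules as described in the Raft paper, under simplifying assumptions: state is kept in memory only (a real node must persist term and vote before replying), and the class and method names are hypothetical. Python is used for brevity; the project itself targets Go or Rust.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Follower:
    current_term: int = 0
    voted_for: Optional[str] = None
    log: list = field(default_factory=list)   # entries are (term, command)

    def last_log(self):
        """Return (last_term, last_index); indices start at 1, empty log is (0, 0)."""
        if not self.log:
            return (0, 0)
        return (self.log[-1][0], len(self.log))

    def handle_request_vote(self, term, candidate_id, last_log_index, last_log_term):
        """Return (current_term, vote_granted)."""
        if term < self.current_term:
            return self.current_term, False        # stale candidate
        if term > self.current_term:
            self.current_term = term               # newer term: forget old vote
            self.voted_for = None
        # Candidate's log must be at least as up to date: compare last term, then length.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return self.current_term, True
        return self.current_term, False
```

Note that a newer term is adopted even when the vote is refused, which is easy to miss in a first implementation.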
Project DIST-02: Distributed Hash Table (Tier 4)
- Implement Chord DHT: consistent hashing, finger table, join/leave/stabilize
- Skills: Consistent hashing, ring topology, distributed lookup
- Time: 4-6 weeks
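
The consistent-hashing and finger-table parts of DIST-02 can be prototyped with a global view of the ring before any networking exists. The sketch below assumes a deliberately small 16-bit identifier space, and the names `ring_id` and `Ring` are hypothetical; join, leave, and stabilize are not modeled. Python is used only for brevity.

```python
import bisect
import hashlib

M = 16  # identifier bits; a small ring chosen for illustration

def ring_id(key):
    """Map a string key onto the identifier ring using SHA-1."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << M)

class Ring:
    def __init__(self, node_ids):
        self.ids = sorted(set(node_ids))

    def successor(self, ident):
        """First node whose id is equal to or follows ident, wrapping around."""
        i = bisect.bisect_left(self.ids, ident)
        return self.ids[i % len(self.ids)]

    def finger_table(self, node_id):
        """Entry k points at successor(node_id + 2**k) on the ring."""
        return [self.successor((node_id + (1 << k)) % (1 << M)) for k in range(M)]

ring = Ring([10, 200, 4000])
print(ring.successor(ring_id("some-key")))
print(ring.finger_table(10))
```

In the real protocol no node holds the full sorted list; each node learns its fingers through lookups, so this global view is best kept as a test oracle for the distributed version.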
Project DIST-03: Lamport/Vector Clock Visualizer (Tier 2)
- Build an interactive visualization of Lamport and vector clock behavior across distributed nodes
- Skills: Clock algorithms, causal ordering visualization
- Time: 1-2 weeks | Language: any
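
The core of DIST-03 is the vector clock update and comparison logic that the visualization would sit on top of. The sketch below assumes a fixed, known set of nodes; the class and function names are hypothetical.

```python
class VectorClock:
    def __init__(self, node, nodes):
        self.node = node
        self.clock = {n: 0 for n in nodes}

    def tick(self):
        """Local event: increment own component and return a timestamp copy."""
        self.clock[self.node] += 1
        return dict(self.clock)

    def send(self):
        """Sending is a local event; the returned stamp travels with the message."""
        return self.tick()

    def receive(self, stamp):
        """Merge by component-wise maximum, then count the receive as an event."""
        for n, t in stamp.items():
            self.clock[n] = max(self.clock.get(n, 0), t)
        return self.tick()

def happened_before(a, b):
    """a -> b iff a <= b in every component and a < b in at least one."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

def concurrent(a, b):
    return a != b and not happened_before(a, b) and not happened_before(b, a)

p, q = VectorClock("p", ["p", "q"]), VectorClock("q", ["p", "q"])
e1 = p.send()        # event on p
e2 = q.tick()        # independent event on q
e3 = q.receive(e1)   # q receives p's message
print(concurrent(e1, e2), happened_before(e1, e3))
```

A visualizer can draw an arrow for every pair where `happened_before` holds and highlight pairs where `concurrent` holds, which makes the difference from Lamport clocks visible.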
Domain 6: eBPF Tool Collection
Project EBPF-01: Syscall Latency Histogram Tool (Tier 3)
- Write an eBPF tool that measures per-syscall latency histograms system-wide
- Skills: kprobes or syscall tracepoints, BPF maps, histogram aggregation
- Time: 1-2 weeks | Language: C + BPF, or Rust + libbpf
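
The histogram aggregation in EBPF-01 is usually done with power-of-two buckets so that the in-kernel side stays cheap. The sketch below models only that bucketing logic in user space; it is not BPF code, the function names are hypothetical, and the sample data is made up for illustration.

```python
from collections import Counter

def log2_bucket(value_ns):
    """Bucket k holds values in [2**k, 2**(k+1)); bucket 0 also holds 0."""
    return max(value_ns.bit_length() - 1, 0)

def aggregate(samples):
    """samples: iterable of (syscall_name, latency_ns) -> {name: Counter(bucket -> count)}."""
    hist = {}
    for name, latency_ns in samples:
        hist.setdefault(name, Counter())[log2_bucket(latency_ns)] += 1
    return hist

# Made-up latencies, for illustration only.
samples = [("read", 100), ("read", 120), ("read", 5000), ("write", 7)]
for name, buckets in aggregate(samples).items():
    for k in sorted(buckets):
        print(f"{name:6s} [{1 << k:>6}, {1 << (k + 1):>6}) ns: {buckets[k]}")
```

In the BPF program the same idea would map to a hash map keyed by syscall number and bucket, with the entry timestamp stored per thread at syscall entry and the bucket incremented at exit.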
Project EBPF-02: TCP Connection Tracer (Tier 3)
- Trace TCP connection establishment and teardown; emit connection tuples with latency
- Skills: kprobes on TCP functions, BPF ring buffer, user-space consumer
- Time: 2-3 weeks
Project EBPF-03: Memory Pressure Analyzer (Tier 4)
- Track page reclaim events, OOM kill events, and swap activity; produce timeline
- Skills: tracepoints, cgroup awareness, correlation
- Time: 2-3 weeks
Project EBPF-04: Custom Network Policy Enforcer (Tier 4)
- Implement a Kubernetes network policy enforcer using XDP and tc BPF
- Skills: XDP, tc, BPF maps for policy state, packet classification
- Time: 4-6 weeks
Domain 7: Performance Engineering Projects
Project PERF-01: Benchmark Framework (Tier 3)
- Build a benchmark harness that controls for CPU affinity, NUMA, THP, and reports with statistical rigor
- Skills: Statistical analysis, hardware performance counter collection, variance sources
- Time: 2-3 weeks
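
A starting point for the reporting side of PERF-01 is a harness that separates measurement from summarization. The sketch below is one possible shape, not a complete framework: the function names are hypothetical, the confidence interval uses a normal approximation that is only reasonable for larger sample counts, and affinity, NUMA, and THP control are left out.

```python
import statistics
import time

def measure(fn, runs=30, warmup=5):
    """Time fn() in nanoseconds, discarding warmup iterations."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return samples

def summarize(samples, confidence=0.95):
    """Report center, spread, and an approximate confidence interval for the mean."""
    n = len(samples)
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev / n ** 0.5
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(samples),
        "stdev": stdev,
        "cv": stdev / mean,   # coefficient of variation: a quick noise indicator
        "ci": (mean - half_width, mean + half_width),
    }

print(summarize(measure(lambda: sum(range(10_000)))))
```

Comparing two variants then means checking whether their intervals overlap rather than comparing single runs; a high coefficient of variation suggests that an uncontrolled variance source still needs attention.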
Project PERF-02: Memory Allocator Comparison (Tier 3)
- Benchmark jemalloc, tcmalloc, mimalloc, and ptmalloc2 across workload profiles; profile internals
- Skills: Allocator architecture, fragmentation measurement, thread cache behavior
- Time: 3-4 weeks
Project PERF-03: Lock Contention Analyzer (Tier 4)
- Build a tool (eBPF or perf-based) that identifies hot locks and their contention patterns in production binaries
- Skills: Lock profiling, futex tracing, call-site attribution
- Time: 3-4 weeks
Domain 8: Security Research Projects
Project SEC-01: Kernel Exploit Development (Tier 5)
- Develop a working exploit for a CVE-assigned kernel vulnerability in a VM; document techniques
- Skills: Heap spray, ROP, kernel heap feng shui, KASLR bypass
- Time: 4-8 weeks per CVE
Project SEC-02: Container Escape Analysis (Tier 4)
- Survey container escape techniques; implement at least two in a controlled lab environment
- Skills: Namespace escapes, cgroup escapes, kernel attack surface
- Time: 3-4 weeks
Evaluation Criteria
All projects should be evaluated against four criteria:
- Correctness: does the implementation handle the core cases correctly? Are there known-failing edge cases?
- Understandability: is the code structured to make the design visible? Would a reader learn from it?
- Testability: are there tests that demonstrate correct behavior and will catch regressions?
- Documentation: is the design rationale documented? Are major design decisions explained?
File Map
47-projects/
├── 00-overview.md ← This file
├── 01-project-philosophy.md
├── 02-toy-kernel-projects.md
├── 03-toy-filesystem-projects.md
├── 04-toy-scheduler-projects.md
├── 05-toy-hypervisor-projects.md
├── 06-toy-distributed-system-projects.md
├── 07-ebpf-project-collection.md
├── 08-performance-engineering-projects.md
├── 09-security-research-projects.md
├── 10-networking-projects.md
├── 11-database-storage-projects.md
└── 12-evaluation-and-review-criteria.md
Cross-References
- Section 45 (Learning Roadmaps): projects are the advanced phase of each track
- Section 46 (Labs): labs are prerequisite exercises for most projects
- Section 48 (Research Papers): papers that provide the design basis for each project
- Section 19 (Virtualization): background for hypervisor projects
- Section 13 (Filesystems): background for filesystem projects
- Section 17 (Distributed Systems): background for distributed projects