Section 38: System Design
Purpose and Scope
System design is the discipline of architecting large-scale distributed systems that meet functional requirements while satisfying non-functional constraints: availability, scalability, consistency, latency, throughput, and cost. Unlike algorithm design — where a provably correct answer exists — system design involves navigating fundamental tradeoffs, making explicit decisions about consistency models, failure domains, and operational complexity. This section covers the methodology for approaching system design problems, the canonical patterns and their tradeoffs, and worked examples of major system categories. It serves both as an interview preparation resource and as a practical reference for engineers designing production systems.
Prerequisites
- Distributed systems fundamentals (Section 17): CAP theorem, consistency models, consensus
- Database internals (Section 18): storage engines, replication, transactions
- Networking and protocols (Section 15, 16): TCP, HTTP, load balancing
- Operating system and Linux fundamentals (Section 03)
- Basic understanding of cloud infrastructure primitives (Section 21)
Learning Objectives
By the end of this section, you will be able to:
- Apply a structured methodology to decompose any system design problem
- Select the appropriate consistency model (strong, eventual, causal) for a given use case
- Design a scalable caching strategy with appropriate eviction, invalidation, and freshness policies
- Design a horizontally scalable database layer using replication and sharding
- Explain consistent hashing and design a routing layer using it
- Apply the circuit breaker and retry patterns to build fault-tolerant service architectures
- Design major systems from scratch: URL shortener, distributed cache, search engine, social feed
- Reason about system tradeoffs quantitatively using back-of-envelope estimation
Architecture Overview
System Design Methodology
Step 1: Requirements Clarification (5 min)
+-----------------------------------------+
| Functional: What does the system do? |
| Users, operations, data flows |
| Non-functional: Scale, latency, avail. |
| QPS, DAU, SLA, data volume, geo |
| Out of scope: explicitly state |
+-----------------------------------------+
Step 2: Capacity Estimation (5 min)
+-----------------------------------------+
| Traffic: 100M DAU * 10 req/day = ~12K QPS|
| Storage: 1M posts/day * 1KB = 1 GB/day |
| Bandwidth: 12K QPS * 10KB = 120 MB/s |
+-----------------------------------------+
Step 3: High-Level Design (15 min)
+-----------------------------------------+
| Draw components: clients, LB, services, |
| caches, databases, queues |
| Identify read vs write paths |
| Define API contracts |
+-----------------------------------------+
Step 4: Deep Dive (20 min)
+-----------------------------------------+
| Bottleneck identification & resolution |
| Data model design |
| Consistency and availability tradeoffs |
| Failure modes and mitigations |
+-----------------------------------------+
Step 5: Wrap Up
+-----------------------------------------+
| Operational concerns, monitoring |
| Future scaling considerations |
| Known limitations |
+-----------------------------------------+
Scalability Patterns
Vertical Scaling: Horizontal Scaling:
+----------+ +------+ +------+ +------+
| Server | -> | Srv1 | | Srv2 | | Srv3 |
| (bigger) | +--+---+ +--+---+ +--+---+
+----------+ | | |
Simple, bounded +----+----+----+ |
| |
Load Balancer |
(L4 or L7) |
Database Scaling Patterns:
Read Replica: Sharding:
+----------+ +-------+ +-------+
| Primary | -> replication -> | Shard | | Shard |
| (writes) | +----------+ | 0 | | 1 |
+----------+ | Replica | | uid%2 | | uid%2 |
| (reads) | | ==0 | | ==1 |
+----------+ +-------+ +-------+
Caching Hierarchy:
Client <-> CDN <-> Load Balancer <-> App Server <-> Cache <-> DB
(edge cache) (in-process) (Redis)
Read-through: cache miss -> load from DB -> populate cache -> return
Write-through: write DB and cache simultaneously
Write-behind: write cache first, async flush to DB
Cache-aside: application manages cache explicitly
Consistent Hashing
Hash ring (0 to 2^32 - 1):
0
/ \
node_A node_D
/ \
node_B node_C
\ /
----- ring -----
Key lookup: hash(key) -> find nearest node clockwise
Adding a node: only keys between predecessor and new node move
Removing a node: only keys on that node move to successor
Virtual nodes: each physical server gets K virtual positions on ring
-> better load distribution, especially with heterogeneous nodes
Used by: Amazon DynamoDB, Apache Cassandra, Akamai CDN
Message Queue Architecture
Producer ---> +----------+ ---> Consumer Group A
(service) | Kafka | (real-time processing)
(writes) | Partition|
| P0: msgs |
| P1: msgs | ---> Consumer Group B
| P2: msgs | (analytics / batch)
+----------+
Key properties:
- Decouples producers from consumers (async)
- Durable: messages persisted to disk (Kafka: 7-day retention)
- Replay: consumers can re-read from any offset
- Fan-out: multiple consumer groups independently consume
Pattern selection:
Request-reply (synchronous): direct HTTP/gRPC
Fire-and-forget (async, durable): Kafka, SQS
Work queue (competing consumers): RabbitMQ, SQS FIFO
Event streaming (ordered log): Kafka, Kinesis
Social Feed Design (Fan-out)
User A posts a tweet (100 followers)
Fan-out on Write (push model):
Post -> for each follower: write post_id to follower's feed cache
Read feed: O(1) - just read pre-computed cache
Write cost: O(followers) - expensive for celebrities (100M followers)
Fan-out on Read (pull model):
Post -> write once to user's post store
Read feed: O(following) - query each followed user's recent posts, merge
Write cost: O(1)
Read cost: expensive for users following thousands
Hybrid (Twitter/X approach):
Celebrities (>X followers): fan-out on read for their posts
Regular users: fan-out on write
Read path: merge pre-computed fan-out cache + celebrity posts
+----------+ +-----------+
| Post | ------> | Fan-out | -----> Redis feed cache
| Service | | Service | per-user sorted set
+----------+ | (async, | (score = timestamp)
| Kafka) |
+-----------+
Key Concepts
- CAP Theorem: In a distributed system, you can guarantee at most two of: Consistency, Availability, Partition Tolerance. Since partitions are unavoidable in real networks, the practical choice is CP (consistent but may reject requests) vs AP (always responds but may return stale data).
- Consistent Hashing: A technique for distributing load across nodes such that only O(1/n) keys need to move when a node is added or removed. Essential for distributed caches and key-value stores with dynamic membership.
- Circuit Breaker: Pattern that stops calling a failing service after a threshold of failures, returning an error immediately. After a timeout, allows a probe request to test recovery. Prevents cascade failures.
- Rate Limiting: Controlling request rates per user/service. Token bucket (steady state with burst capacity) and leaky bucket (strict rate) algorithms. Implemented at API gateway or in dedicated rate-limiting service (Redis INCR + TTL).
- CQRS (Command Query Responsibility Segregation): Separate read and write models. Write path updates the event log or primary DB; read path maintains denormalized projections optimized for queries. Enables independent scaling of read and write paths.
- Saga Pattern: Manages distributed transactions across microservices using a sequence of local transactions with compensating transactions on failure. Either orchestrator-based or choreography-based (event-driven).
- Service Discovery: Dynamic registration and lookup of service instances. Client-side (Consul + client library) or server-side (load balancer queries registry). Required for auto-scaled microservice environments.
- Back-of-Envelope Estimation: Quantitative sizing of system requirements using approximations. Key numbers: HDD 100 MB/s, SSD 500 MB/s-7 GB/s, RAM 50 GB/s, network 10 Gbps; L1 ~1ns, RAM ~100ns, SSD ~100 microseconds, network RTT 10-100ms.
Major Historical Milestones
| Year | Milestone |
|---|---|
| 1997 | Amazon begins decomposing from monolith; eventually services |
| 1997 | Consistent hashing paper — Karger et al. (Akamai/MIT) |
| 2003 | Google MapReduce paper — batch distributed computation |
| 2006 | Amazon Dynamo paper — AP key-value store, consistent hashing, vector clocks |
| 2007 | CAP theorem formally proven (Gilbert & Lynch, building on Brewer 2000) |
| 2007 | Twitter architecture — fail whale era; sharding, caching lessons |
| 2008 | Facebook Memcached at scale — cache-aside at billion-user scale |
| 2009 | Netflix begins cloud migration from on-premise; chaos engineering origin |
| 2011 | Netflix Hystrix — circuit breaker pattern popularized |
| 2011 | CQRS and Event Sourcing — Greg Young's formalization |
| 2012 | Lambda architecture — Marz's batch + speed + serving layers |
| 2014 | Microservices term coined — Fowler & Lewis article |
| 2014 | Kafka at LinkedIn — append-only log as distributed system primitive |
| 2015 | Kubernetes 1.0 — container orchestration standard |
| 2016 | Kappa architecture — stream-only, replacing lambda |
| 2018 | Stripe distributed systems reading list — formalization of field |
| 2020 | Service mesh (Istio, Linkerd) mainstreams for east-west traffic |
Modern Relevance
System design is the interview format for senior and staff engineering positions at virtually all major technology companies. But beyond interviews, the patterns are the vocabulary of how large systems are built and discussed. Every engineering design document at Google, Amazon, Meta, and Stripe invokes these concepts. The shift to cloud-native architectures (containers, Kubernetes, managed databases) has made horizontal scaling the default, not an exception. Rate limiting, circuit breaking, and consistent hashing are now implemented by platform teams and consumed via libraries and service meshes — but the engineers operating them must understand what these systems are doing to debug them effectively.
File Map
38-system-design/
├── 00-overview.md <- This file
├── 01-methodology.md
├── 02-scalability-patterns.md
├── 03-load-balancing.md
├── 04-caching-strategies.md
├── 05-cdn.md
├── 06-database-scaling.md
├── 07-message-queues.md
├── 08-event-driven-architecture.md
├── 09-microservices-vs-monolith.md
├── 10-api-gateway-patterns.md
├── 11-rate-limiting.md
├── 12-circuit-breaking.md
├── 13-service-discovery.md
├── 14-consistent-hashing.md
├── 15-cap-theorem-applied.md
├── 16-design-url-shortener.md
├── 17-design-distributed-cache.md
├── 18-design-search-engine.md
└── 19-design-social-feed.md
Cross-References
- Section 17 (Distributed Systems): Consensus (Raft, Paxos), consistency models, clock synchronization
- Section 18 (Database Internals): B-trees, LSM trees, MVCC, replication logs — why database scaling behaves as it does
- Section 21 (Cloud Infrastructure): AWS/GCP/Azure primitives that implement these patterns
- Section 22 (Kubernetes): Container orchestration as infrastructure for scalable service deployment
- Section 23 (Observability): Metrics, tracing, logging for production system health assessment
- Section 39 (Large-Scale Case Studies): Real implementations of these patterns at Google, Meta, Amazon