Caching Strategies: Trading Freshness for Speed

Overview

Caching is one of the most powerful tools for improving system performance and scalability. By storing the results of expensive operations (database queries, API calls, rendered pages, computed values) in faster storage, caching reduces latency, decreases load on downstream systems, and enables horizontal scaling of stateless services. Understanding caching deeply means understanding not just the happy path (cache hit), but the failure modes: stale data, thundering herds, cache invalidation, and the scenarios where caching makes things worse.

Phil Karlton's famous quip — "There are only two hard things in Computer Science: cache invalidation and naming things" — captures why caching is both universal and frequently the source of subtle bugs. The speed gains are real and significant; the correctness challenges are equally real.

Prerequisites

Understanding of read/write patterns and latency requirements
Familiarity with Redis and/or Memcached
Basic understanding of HTTP headers (Cache-Control, ETag, Last-Modified)
Awareness of database query patterns and the N+1 problem
Comfort with eventual consistency trade-offs

Cache Levels

Modern architectures employ caching at multiple levels, each with different characteristics:

REQUEST FLOW WITH ALL CACHE LAYERS:

Client Request
     |
     v
[CDN: Edge Cache]                   ← Geographic distribution
     | (miss)                         Static assets, full pages
     v                                TTL-based, global
[Reverse Proxy: Nginx/Varnish]      ← Server-side HTTP cache
     | (miss)                         Full HTTP responses
     v                                Shared across all users
[Application Servers]
     |
     +──→ [In-Process Cache]         ← Per-instance memory
     |    (JVM heap, Python dict)      Fastest access possible
     |    (miss)                       Not shared across instances
     |
     +──→ [Distributed Cache]        ← Shared across instances
     |    (Redis, Memcached)           sub-millisecond access
     |    (miss)                       Centralized, network hop
     |
     v
[Database]
     |
     +──→ [Query Result Cache]       ← DB-layer cache (MySQL query cache)
     |    (deprecated in MySQL 8)      Usually not recommended
     |
     +──→ [Buffer Pool / Page Cache] ← OS/DB memory cache
          (InnoDB buffer pool)         Pages from disk
          (Linux page cache)

LAYER CHARACTERISTICS:
Layer              Access time    Capacity      Consistency    Shared?
CDN               1-50ms         Unlimited     Low (TTL)      Global
Reverse proxy     0.1ms          High          Medium         Per-DC
In-process        0.001ms        Low (RAM)     High           No
Distributed cache 0.1-1ms        Medium        Medium         Yes
Database cache    1-100ms        High          High           Yes

Cache-Aside Pattern

Cache-aside (also called lazy loading) is the most common caching pattern. The application code manages the cache explicitly: check before reading from the source, populate after a miss.

CACHE-ASIDE READ:

  Application                Cache              Database
      |                        |                    |
      |─── GET user:123 ──────>|                    |
      |<── MISS ───────────────|                    |
      |                        |                    |
      |──────────────────────────────── SELECT * FROM users WHERE id=123 ──>|
      |<─────────────────────────────── {id:123, name:"Alice", ...} ────────|
      |                        |                    |
      |─── SET user:123 {data} TTL=300 ──>          |
      |                        |                    |
      (next request for user:123)
      |─── GET user:123 ──────>|                    |
      |<── HIT {data} ─────────|                    |
      |                        |                    |
      (no database call needed)

CACHE-ASIDE WRITE:

  Simple approach: invalidate cache on write
  Application                Cache              Database
      |                        |                    |
      |──────────────────────────────── UPDATE users SET name=... ─────────>|
      |<─────────────────────────────── OK ───────────────────────────────|
      |─── DEL user:123 ───────>|                    |
      |                        |                    |
  Next read will miss cache → fresh data fetched from DB

PSEUDOCODE:
  def get_user(user_id):
      key = f"user:{user_id}"
      data = cache.get(key)
      if data is None:                    # cache miss
          data = db.query("SELECT * FROM users WHERE id = ?", user_id)
          cache.set(key, data, ttl=300)   # populate cache, TTL 5 min
      return data

  def update_user(user_id, new_data):
      db.execute("UPDATE users SET ... WHERE id = ?", user_id, new_data)
      cache.delete(f"user:{user_id}")     # invalidate

Trade-offs: Cache-aside is resilient to cache failures — if the cache is down, the application falls back to the database transparently. The downside is that after a cache miss, there is a window where the cache is empty and many concurrent requests may all hit the database before the cache is populated (thundering herd — see below).

Read-Through Cache

In a read-through cache, the cache itself handles database reads on a miss. The application only talks to the cache, never directly to the database.

READ-THROUGH FLOW:

  Application ──→ Cache ──→ (on miss) ──→ Database
              ←──       ←──xxxxxxxxx ←──

  Application never directly queries the database.
  Cache is the single data access point for reads.

  Pros:
  - Application code simpler (no explicit cache miss handling)
  - Cache management centralized in cache layer

  Cons:
  - First request for any key always misses (cold start)
  - Cache failure means application failure (no transparent fallback)
  - Cache must understand how to query the database (coupling)

  Examples: AWS ElastiCache with DAX for DynamoDB (read-through)
            Ehcache (Java) with read-through loader configuration

Write-Through Cache

Write-through keeps the cache and database synchronized: every write updates both simultaneously.

WRITE-THROUGH FLOW:

  Application
      |
      v
  +--------+  synchronous write  +----------+
  |  Cache |────────────────────>| Database |
  +--------+                     +----------+

  Both writes complete before returning to the application.

  Pros:
  - Cache always current: reads after writes get fresh data
  - No stale data in cache

  Cons:
  - Write latency = cache latency + database latency (additive)
  - Every write goes to DB regardless of whether it will ever be read
  - Cache failure blocks writes

  Good for: read-heavy workloads where read freshness is critical
           and write latency is acceptable.

Write-Behind (Write-Back) Cache

Write-behind writes to the cache immediately and queues the database write asynchronously. Reads can be served from cache instantly.

WRITE-BEHIND FLOW:

  Application
      |
      |──→ Cache (instant, sync) ──→ [Write Queue] ──→ Database (async)
      |                                                  (seconds later)
      |
  (returns immediately after writing to cache)

  Pros:
  - Very low write latency (write to memory, return)
  - Can batch database writes (more efficient)
  - Database load smoothed out

  Cons:
  - DATA LOSS RISK: if cache crashes before async write completes,
    data is lost permanently
  - Complexity: requires reliable write queue (Kafka, persistent queue)
  - Reads may serve data not yet in database

  Use case: high-frequency counters (view counts, like counts)
            where approximate durability is acceptable.
  Avoid for: financial data, authentication, anything requiring
             strong durability guarantees.

Cache Eviction Policies

Caches have finite memory. When full, they must evict entries to make room for new ones. The eviction policy determines which entries are removed.

LRU (Least Recently Used):
  Evict the entry that was accessed least recently.

  Access history: A → B → C → A → D
  LRU order: D (newest), A, C, B (oldest)
  Next eviction: B

  Best for: temporal locality — recently accessed data tends to
            be accessed again soon. Works well for most caches.

  Implementation: doubly-linked list + hash map, O(1) operations.

LFU (Least Frequently Used):
  Evict the entry accessed least frequently overall.

  Counters: A=50, B=3, C=2, D=200
  Next eviction: C

  Best for: stable popularity distributions (some items always popular).
  Weakness: new items always have low frequency even if currently
            being accessed heavily. "Frequency recency" variants fix this.

TTL (Time-To-Live):
  Each entry expires after a fixed time, regardless of access.

  TTL = 300 seconds → entry created at T=0, evicted at T=300.

  Best for: data that becomes stale after a known period.
  Not an eviction policy alone: entries also need eviction when full.

  Short TTL → more fresh data, more DB load.
  Long TTL  → stale data risk, less DB load.

Random Eviction:
  Randomly select an entry to evict.
  Simple, surprisingly effective for many workloads.

Adaptive policies:
  ARC (Adaptive Replacement Cache): balances LRU and LFU automatically.
  CLOCK: approximation of LRU, lower memory overhead.
  TinyLFU (used by Caffeine, Java): frequency with recency boost.

Cache Invalidation: The Hard Problem

Cache invalidation is the problem of ensuring the cache does not serve stale data after the underlying data changes.

INVALIDATION STRATEGIES:

TTL-based (simplest):
  Set a short TTL, accept brief stale windows.
  Problem: stale window proportional to TTL.
  Good for: data where brief staleness is acceptable (product catalogs,
            user profiles, public content).

Event-driven invalidation:
  On database write, explicitly invalidate the relevant cache key.

  db.update("users", user_id, new_data)
  cache.delete(f"user:{user_id}")

  Problem: write-invalidate race condition.

  Time →  T1                  T2             T3
  Thread A: DB write → cache.delete
  Thread B:                cache.get(miss) → DB read → cache.set(STALE)

  Between T1 and T2, if Thread B reads from DB before Thread A has
  committed (if read is from replica with lag), B may cache stale data.

  Solution: short TTL even with event-based invalidation as backstop.

Write-through (as discussed above):
  Update cache and DB synchronously → cache always fresh.

Cache-bust versioning:
  Embed version/hash in cache key.

  key = f"user:{user_id}:v{user.updated_at}"

  On update, new key is used → old key naturally expires.
  Problem: old keys must be cleaned up. Works best for immutable objects.

Publication pattern:
  A change data capture (CDC) system (Debezium + Kafka) listens to
  the database WAL. On each change, emits an event. A cache consumer
  listens for relevant events and invalidates or updates cache.

  This is the most scalable approach for complex invalidation requirements.
  Used by: LinkedIn, Netflix, Airbnb.

Cache Stampede / Thundering Herd

A cache stampede occurs when many concurrent requests for the same key all miss the cache simultaneously, flooding the database.

STAMPEDE SCENARIO:

  Popular product page for "iPhone 15"
  Cached with TTL = 300 seconds.

  T=300: TTL expires. Cache entry deleted.
  T=300: 5,000 concurrent requests for iPhone 15 page arrive.
  T=300: ALL 5,000 miss the cache (key is gone).
  T=300: ALL 5,000 query the database simultaneously.
  T=300: Database overwhelmed → latency spikes → errors.
  T=301: 5,000 cache entries all set (redundantly).

SOLUTIONS:

1. Mutex/Lock (request coalescing):
   First request acquires lock, fetches from DB, populates cache.
   All other requests wait for the lock, then read from cache.

   Problem: all waiting requests are delayed. One slow DB query
            delays 5,000 requests.

   Improvement: background refresh. Return stale data immediately,
   trigger async refresh. Client gets slightly stale data, no wait.

2. Probabilistic Early Expiration (XFetch):
   Before the TTL expires, probabilistically refresh the cache early.

   Formula: refresh if current_time - ttl + β × delta × log(rand()) > expiry
   (where β is a tuning constant, delta is the time to compute the value)

   Result: cache refreshes in a distributed fashion before expiry.
   Stampede never occurs because the cache is never actually empty.

   Implemented in: Redis 7.4+ (probabilistic TTL), custom application code.

3. Staggered TTLs:
   Add jitter to TTL: TTL = base_ttl + random(-30, +30) seconds.
   Keys expire at different times → no synchronized mass expiry.
   Good for: bulk-loaded caches (loading 1M records all at T=0).

4. Hot Key Replication:
   Store popular keys in multiple locations: Redis Cluster with
   multiple replicas, or duplicate key across all Redis nodes.
   No single hot spot for high-traffic keys.

Redis vs Memcached

Both Redis and Memcached are in-memory distributed caches. Choosing between them:

COMPARISON:

Feature              Redis                   Memcached
---                  ---                     ---
Data structures      Rich: strings, lists,   Strings only (key-value)
                     hashes, sets, sorted
                     sets, streams, bitmaps

Persistence          RDB snapshots + AOF      None (pure cache)
                     write-ahead log

Replication          Primary-replica          No native replication

Clustering           Redis Cluster            Third-party clients only
                     (built-in sharding)

Pub/Sub              Yes (built-in)           No

Scripting            Lua scripts, EVAL        No

Memory usage         Higher (data structure   Lower (simple values)
                     overhead)

Threading            Single-threaded          Multi-threaded
                     (I/O multi-threaded
                      since Redis 6.0)

TTL granularity      Millisecond              Second

Use cases:           Session storage,          Pure caching when
                     rate limiting,            maximum throughput
                     real-time leaderboards,   with simple values
                     pub/sub, queues,
                     distributed locks,
                     sorted sets for ranking

CHOOSE REDIS WHEN:
- You need data structures beyond simple get/set
- You want persistence or replication
- You need pub/sub, Lua scripting, or atomic operations
- You plan to use Redis for multiple purposes (cache + queue + session)

CHOOSE MEMCACHED WHEN:
- Pure caching workload with simple string values
- Maximum throughput with multi-threading is critical
- Running a legacy system already on Memcached
- You want the simplest possible cache layer

Cache Decision Diagram

CHOOSING A CACHE STRATEGY:

Is the data read-heavy (>10x more reads than writes)?
  NO  → Caching may not help much. Profile first.
  YES ↓

How stale can the data be?
  NEVER STALE → Write-through cache or always-read-from-DB
  TTL acceptable → Cache-aside with TTL
  Event-driven → Cache-aside + CDC invalidation

Is high write throughput required?
  YES → Write-behind (accept durability risk)
  NO  → Write-through or cache-aside with invalidation

Is the cache a shared resource or per-instance?
  SHARED → Redis (distributed cache)
  PER-INSTANCE → In-process cache (Caffeine, Guava)

Do you need complex data structures (sets, sorted sets)?
  YES → Redis
  NO  → Memcached or Redis (either works)

Is there a "thundering herd" risk for popular cold keys?
  YES → Add mutex, probabilistic expiry, or background refresh
  NO  → Basic TTL-based eviction sufficient

Is the data geographically distributed?
  YES → CDN cache at the edge (if static or semi-static)
  NO  → Distributed cache in single region sufficient

Production Example: Facebook's Memcached Architecture

Facebook's 2013 NSDI paper "Scaling Memcache at Facebook" is the canonical reference for large-scale caching. Key insights:

Leases: Facebook extended Memcached with a "lease" mechanism to solve the thundering herd problem. When a client gets a cache miss, Memcached issues a lease token. Only the token holder can write the value back. Other clients asking for the same key during this period either wait or receive a flag to back off briefly.
Regional pools: Facebook organized Memcached into "clusters" and "regions." Within a cluster, any server can be asked for any key (consistent hashing). Across regions, data is replicated to enable reads from local region. This reduced cross-region latency but introduced replication lag.
Invalidation pipeline: MySQL triggers sent invalidation messages to a "mcsqueal" process via McRouter, which forwarded invalidations to all Memcached pools in all regions. This ensured consistency across the distributed cache.
Hot spots: Popular items (viral posts) could receive millions of requests per second. Solution: "local Memcached" — an in-process cache per app server for the hottest keys, preventing any single Memcached server from becoming a bottleneck.

Debugging Notes

Cache miss rate too high: If cache hit rate is below 80%, the TTL may be too short, the cache capacity too small (LRU evicting before re-access), or the key space too large/random. Profile access patterns before tuning.

Stale data in cache: Check whether all write paths invalidate the cache. A write path that was added after caching was implemented often forgets to invalidate. Use a short TTL as a safety net.

Memory fragmentation in Redis: Long-running Redis instances develop memory fragmentation. Monitor mem_fragmentation_ratio; if above 1.5, schedule a restart or use MEMORY PURGE command.

Cache hit on wrong data: Cache keys must be unique. user:profile is not unique if there are multiple apps using the same Redis instance. Namespace your keys: app_name:entity_type:entity_id.

Redis connection pool exhaustion: Each application thread may hold a Redis connection. At scale (100 app servers × 20 threads = 2000 connections), this can saturate Redis. Use connection pooling and consider Redis Cluster for connection distribution.

Security Implications

Cache poisoning: An attacker who can write to the cache can poison it with malicious data. Ensure cache write access is restricted to application services. Never allow external/untrusted code to write to the cache.
Sensitive data in cache: Caching user data, session tokens, and PII requires the same security controls as databases. Encrypt at rest (Redis 7+ supports transparent encryption), use TLS in transit.
Side-channel timing attacks: Cache hit vs miss timing differences can leak information about other users' data (timing side channel). At very high security requirements, add artificial timing jitter.
Eviction of security-critical keys: A malicious client filling the cache can evict security-critical entries (e.g., session tokens in a cache-based session store). Separate security-critical data into a dedicated cache with protected eviction policy (maxmemory-policy = noeviction for sessions).

Performance Implications

In-process cache latency: 0.001ms. The fastest possible. Use for small, frequently accessed, shared-read data (config values, schema definitions).
Redis latency: Typically 0.1-1ms on localhost, 1-5ms over a LAN. For a 100ms request, Redis adds ~1-5% overhead — acceptable.
CDN cache: 1-50ms depending on edge proximity vs origin pull. For a 200ms page, edge cache reduces to <10ms for cached content.
Cache compression: Large values can be compressed (gzip, snappy) before caching. Reduces memory, increases CPU. Beneficial when values are >4 KB and memory is constrained.

Failure Modes

Cache as a dependency (not just an optimization): The most dangerous failure mode — when the cache is so load-bearing that the database cannot handle cache miss traffic. Always test behavior when the cache is completely unavailable. If the database cannot handle 100% of traffic, the system is fragile.

Negative caching ignored: Caching only positive results. A "user does not exist" lookup that is not cached will hit the database on every request (if the user ID is random/enumerated by an attacker, this is a DoS vector). Cache negative results with a short TTL.

Inconsistent invalidation: Multiple independent cache keys holding redundant views of the same data. Updating one and forgetting the other leads to inconsistency. Design clear key ownership: one canonical key per data entity.

Modern Usage

Redis 7+ introduced several caching improvements: Redis Functions (replacing Lua scripts), probabilistic data structures (HyperLogLog, Bloom filters), and a time series data type. Redis Stack adds RedisJSON, RediSearch (full-text search over cache), and Redis Gears for stream processing.

Distributed caching is increasingly deployed as a sidecar (Envoy proxy's HTTP cache filter) or at the service mesh level (Istio response caching), reducing the burden on application developers to manage caching logic explicitly.

Future Directions

CacheLib (Meta's open-source unified cache): A caching engine designed to span DRAM, NAND flash SSD, and NVMe, enabling cache tiers that use inexpensive flash for less-hot data while keeping hot data in DRAM.

RDMA-based distributed caches: Using Remote Direct Memory Access (RDMA) to access remote cache data with sub-100 microsecond latency, approaching local memory speeds. Research implementations at Microsoft and Google.

Semantic caching for LLM responses: Caching LLM API results not by exact prompt match but by semantic similarity — if two questions are semantically identical, serve the cached answer. Reduces inference cost for similar repeated queries.

Exercises

Implement a cache-aside pattern with TTL in Python using a dict as the cache. Then extend it to handle the thundering herd problem with a threading.Lock.
A database table has 1M rows, read 10,000 times per second (aggregate). Each row is 1 KB. Should you cache individual rows or bulk query results? What eviction policy is appropriate? Estimate cache size required for an 80% hit rate.
Design the caching strategy for an e-commerce product detail page that includes: product name/description (changes rarely), price (may change hourly), stock availability (changes frequently), user-specific price (personalized), and product reviews (append-only). Different TTLs for each data type.
Simulate a thundering herd: write a script that makes 100 concurrent requests for the same uncached key. Observe the behavior. Implement a mutex solution and compare the results.
Compare Redis and Memcached for the following use case: session storage for 10,000 concurrent users, each session is 4 KB, with 1,000 session reads/sec and 200 writes/sec. Which would you choose and why?

References

Kleppmann, M. Designing Data-Intensive Applications. O'Reilly, 2017. Chapter 5: Replication (discusses cache consistency).
Nishtala, R., et al. "Scaling Memcache at Facebook." NSDI, 2013.
Lamport, L. "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs." IEEE Transactions on Computers, 1979.
Vance, A. "Cache Invalidation: Real-World Strategies." InfoQ, 2019.
Redis documentation: https://redis.io/docs/
Fitzpatrick, B. "Distributed Caching with Memcached." Linux Journal, 2004.
Clements, P., et al. Documenting Software Architectures: Views and Beyond. 2nd ed., Addison-Wesley, 2010.