Database Scaling: Growing Beyond a Single Instance

Overview

Databases are almost always the first bottleneck in a scaling application. Application servers are stateless and trivially horizontal — add more instances behind a load balancer. Databases are stateful and fundamentally harder to scale. The data must be persisted, consistent, and accessible across all application instances, which creates coordination challenges that do not exist in stateless systems.

This file covers the primary techniques for scaling databases beyond a single instance: read replicas for read-heavy workloads, connection pooling to manage connection overhead, sharding for write-heavy or large-dataset workloads, CQRS for independent scaling of read and write models, polyglot persistence for using the right database for each job, and NewSQL systems that attempt to provide horizontal scaling with ACID guarantees.

Prerequisites

Understanding of relational database concepts: transactions, ACID, indexes, WAL
Familiarity with at least one RDBMS (PostgreSQL, MySQL)
Basic understanding of replication concepts
Awareness of the CAP theorem and consistency models
Understanding of connection pools and why they exist

Read Replicas

Read replicas are the most common first step in database scaling. A read replica is a copy of the primary database that receives updates via replication and serves read queries, offloading the primary.

REPLICA ARCHITECTURE:

  Application Servers
        |
  +-----------+--------+
  |           |        |
  Writes     Reads   Reads
  (primary) (replica)(replica)
  |           |        |
  v           v        v
[Primary DB]──WAL──>[Replica 1]
             ──WAL──>[Replica 2]

Replication mechanism (PostgreSQL):
  Primary writes to WAL (Write-Ahead Log)
  Replica continuously streams WAL from primary
  Replica applies WAL changes to its local state

  Streaming replication: near-real-time, typically <100ms lag
  Logical replication: can replicate subset of tables or with transformations

REPLICATION LAG:
  The time between a write to primary and its visibility on replicas.

  Typical: 10ms - 500ms under normal load
  High load: can grow to seconds or minutes

  Critical consequence: a "read-your-own-writes" inconsistency.

  T=0: User writes new profile photo
  T=0: Write goes to primary
  T=0.1: User immediately requests their profile
  T=0.1: Read goes to replica (which has 200ms lag)
  T=0.1: Replica does not yet have the new photo → user sees old photo

  Solutions:
  1. Route reads of recently written data to primary (sticky reads)
  2. Use causal consistency: tag writes with a "read-at-least-to" token,
     wait for replica to catch up before serving
  3. Short window after write: route reads to primary for 500ms,
     then to replicas

Read Replica Routing

Application code or a database proxy must route reads to replicas and writes to the primary:

ROUTING STRATEGIES:

Manual routing (application code):
  primary_conn.execute("INSERT ...")   # writes to primary
  replica_conn.execute("SELECT ...")   # reads from replica
  Problem: application must know topology; replica URLs are hardcoded.

Database proxy (ProxySQL for MySQL, PgBouncer for PostgreSQL):
  Application connects to proxy.
  Proxy inspects query: if SELECT → route to replica; if DML → route to primary.
  Application does not need to know replica topology.

  ProxySQL rule example:
  "^SELECT" → read hostgroup (replicas)
  "^(INSERT|UPDATE|DELETE)" → write hostgroup (primary)

  Works 90% of the time. Fails for:
  SELECT FOR UPDATE   → must go to primary (it's a write-intent read)
  SELECT inside txn   → must go to primary (transactional read)
  Stored procedures   → typically routed to primary for safety

Read-only connection flag:
  Most ORMs support specifying a read replica URL.
  ActiveRecord, SQLAlchemy, Hibernate: configure read-only and
  read-write connection pools separately.

Connection Pooling

Database connections are expensive: establishing a new TCP connection, performing TLS handshake, and PostgreSQL authentication takes 10-50ms. A PostgreSQL server process is spawned per connection, using ~5-10 MB of memory. At 200 connections, that is 1-2 GB just for connection overhead.

CONNECTION POOL ARCHITECTURE:

WITHOUT POOLER:
  100 app servers × 20 threads = 2,000 DB connections
  PostgreSQL max_connections = 200 (default)
  RESULT: connection exhaustion, connection refused errors

WITH PGBOUNCER:
  100 app servers × 20 threads → PgBouncer (1 host, 1 port)
  PgBouncer → [pool of 20 actual PostgreSQL connections]

  2,000 client connections → 20 server connections
  10x reduction in DB-side connection overhead

POOLING MODES:

Session pooling:
  One server connection assigned per client session.
  Client connects → gets dedicated server connection until disconnect.
  Savings minimal. Only useful for reducing connection startup cost.

Transaction pooling (most common):
  Server connection assigned per transaction.
  After COMMIT/ROLLBACK, connection returned to pool.
  Many clients share few server connections.
  Client: 2,000 connections. Server: 20-100 connections.

  Limitation: cannot use session-level features (SET, advisory locks,
  prepared statements across transactions) — they may execute on
  different connections.

Statement pooling (aggressive):
  Connection returned to pool after each statement.
  Maximum efficiency. Very limited SQL compatibility.
  Only used in highly controlled environments.

PGBOUNCER vs PROXYSQL:
  PgBouncer: PostgreSQL only. Lightweight (single process).
             Transaction pooling. Widely deployed.
  ProxySQL:  MySQL/MariaDB. Read-write splitting. Connection pooling.
             Query routing rules. Query rewriting. Richer feature set.

  Both: deploy as a sidecar or dedicated host. Application connects
        to pooler, not directly to database.

Database Sharding

Sharding (horizontal partitioning) distributes data across multiple database instances. Each shard holds a subset of the data. Together, all shards hold the complete dataset.

SHARDING ARCHITECTURE:

  Request for user_id=12345
          |
          v
  [Shard Router / Application Code]
          |
  hash(12345) % 4 = 1
          |
          v
  [Shard 1: user_ids 0-24999]   Contains user 12345
  [Shard 2: user_ids 25000-49999]
  [Shard 3: user_ids 50000-74999]
  [Shard 4: user_ids 75000-99999]

SHARDING STRATEGIES:

Hash sharding:
  shard_id = hash(shard_key) % num_shards

  Pros: even distribution (if key has uniform distribution)
  Cons: range queries span all shards; resharding is painful
        (adding a shard changes all hash assignments)

  Better: consistent hashing minimizes resharding impact.

Range sharding:
  Shard by value ranges: users A-F → shard 1; G-M → shard 2; etc.

  Pros: range queries stay within a single shard; easy to add shards
        at the high end of the range
  Cons: uneven distribution (hot spots) if some ranges are more popular

Directory sharding:
  A lookup table maps entity to shard.

  shard_map[user_id] → shard_id

  Pros: maximum flexibility; can move entities between shards
  Cons: lookup table is itself a single point of failure and
        a performance bottleneck; must be cached aggressively

Geographic sharding:
  Route by user's location: US users → shard in us-east;
  EU users → shard in eu-west.

  Pros: data residency compliance (GDPR); low latency for users
  Cons: cross-region users still need cross-region reads;
        uneven geographic distribution possible

Sharding Challenges

CROSS-SHARD QUERIES:
  "Find all users who signed up in the last 7 days"

  Without sharding: single query → done.
  With 4 shards: query all 4 shards → merge results → sort → return.

  Fan-out queries: N shards × query cost × aggregation overhead.
  Solution: maintain a separate reporting database (a replica that
            aggregates from all shards) for complex queries.

DISTRIBUTED TRANSACTIONS:
  User transfers $100 from account on Shard 1 to account on Shard 2.

  Naive approach: UPDATE shard1.accounts SET balance -= 100 WHERE id=A;
                  UPDATE shard2.accounts SET balance += 100 WHERE id=B;

  Problem: if Shard 1 UPDATE succeeds and Shard 2 UPDATE fails,
           you have debited without crediting. Data inconsistency.

  Solutions:
  1. Two-Phase Commit (2PC): coordinator tells all participants to
     prepare, then commits. Expensive, blocking, not widely used.
  2. Saga pattern: sequential local transactions with compensating
     transactions for rollback. Eventually consistent.
  3. Design to avoid cross-shard transactions: put related data
     on the same shard (user + their accounts on same shard).

RESHARDING PAIN:
  Adding a new shard requires migrating data.

  Old: 4 shards. New: 5 shards.
  hash(user_id) % 4 → different from hash(user_id) % 5.
  All data must be remapped. Downtime during migration or
  complex double-write migration period.

  Consistent hashing reduces resharding impact.
  Vitess (MySQL) and Citus (PostgreSQL) automate resharding.

Vitess and Citus

VITESS (MySQL sharding at scale):
  Originally built by YouTube to scale MySQL.
  Open source, CNCF graduated project.
  Used by: GitHub, Slack, PlanetScale.

  Provides: horizontal sharding, connection pooling (VTGate),
  query routing, online schema changes, resharding without downtime.

  Architecture:
  App → [VTGate: query routing] → [VTTablet: per-shard wrapper] → MySQL

  VTGate speaks MySQL protocol. App sees one logical database.
  VTGate routes queries to correct shard(s).
  Supports scatter-gather for cross-shard queries.

CITUS (PostgreSQL sharding):
  PostgreSQL extension that adds sharding.
  Native PostgreSQL protocol — any PostgreSQL client works.
  Acquired by Microsoft, built into Azure Database for PostgreSQL.

  Provides: distributed tables, reference tables (replicated to
  all shards), query push-down (push computation to each shard).

  Architecture: coordinator node (routes queries) + worker nodes (shards).

  Best for: multi-tenant SaaS (shard by tenant_id — all tenant's
  data co-located on one shard → no cross-shard queries).

CQRS Pattern

Command Query Responsibility Segregation separates the write model from the read model. Commands (writes) go through one path optimized for writes; queries (reads) go through a separate path optimized for reads.

CQRS ARCHITECTURE:

    User Action
        |
        |─── Write (Command)                  Read (Query)
        |          |                               |
        v          v                               v
   [Write API]  [Write DB]───(projection)───>[Read DB / Read Model]
   (optimized   (normalized,  ──────────────>(denormalized, optimized
    for ACID)   relational)                   for query patterns)

  Write DB: normalized relational model, ACID, optimized for writes
  Read DB: denormalized, potentially non-relational, optimized for reads

  Projection: process that transforms write model into read model.
              Runs asynchronously. Can use Kafka for event sourcing.

EXAMPLE: Social media feed
  Write model (PostgreSQL):
    posts(id, user_id, content, created_at)
    follows(follower_id, following_id)

  Naive read: "Get feed for user 123" = 
    SELECT posts.* FROM posts
    JOIN follows ON posts.user_id = follows.following_id
    WHERE follows.follower_id = 123
    ORDER BY created_at DESC LIMIT 20

    → Expensive JOIN across millions of rows, slow at scale.

  With CQRS:
    Read model (Redis):
    feed:user:123 = [post_id_1, post_id_2, ..., post_id_20] (sorted set)

    When user 456 posts: fanout to all followers' feed sorted sets.
    Read: ZREVRANGE feed:user:123 0 19 → instant.

    Independent scaling: write model scaled separately from read model.
    Read model can be Elasticsearch, Redis, Cassandra — best fit for query.

WHEN TO USE CQRS:
  - Read and write access patterns are very different
  - Read model requires denormalization that would compromise write model
  - Need to scale reads independently from writes
  - Different consistency requirements for reads vs writes

WHEN NOT TO USE CQRS:
  - Simple CRUD applications where read/write patterns are symmetric
  - Small teams that cannot afford the operational complexity
  - When eventual consistency in read model is unacceptable

Polyglot Persistence

Polyglot persistence is the practice of using different database technologies for different parts of the same application, choosing the database best suited to each specific use case.

POLYGLOT PERSISTENCE MAP:

Use Case                    Recommended Database          Why
-----------                 -------------------           ---
User accounts, orders       PostgreSQL / MySQL            ACID, relational
Full-text search            Elasticsearch / OpenSearch    Inverted index, rich query
Session storage             Redis                         Sub-ms, TTL support
Time series metrics         InfluxDB / TimescaleDB        Optimized for time queries
Document storage            MongoDB / DynamoDB            Schema flexibility
Graph relationships         Neo4j / Amazon Neptune        Traversal performance
Object storage (files)      S3 / GCS                      Unlimited, cheap
Message queue               Kafka / SQS                   Ordered, durable events
Real-time features          Redis / Cassandra             Write throughput, TTL
Geospatial queries          PostGIS / MongoDB             GeoJSON, geo-indexing
ML features                 Feast / Redis                 Low-latency feature serving

EXAMPLE: Airbnb-style platform

  User profiles, listings    → PostgreSQL (relational, ACID)
  Search                     → Elasticsearch (full-text + geo)
  Session tokens             → Redis (fast, TTL)
  Booking events             → Kafka (event stream)
  Listing photos             → S3 (object storage)
  Recommendations            → DynamoDB (key-value, fast reads)
  Analytics                  → Redshift/BigQuery (columnar OLAP)

TRADE-OFFS:
  Pro: each data type stored in its ideal engine → maximum performance
  Con: operational complexity increases with each new database
       Data consistency across databases is harder
       Engineers need expertise in multiple systems

  Rule: add a new database type only when the benefit clearly
        outweighs the operational cost.

NewSQL: Horizontal ACID

Traditional RDBMS (PostgreSQL, MySQL) scale vertically and have limited horizontal scalability. NoSQL databases scale horizontally but sacrifice ACID transactions. NewSQL systems attempt both.

NEWSQL COMPARISON:

CockroachDB:
  - PostgreSQL-compatible SQL dialect
  - Distributed ACID transactions via multi-version concurrency control
  - Geo-distributed replication with configurable data placement
  - Raft-based consensus for each range (64 MB data chunks)
  - Scales from 1 to thousands of nodes
  - Automatic sharding and rebalancing

  Architecture:
  Client → [SQL Layer] → [Transaction Layer] → [Distribution Layer]
         → [Replication Layer: Raft] → [Storage: RocksDB]

  Latency cost: distributed transactions require consensus across
  majority of replicas → minimum 1 RTT within region, 2x RTT across regions.

Google Spanner (and Cloud Spanner):
  - Globally distributed ACID transactions
  - Uses TrueTime API: atomic clocks + GPS to assign globally ordered timestamps
  - External consistency (stronger than serializable)
  - Used by Google Ads, Google Fi, YouTube

  TrueTime insight: if every transaction has a globally-ordered timestamp
  from an atomic clock, you can determine the order of all transactions
  without distributed coordination. The clock IS the coordination mechanism.

PlanetScale (Vitess-based):
  - MySQL-compatible
  - Non-blocking schema changes (no table locks on ALTER TABLE)
  - Branch-based database workflow (like Git for databases)
  - Connection pooling at the cloud layer

WHEN TO USE NEWSQL:
  - Need ACID transactions across shards
  - Global deployment required with strong consistency
  - Cannot afford manual sharding complexity
  - Can accept higher latency for writes (consensus cost)

WHEN TO STAY WITH TRADITIONAL RDBMS:
  - Single-region deployment (PostgreSQL is hard to beat)
  - Write latency is critical (PostgreSQL < CockroachDB for local txns)
  - Team does not have NewSQL operational expertise
  - Operational costs of distributed systems not yet justified by scale

Sharding Architecture Diagram

COMPLETE SHARDED DATABASE ARCHITECTURE:

             Application Servers (stateless, horizontal)
            /           |           |           \
           /            |           |            \
     [App 1]       [App 2]     [App 3]       [App 4]
          \            |           |            /
           \           |           |           /
            v          v           v          v
      +-------------------------------------------+
      |          Database Proxy Layer              |
      |   (ProxySQL / PgBouncer / Vitess VTGate)  |
      |   - Connection pooling                     |
      |   - Read/write routing                     |
      |   - Shard routing                          |
      +-------------------------------------------+
               |              |
         (writes)         (reads)
               |              |
               v              v
    +----------------+  +------------------+
    | Shard Router   |  | Read Replica Pool|
    | (hash/range)   |  | (lag: ~100ms)    |
    +----------------+  +------------------+
        /    |    \
       /     |     \
  [Shard1][Shard2][Shard3]   (each: Primary + Replica)
      ↓       ↓       ↓
  [Backup] [Backup] [Backup]  (offsite, point-in-time)

  Cross-shard operations:
  → Fanout query: send to all shards, merge results in app/proxy
  → Distributed txn: Saga or 2PC through proxy

  Monitoring per shard:
  - Replication lag
  - Connection pool utilization
  - Slow query log
  - Table sizes and growth rate

Production Example: Slack's Database Architecture

Slack started with a MySQL monolith and scaled by: 1. Vertical scaling: bigger RDS instances 2. Read replicas: offload analytics and reporting queries 3. Vitess for connection pooling and sharding by workspace_id — each Slack workspace's data is collocated on the same shard, avoiding cross-shard queries for the most common operations (messages within a workspace) 4. Schema changes without locking using Vitess Online DDL

The key insight: shard by the "tenant" (workspace) so that the most common queries (user accessing their workspace's data) are always within a single shard.

Debugging Notes

Replication lag spike: Usually caused by a large write (bulk insert, mass update, schema change) that takes seconds to replay on replicas. Monitor replica_lag metric. Alert if lag exceeds 5 seconds for read-consistency-sensitive data.

Connection pool exhaustion: too many connections errors from PostgreSQL. Check: (1) PgBouncer pool_size per database, (2) number of app server instances × threads, (3) pg_stat_activity for idle-in-transaction connections (likely an application bug holding transactions open).

Hot shard: One shard receives disproportionate traffic because the shard key is not uniformly distributed. Identify via per-shard QPS monitoring. Remediate by re-sharding with a better key or adding a composite key.

Cross-shard query performance: A query that fans out to 10 shards is at least 10x more expensive in wall time and 10x more expensive in total DB load than a single-shard query. Monitor cross-shard query count as a metric; design to minimize.

Security Implications

Shard isolation: a compromise of one shard should not compromise others. Separate shard-level credentials; do not use a single superuser across all shards.
Data residency in geographic sharding: GDPR requires EU personal data to remain in the EU. Geographic sharding enables data residency but requires strict routing enforcement. A routing bug can send EU user data to a non-EU shard.
Connection pooler as attack surface: PgBouncer and ProxySQL are in the critical path. Ensure they are patched promptly; their exposure is equivalent to the database itself.

Performance Implications

PostgreSQL connection overhead: Each PostgreSQL connection spawns a backend process. 200 connections = 200 processes = ~1-2 GB overhead before any queries. This is why PgBouncer is essential.
Replication overhead: Streaming replication adds ~5-10% overhead to primary writes (log shipping). Logical replication is higher overhead but more flexible.
Shard count sweet spot: More shards = more parallelism but more cross-shard coordination overhead. Typical starting point: 4-16 shards. Scale shard count with order-of-magnitude data growth.

Failure Modes

Over-sharding too early: Adding sharding complexity before it is needed. A single PostgreSQL instance can handle tens of millions of rows and thousands of requests per second. Profile first; shard when genuinely needed.

Non-reshardable key choice: Choosing a shard key that cannot be changed later (e.g., auto-increment primary key that is embedded in external URLs). Design shard keys that are internal identifiers.

Split brain in replica promotion: If the primary fails and automatic failover promotes a replica, but the old primary recovers and also believes it is primary, you have two primaries accepting writes. Use STONITH (Shoot The Other Node In The Head) or distributed consensus to prevent split brain.

Modern Usage

Managed database services (AWS RDS, Google Cloud SQL, Azure Database) handle replication setup automatically. AWS Aurora provides multi-master with automatic failover. PlanetScale (MySQL/Vitess) and Neon (serverless PostgreSQL) further reduce the operational burden of scaling.

Serverless databases (Aurora Serverless v2, Neon, CockroachDB Serverless) scale compute up and down automatically, decoupling storage from compute — a model increasingly adopted for variable-load workloads.

Future Directions

Disaggregated storage and compute: Aurora's model (shared storage layer, multiple compute nodes) is becoming standard. Compute scales independently of data volume.

Hardware-accelerated database operations: FPGA and GPU-accelerated query engines (NVIDIA RAPIDS cuDF) for analytical workloads, pushing more computation closer to storage.

AI-driven query optimization: PostgreSQL 16+ includes AI-assisted statistics. Future systems will automatically tune indexes, shard keys, and query plans based on observed workload.

Exercises

Design the database architecture for a multi-tenant SaaS application with 10,000 tenants, where the largest 10 tenants represent 80% of the data volume. How do you shard? What are the challenges?
A PostgreSQL primary has 300 connections from 15 application servers. You need to scale to 50 application servers. What happens without a connection pooler? Set up PgBouncer architecture to support 50 servers.
Implement a simple shard router: given a user_id (integer 0-999999), write code to determine which of 4 shards to use. Then simulate adding a 5th shard and calculate how many records must be moved.
Design a CQRS architecture for a real-time leaderboard: users earn points, the top 100 users must be queryable in <10ms. What are the write and read models?
Compare CockroachDB and a sharded MySQL (via Vitess) for a global e-commerce application. What are the latency trade-offs for transactions that span multiple geographic regions?

References

Kleppmann, M. Designing Data-Intensive Applications. O'Reilly, 2017. Chapters 5-9.
Corbett, J., et al. "Spanner: Google's Globally Distributed Database." OSDI, 2012.
Taft, R., et al. "CockroachDB: The Resilient Geo-Distributed SQL Database." SIGMOD, 2020.
Sane, A., et al. "Vitess: The Database Clustering System for Horizontal Scaling of MySQL." PlanetScale Docs, 2021.
Richardson, C. Microservices Patterns. Manning, 2018. Chapter 6: Querying in a Microservice Architecture.
PostgreSQL Documentation: Replication. https://www.postgresql.org/docs/current/high-availability.html
PgBouncer Documentation: https://www.pgbouncer.org/