Scaling Databases: When Sharding, Replication, and Caching Become Necessary
Practical guide to database scaling decisions. Real benchmarks and trade-offs from scaling systems from 1K to 1B queries per day.
Core: Most scaling discussions happen at the wrong scale. Teams implement sharding at 100GB when they should wait until 10TB. They add a caching layer when the real problem is query inefficiency. Scaling decisions are about understanding your bottleneck.
The Replication Story: When One Database Becomes 10
Detail: We built our first system with a single PostgreSQL database. For three years, it handled all traffic. At 2K queries per second, the database held 200GB and response time averaged 8ms. Then we hit a wall. Each new feature that required a complex JOIN would degrade performance by 15%. We’d added 15 features that year, each slightly slower than the one before.
The moment of truth: we could either keep accepting a 2-3ms latency penalty for every new complex query, or architect for scale we hadn’t reached. We chose replication. We set up a read replica, moved all reads there, and kept writes on the primary. Response time immediately dropped to 3ms.
Then we needed more reads. We added a second replica, then a third. At six replicas, writes started lagging behind reads. Replication lag meant users would see stale data, leading to race conditions. Fixing that required application-level logic to route writes directly to the primary and handle stale reads gracefully.
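A sketch of the application-level routing this required, assuming psycopg2 and hypothetical DSNs; the require_fresh flag stands in for whatever read-your-writes signal your application uses:

```python
import psycopg2

# Hypothetical connection strings; substitute your own.
PRIMARY_DSN = "host=db-primary dbname=app"
REPLICA_DSNS = ["host=db-replica-1 dbname=app", "host=db-replica-2 dbname=app"]

class RoutedDatabase:
    """Send writes to the primary; round-robin reads across replicas."""

    def __init__(self):
        self.primary = psycopg2.connect(PRIMARY_DSN)
        self.replicas = [psycopg2.connect(dsn) for dsn in REPLICA_DSNS]
        self._next = 0

    def execute_write(self, sql, params=None):
        with self.primary.cursor() as cur:
            cur.execute(sql, params)
        self.primary.commit()

    def execute_read(self, sql, params=None, require_fresh=False):
        # Reads that must observe the caller's own writes go to the
        # primary; everything else tolerates replica lag.
        conn = self.primary if require_fresh else self._next_replica()
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()

    def _next_replica(self):
        conn = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return conn
```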
By the time we had 12 read replicas, maintaining replication was consuming real engineering time. Every backup required careful coordination (we couldn’t back up a node while it was replicating), and promoting a replica to primary demanded its own testing. We’d solved the read scaling problem but created an operational complexity problem.
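What that tuning amounts to in practice, as a hedged sketch: a lag check against pg_stat_replication, assuming psycopg2 and a hypothetical DSN, with the usual postgresql.conf knobs noted in the comments:

```python
import psycopg2

# Assumes the primary's postgresql.conf carries standard settings like:
#   wal_level = replica
#   max_wal_senders = <replica count + headroom>
#   max_replication_slots = <one per replica>
PRIMARY_DSN = "host=db-primary dbname=app"  # hypothetical

LAG_ALERT_BYTES = 64 * 1024 * 1024  # flag replicas >64MB of WAL behind

def check_replica_lag():
    """Print how far each replica trails the primary, in bytes of WAL."""
    with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT application_name,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag
            FROM pg_stat_replication
        """)
        for name, lag in cur.fetchall():
            status = "ALERT" if lag and lag > LAG_ALERT_BYTES else "ok"
            print(f"{name}: {lag} bytes behind ({status})")
```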
The configuration shows the complexity: PostgreSQL replication requires careful tuning of WAL settings, monitoring of replication slots, and application awareness of replica lag. At 100 queries per second, this is unnecessary. At 10,000 queries per second, it becomes mandatory.
Application: Replication becomes worth considering when: (1) you have more read load than write load (typical 80/20 split), (2) write latency is acceptable (replicas lag behind primary), (3) you can handle eventual consistency. If your system requires strong consistency (financial transactions), replication doesn’t help. If your read load is growing faster than write optimization can handle, replication buys you time.
The Sharding Conversation: When One Database Becomes Many
Core: Sharding is the most complex scaling decision. It’s permanent—you’ll live with this decision for years. Consider it carefully.
Detail: At 1TB of data with 20K queries per second, a single database became a bottleneck. Disk I/O was maxed, CPU was pegged, and we couldn’t optimize queries further. Replication and caching had exhausted their returns. Sharding became inevitable.
We chose hash-based sharding on user_id. Each shard held 1/8 of users. User 1234567 → shard 3. This allowed us to split traffic across 8 database instances and linearly scale capacity. Eight database instances meant 8x the storage, 8x the write capacity, 8x the operational overhead.
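A minimal sketch of that routing, using a hash that is stable across processes (Python’s built-in hash() is not) and hypothetical shard DSNs; the exact user-to-shard mapping depends on the hash you pick:

```python
import zlib

NUM_SHARDS = 8

# Hypothetical connection strings, indexed by shard number.
SHARD_DSNS = [f"host=db-shard-{i} dbname=app" for i in range(NUM_SHARDS)]

def shard_for(user_id: int) -> int:
    """Map a user_id to one of the 8 shards deterministically."""
    return zlib.crc32(str(user_id).encode()) % NUM_SHARDS
```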
But the cost was immense. Every query now needed routing logic. A query like “show me the top 100 users by signup date” required querying all 8 shards, sorting the results, and returning the top 100. Application code became complex with shard-awareness logic.
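The scatter-gather step for that query looks roughly like this, assuming each shard has already returned its own top 100 as (signup_date, user_id) rows sorted ascending:

```python
import heapq
from itertools import islice

def global_top_100(per_shard_results):
    """Merge the shards' pre-sorted top-100 lists and keep the first
    100 rows overall; no shard can contribute a global top-100 row
    that isn't already in its own top 100, so this is sufficient."""
    return list(islice(heapq.merge(*per_shard_results), 100))
```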
Resharding was a nightmare. After two years, we needed a ninth shard. Rebalancing data across nine shards meant moving 11% of all user data from existing shards—weeks of background jobs and risky migrations.
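That 11% is roughly 1/9, the minimum that has to move; naive hash % N routing would have remapped about 89% of users when N changed from 8 to 9, so a rebalance this cheap implies something like consistent hashing, sketched below (the Ring class and vnode count are illustrative):

```python
import bisect
import hashlib

def _point(key: str) -> int:
    """Stable position on the hash ring for a key or vnode label."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring: adding a shard only reassigns the keys
    that fall into the new shard's arcs, roughly 1/N of them."""

    def __init__(self, shards, vnodes=100):
        self.points = sorted(
            (_point(f"{s}#{v}"), s) for s in shards for v in range(vnodes)
        )

    def shard_for(self, key: str) -> str:
        i = bisect.bisect(self.points, (_point(key),)) % len(self.points)
        return self.points[i][1]

old = Ring([f"shard-{i}" for i in range(8)])
new = Ring([f"shard-{i}" for i in range(9)])
users = [str(u) for u in range(100_000)]
moved = sum(old.shard_for(u) != new.shard_for(u) for u in users)
print(f"{moved / len(users):.1%} of keys moved")  # ~11%, vs ~89% with % N
```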
Sharding multiplied operational complexity. Before sharding, a backup was straightforward—backup one database. After sharding, backups had to be coordinated across eight databases, and recovery meant knowing which shard failed. Monitoring became difficult—tracking latency required aggregating metrics from eight data stores.
Application: Sharding is valuable when: (1) you’ve maximized single-database performance, (2) you understand your access patterns (most queries are shard-aware), (3) you can live with operational complexity. If you can solve your problem with replication, caching, or better queries, avoid sharding. Resharding is so expensive that you want to get your shard count right the first time—and you almost never do.
The Caching Layer: When 100ms Queries Become 2ms
Core: Caching is often the most misunderstood scaling tool. It’s not about speed; it’s about load distribution. Good caching takes load off the database, not just time off user requests.
Detail: Our user profile service was returning 400ms responses because every request hit the database with a complex JOIN. Adding Redis caching dropped it to 20ms on cache hits. Perfect, we thought.
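A minimal cache-aside sketch of that read path with redis-py; the host and load_profile_from_db() are hypothetical stand-ins for the real client and the JOIN-heavy query:

```python
import json
import redis

r = redis.Redis(host="cache", port=6379)  # hypothetical cache host
PROFILE_TTL = 300  # seconds; the 5-minute staleness window below

def get_profile(user_id, load_profile_from_db):
    """Cache-aside read: serve hits from Redis, fall back to the
    database on a miss, and populate the cache on the way out."""
    key = f"profile:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    profile = load_profile_from_db(user_id)  # the expensive JOIN
    r.setex(key, PROFILE_TTL, json.dumps(profile))
    return profile
```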
Six months later, we faced the cache invalidation problem. Users updated their profiles, but the cache had the old version for 5 minutes. Features depending on up-to-date profiles had race conditions. Cache invalidation became harder than the original optimization.
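One common mitigation, sketched with the names from above, is to invalidate on write so the next read repopulates; note this narrows the staleness window rather than eliminating it:

```python
def update_profile(user_id, fields, write_profile_to_db):
    """Write-then-invalidate: after the database write commits, drop
    the cached copy so the next read fetches fresh data. A race
    between this DELETE and a concurrent cache fill is still possible."""
    write_profile_to_db(user_id, fields)  # hypothetical write path
    r.delete(f"profile:{user_id}")
```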
The real lesson came from looking at the database. Adding caching didn’t improve database load—it just delayed it. Cache misses still hit the database just as hard. What we actually needed was better indexing and query optimization. The cache masked the problem instead of solving it.
Real caching value came from computationally expensive operations. We had a “user feed” that involved computing recommendations by comparing the user to 10M items. Computing it on-demand took 3 seconds. Computing it every hour and caching it took 50ms to retrieve. Now caching wasn’t about hiding database load—it was about replacing expensive computation with fast retrieval.
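The shape of that pattern, reusing the Redis client and json import from above; compute_feed() and the hourly scheduler that calls refresh_feed() are hypothetical:

```python
FEED_TTL = 2 * 60 * 60  # two hours, so one missed run doesn't empty the feed

def refresh_feed(user_id, compute_feed):
    """Run on a schedule: do the expensive computation once and store
    the result where reads can fetch it in tens of milliseconds."""
    feed = compute_feed(user_id)  # the 3-second recommendation pass
    r.setex(f"feed:{user_id}", FEED_TTL, json.dumps(feed))

def get_feed(user_id):
    cached = r.get(f"feed:{user_id}")
    return json.loads(cached) if cached else []  # or fall back to a refresh
```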
| Cache target | Policy |
| --- | --- |
| Expensive computations | Cache aggressively, long TTL |
| Simple lookups | Short TTL, or invalidate on write |
| Rarely repeated results | Don’t cache |
The strategy: cache expensive computations aggressively (long TTL), cache simple lookups conservatively (short TTL or invalidate on write), don’t cache things that rarely repeat.
Application: Caching is most valuable for read-heavy workloads with expensive computations. It’s least valuable for write-heavy workloads or simple database queries. If you’re caching to hide slow queries, fix the queries first, then cache to handle the remaining load spikes.
The Real Bottleneck: Query Optimization Before Scaling
Core: Most scaling problems are actually query problems.
Detail: Before we sharded, we found that 40% of traffic came from 5 poorly written queries. Adding the right indexes dropped response time by 60%. That one day of optimization work was worth more than our entire replication and caching strategy.
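On PostgreSQL, the pg_stat_statements extension is one way to surface those queries; a sketch, assuming the extension is enabled and a hypothetical DSN:

```python
import psycopg2

PRIMARY_DSN = "host=db-primary dbname=app"  # hypothetical

def top_queries(limit=5):
    """List the statements consuming the most total execution time.
    The column is total_exec_time on PostgreSQL 13+; older versions
    call it total_time."""
    with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT calls,
                   round(total_exec_time) AS total_ms,
                   left(query, 80) AS query
            FROM pg_stat_statements
            ORDER BY total_exec_time DESC
            LIMIT %s
        """, (limit,))
        return cur.fetchall()
```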
Application: Premature scaling is expensive. Mature scaling is valuable. Understand your bottleneck before you scale.
Hero Image Prompt: “Database architecture evolution showing progression: single database → replicated databases → sharded databases with caching layers. Show query flow with timing metrics (20ms, 3ms, etc). Include capacity graphs showing linear scaling with sharding. Technical, clean visualization with network connections between components. Dark theme with cyan (#16213e) accent lines showing query routing.”