Scaling Fundamentals
This guide covers the building blocks used in system design for scaling applications.
Scaling Stages
Systems evolve through stages, each introducing new challenges and solutions.
Stage 1: Single Server
All components run on one machine: web server, application, and database.
Scale: Prototypes, small projects, < 1000 users
Bottleneck: CPU, memory, or disk saturation
Stage 2: Separate Database
Database moves to its own server.
Rationale: Different resource profiles. Web servers are CPU-bound; databases are I/O-bound.
Benefit: Each tier scales independently
Stage 3: Add Caching
In-memory cache reduces database load.
Cache-aside pattern:
- Check cache first
- On miss, query database
- Store result in cache
- Return to user
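The four steps above can be sketched in a few lines. This is a minimal illustration using plain dicts to stand in for the cache and the database; `CACHE`, `DB`, and `db_fetch` are hypothetical names, not a specific library API.

```python
CACHE = {}
DB = {"user:1": {"name": "Ada"}}  # stand-in for the database

def db_fetch(key):
    # Simulates a (slow) database query.
    return DB.get(key)

def get(key):
    # 1. Check cache first.
    if key in CACHE:
        return CACHE[key]
    # 2. On miss, query the database.
    value = db_fetch(key)
    # 3. Store the result in the cache for subsequent reads.
    if value is not None:
        CACHE[key] = value
    # 4. Return to the caller.
    return value
```

In production the dict would be a shared store such as Redis, and the cached entry would carry a TTL so stale data eventually expires.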
Cache appropriate for:
- Data read frequently, written infrequently
- Stale data acceptable for short periods
- Expensive computations
Stage 4: Multiple Web Servers
Load balancing handles more concurrent users.
Load balancing strategies:
- Round robin: Distribute requests evenly
- Least connections: Send to least busy server
- IP hash: Same user hits same server (session affinity)
Requirement: Sessions must be stateless or externalized (Redis, database)
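The three strategies can each be expressed as a small selection function. A sketch, with a hypothetical three-server pool; real load balancers (nginx, HAProxy, an ALB) implement these internally.

```python
import itertools

SERVERS = ["app1", "app2", "app3"]  # hypothetical server pool

# Round robin: cycle through servers in order.
_rr = itertools.cycle(SERVERS)
def round_robin():
    return next(_rr)

# Least connections: track open connections per server, pick the least busy.
connections = {s: 0 for s in SERVERS}
def least_connections():
    return min(connections, key=connections.get)

# IP hash: the same client IP always maps to the same server
# (session affinity), as long as the pool membership is unchanged.
def ip_hash(client_ip):
    return SERVERS[hash(client_ip) % len(SERVERS)]
```

Note that IP hash breaks affinity whenever a server is added or removed, which is one reason externalized sessions are preferred over relying on stickiness.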
Stage 5: Database Replication
Separate read and write traffic.
Benefits:
- Higher read throughput
- Redundancy for failover
- Geographic distribution
Trade-off: Replication lag means reads may be slightly stale
Stage 6: Content Delivery Network
Static content served from edge locations close to users.
CDN content:
- Images, videos, CSS, JavaScript
- Any static content
- Sometimes API responses
Benefits:
- Reduced latency (geographic proximity)
- Lower origin server load
- DDoS protection
Database Scaling Strategies
Vertical Scaling (Scale Up)
Add more resources to existing machine: CPU, RAM, storage.
Advantages: Simple, no code changes
Disadvantages: Hardware limits, expensive, single point of failure
Horizontal Scaling (Scale Out)
Add more machines to distribute load.
Read Replicas
Route read queries to replicas, writes to primary. The application code directs SELECT operations to any read replica for load distribution, while UPDATE, INSERT, and DELETE operations always go to the primary database to maintain consistency.
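A minimal router for this split might look as follows. The connection handles are stand-in strings; in practice they would be database connections, and an ORM or proxy layer usually does this routing for you.

```python
import random

PRIMARY = "primary"                    # hypothetical write connection
REPLICAS = ["replica1", "replica2"]    # hypothetical read connections

def route(sql):
    """Send SELECTs to a random replica; everything else to the primary."""
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(REPLICAS)
    return PRIMARY
```

Because of replication lag, a read issued immediately after a write may not see that write on a replica; read-your-own-writes flows often pin such reads to the primary.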
Sharding
Split data across multiple databases.
Sharding strategies:
- Range-based: Users A-M on shard 1, N-Z on shard 2
  - Simple to implement
  - Risk of uneven distribution (hot spots)
- Hash-based: shard = hash(user_id) % num_shards
  - Even distribution
  - Difficult to add/remove shards
- Directory-based: Lookup table maps keys to shards
  - Flexible
  - Lookup service becomes bottleneck
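The hash-based formula above can be made concrete. One detail worth noting: Python's built-in `hash()` is randomized per process, so a stable hash (md5 here, as an illustrative choice) is needed to get the same shard across servers and restarts.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id):
    # Stable hash so every app server computes the same shard for a key.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

This also makes the resharding problem visible: changing `NUM_SHARDS` changes the result of the modulo for most keys, forcing most data to move, which is why schemes like consistent hashing exist.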
Sharding challenges:
- Cross-shard queries are expensive
- Joins become difficult
- Transactions span multiple databases
- Resharding is complex
Caching Strategies
Cache Locations
Caching happens at several layers: the browser (via HTTP cache headers), the CDN edge, the application process (in-memory), and a shared distributed cache such as Redis or Memcached.
Cache Patterns
Cache-aside (Lazy loading):
- Application manages cache
- Read: check cache, miss -> fetch from DB -> populate cache
- Write: update DB, invalidate cache
Write-through:
- Write to cache and DB simultaneously
- Data always consistent
- Higher write latency
Write-behind (Write-back):
- Write to cache, async write to DB
- Fast writes
- Risk of data loss
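The difference between write-through and write-behind is easiest to see side by side. A sketch with dicts standing in for the cache and database; `_pending` and `flush` are illustrative names for the write-behind queue and its background job.

```python
CACHE = {}
DB = {}
_pending = []  # queued writes awaiting persistence (write-behind)

def write_through(key, value):
    # Cache and database updated together: consistent, but the write
    # pays both latencies.
    CACHE[key] = value
    DB[key] = value

def write_behind(key, value):
    # Cache updated immediately; the DB write is deferred.
    CACHE[key] = value
    _pending.append((key, value))

def flush():
    # Background job persisting queued writes. Anything still queued
    # when the cache node dies is lost -- the data-loss risk noted above.
    while _pending:
        k, v = _pending.pop(0)
        DB[k] = v
```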
Cache Eviction
When cache is full, entries are removed based on policy:
- LRU (Least Recently Used): Evict the entry that was accessed longest ago
- LFU (Least Frequently Used): Evict the entry with the fewest accesses
- TTL (Time To Live): Expire entries after a set duration
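LRU is the most common policy and is simple to implement with an ordered map. A minimal sketch (production caches like Redis use approximated LRU for efficiency):

```python
from collections import OrderedDict

class LRUCache:
    """On overflow, evict the least recently accessed key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```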
Cache Invalidation
Strategies:
- TTL: Accept eventual consistency
- Event-driven: Invalidate on writes
- Version-based: Include version in cache key
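Version-based invalidation deserves a quick illustration, since it avoids explicit deletes entirely: bumping the version on write makes every old cache entry unreachable. The names here (`versions`, `read`, `write`) are illustrative.

```python
CACHE = {}
versions = {}  # current version per logical key, bumped on every write

def cache_key(key):
    return f"{key}:v{versions.get(key, 0)}"

def read(key, loader):
    k = cache_key(key)
    if k not in CACHE:
        CACHE[k] = loader()  # loader fetches from the source of truth
    return CACHE[k]

def write(key):
    # Old entries like "user:1:v0" are never read again; they simply
    # age out via TTL rather than being deleted.
    versions[key] = versions.get(key, 0) + 1
```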
Stateless Architecture
For horizontal scaling, servers must be stateless.
Stateful (problematic): Session stored on server
Stateless (correct): Session stored externally
Externalize:
- User sessions
- Shopping carts
- Any user-specific state
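The stateless requirement can be demonstrated with a shared session store. A sketch where a dict stands in for Redis or a database; the point is that any server can handle any request because no state lives in server memory.

```python
import uuid

SESSION_STORE = {}  # stand-in for a store shared by all web servers

def create_session(user_id):
    sid = str(uuid.uuid4())
    SESSION_STORE[sid] = {"user_id": user_id, "cart": []}
    return sid

def handle_request(server_name, session_id):
    # Whichever server the load balancer picks, it sees the same session.
    session = SESSION_STORE.get(session_id)
    user = session["user_id"] if session else None
    return (server_name, user)
```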
Reference Numbers
| Operation | Latency |
|---|---|
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| RAM reference | 100 ns |
| SSD read | 100 us |
| HDD seek | 10 ms |
| Network round trip (same datacenter) | 500 us |
| Network round trip (cross-country) | 150 ms |
| Throughput | Value |
|---|---|
| SSD sequential read | 500 MB/s |
| HDD sequential read | 100 MB/s |
| 1 Gbps network | 125 MB/s |
| Typical web server | 1000-10000 req/s |
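These numbers are for back-of-envelope capacity estimates. For example, checking whether a 1 Gbps NIC can sustain 10,000 req/s (the response size of 50 KB is an illustrative assumption):

```python
req_per_s = 10_000
resp_kb = 50                     # assumed average response size
needed_mb_s = req_per_s * resp_kb / 1024   # ~488 MB/s required
nic_mb_s = 125                   # 1 Gbps from the throughput table

# True: network bandwidth, not request handling, is the bottleneck here.
print(needed_mb_s > nic_mb_s)
```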