Scaling Fundamentals
This guide covers the building blocks used in system design for scaling applications.
Scaling Stages
Systems evolve through stages, each introducing new challenges and solutions.
Stage 1: Single Server
All components run on one machine: web server, application, and database.
Scale: Prototypes, small projects, < 1000 users
Bottleneck: CPU, memory, or disk saturation
Stage 2: Separate Database
Database moves to its own server.
Rationale: Different resource profiles. Web servers are CPU-bound; databases are I/O-bound.
Benefit: Each tier scales independently
Stage 3: Add Caching
In-memory cache reduces database load.
Cache-aside pattern:
- Check cache first
- On miss, query database
- Store result in cache
- Return to user
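The four steps above can be sketched in a few lines. This is a minimal illustration using plain dicts to stand in for the cache and the database; `CACHE`, `DB`, and `db_fetch` are hypothetical names, not a specific library API.

```python
CACHE = {}
DB = {"user:1": {"name": "Ada"}}  # stand-in for the database

def db_fetch(key):
    # Simulates a (slow) database query.
    return DB.get(key)

def get(key):
    # 1. Check cache first.
    if key in CACHE:
        return CACHE[key]
    # 2. On miss, query the database.
    value = db_fetch(key)
    # 3. Store the result in the cache for subsequent reads.
    if value is not None:
        CACHE[key] = value
    # 4. Return to the caller.
    return value
```

In production the dict would be a shared store such as Redis, and the cached entry would carry a TTL so stale data eventually expires.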
Cache appropriate for:
- Data read frequently, written infrequently
- Stale data acceptable for short periods
- Expensive computations
Stage 4: Multiple Web Servers
Load balancing handles more concurrent users.
Load balancing strategies:
- Round robin: Distribute requests evenly
- Least connections: Send to least busy server
- IP hash: Same user hits same server (session affinity)
Requirement: Sessions must be stateless or externalized (Redis, database)
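The three strategies can each be expressed as a small selection function. A sketch, with a hypothetical three-server pool; real load balancers (nginx, HAProxy, an ALB) implement these internally.

```python
import itertools

SERVERS = ["app1", "app2", "app3"]  # hypothetical server pool

# Round robin: cycle through servers in order.
_rr = itertools.cycle(SERVERS)
def round_robin():
    return next(_rr)

# Least connections: track open connections per server, pick the least busy.
connections = {s: 0 for s in SERVERS}
def least_connections():
    return min(connections, key=connections.get)

# IP hash: the same client IP always maps to the same server
# (session affinity), as long as the pool membership is unchanged.
def ip_hash(client_ip):
    return SERVERS[hash(client_ip) % len(SERVERS)]
```

Note that IP hash breaks affinity whenever a server is added or removed, which is one reason externalized sessions are preferred over relying on stickiness.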
Stage 5: Database Replication
Separate read and write traffic.
Benefits:
- Higher read throughput
- Redundancy for failover
- Geographic distribution
Trade-off: Replication lag means reads may be slightly stale
Stage 6: Content Delivery Network
Static content served from edge locations close to users.
CDN content:
- Images, videos, CSS, JavaScript
- Any static content
- Sometimes API responses
Benefits:
- Reduced latency (geographic proximity)
- Lower origin server load
- DDoS protection
Database Scaling Strategies
Vertical Scaling (Scale Up)
Add more resources to existing machine: CPU, RAM, storage.
Advantages: Simple, no code changes
Disadvantages: Hardware limits, expensive, single point of failure
Horizontal Scaling (Scale Out)
Add more machines to distribute load.
Read Replicas
Route read queries to replicas, writes to primary. The application code directs SELECT operations to any read replica for load distribution, while UPDATE, INSERT, and DELETE operations always go to the primary database to maintain consistency.
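A minimal router for this split might look as follows. The connection handles are stand-in strings; in practice they would be database connections, and an ORM or proxy layer usually does this routing for you.

```python
import random

PRIMARY = "primary"                    # hypothetical write connection
REPLICAS = ["replica1", "replica2"]    # hypothetical read connections

def route(sql):
    """Send SELECTs to a random replica; everything else to the primary."""
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(REPLICAS)
    return PRIMARY
```

Because of replication lag, a read issued immediately after a write may not see that write on a replica; read-your-own-writes flows often pin such reads to the primary.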
Sharding
Split data across multiple databases.
Sharding strategies:
- Range-based: Users A-M on shard 1, N-Z on shard 2
  - Simple to implement
  - Risk of uneven distribution (hot spots)
- Hash-based: shard = hash(user_id) % num_shards
  - Even distribution
  - Difficult to add/remove shards
- Directory-based: Lookup table maps keys to shards
  - Flexible
  - Lookup service becomes bottleneck
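The hash-based formula above can be made concrete. One detail worth noting: Python's built-in `hash()` is randomized per process, so a stable hash (md5 here, as an illustrative choice) is needed to get the same shard across servers and restarts.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id):
    # Stable hash so every app server computes the same shard for a key.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

This also makes the resharding problem visible: changing `NUM_SHARDS` changes the result of the modulo for most keys, forcing most data to move, which is why schemes like consistent hashing exist.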
Sharding challenges:
- Cross-shard queries are expensive
- Joins become difficult
- Transactions span multiple databases
- Resharding is complex
Caching Strategies
Cache Locations
Caching happens at several layers: the browser (via HTTP cache headers), the CDN edge, the application process (in-memory), and a shared distributed cache such as Redis or Memcached.
Cache Patterns
Cache-aside (Lazy loading):
- Application manages cache
- Read: check cache, miss -> fetch from DB -> populate cache
- Write: update DB, invalidate cache
Write-through:
- Write to cache and DB simultaneously
- Data always consistent
- Higher write latency
Write-behind (Write-back):
- Write to cache, async write to DB
- Fast writes
- Risk of data loss
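The difference between write-through and write-behind is easiest to see side by side. A sketch with dicts standing in for the cache and database; `_pending` and `flush` are illustrative names for the write-behind queue and its background job.

```python
CACHE = {}
DB = {}
_pending = []  # queued writes awaiting persistence (write-behind)

def write_through(key, value):
    # Cache and database updated together: consistent, but the write
    # pays both latencies.
    CACHE[key] = value
    DB[key] = value

def write_behind(key, value):
    # Cache updated immediately; the DB write is deferred.
    CACHE[key] = value
    _pending.append((key, value))

def flush():
    # Background job persisting queued writes. Anything still queued
    # when the cache node dies is lost -- the data-loss risk noted above.
    while _pending:
        k, v = _pending.pop(0)
        DB[k] = v
```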
Cache Eviction
When cache is full, entries are removed based on policy:
- LRU (Least Recently Used): Evict the entry that was accessed longest ago
- LFU (Least Frequently Used): Evict the entry with the fewest accesses
- TTL (Time To Live): Expire entries after a set duration
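LRU is the most common policy and is simple to implement with an ordered map. A minimal sketch (production caches like Redis use approximated LRU for efficiency):

```python
from collections import OrderedDict

class LRUCache:
    """On overflow, evict the least recently accessed key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```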
Cache Invalidation
Strategies:
- TTL: Accept eventual consistency
- Event-driven: Invalidate on writes
- Version-based: Include version in cache key
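Version-based invalidation deserves a quick illustration, since it avoids explicit deletes entirely: bumping the version on write makes every old cache entry unreachable. The names here (`versions`, `read`, `write`) are illustrative.

```python
CACHE = {}
versions = {}  # current version per logical key, bumped on every write

def cache_key(key):
    return f"{key}:v{versions.get(key, 0)}"

def read(key, loader):
    k = cache_key(key)
    if k not in CACHE:
        CACHE[k] = loader()  # loader fetches from the source of truth
    return CACHE[k]

def write(key):
    # Old entries like "user:1:v0" are never read again; they simply
    # age out via TTL rather than being deleted.
    versions[key] = versions.get(key, 0) + 1
```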
Stateless Architecture
For horizontal scaling, servers must be stateless.
Stateful (problematic): Session stored on server
Stateless (correct): Session stored externally
Externalize:
- User sessions
- Shopping carts
- Any user-specific state
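The stateless requirement can be demonstrated with a shared session store. A sketch where a dict stands in for Redis or a database; the point is that any server can handle any request because no state lives in server memory.

```python
import uuid

SESSION_STORE = {}  # stand-in for a store shared by all web servers

def create_session(user_id):
    sid = str(uuid.uuid4())
    SESSION_STORE[sid] = {"user_id": user_id, "cart": []}
    return sid

def handle_request(server_name, session_id):
    # Whichever server the load balancer picks, it sees the same session.
    session = SESSION_STORE.get(session_id)
    user = session["user_id"] if session else None
    return (server_name, user)
```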
Reference Numbers
| Operation | Latency |
|---|---|
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| RAM reference | 100 ns |
| SSD read | 100 us |
| HDD seek | 10 ms |
| Network round trip (same datacenter) | 500 us |
| Network round trip (cross-country) | 150 ms |
| Throughput | Value |
|---|---|
| SSD sequential read | 500 MB/s |
| HDD sequential read | 100 MB/s |
| 1 Gbps network | 125 MB/s |
| Typical web server | 1000-10000 req/s |
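These numbers are for back-of-envelope capacity estimates. For example, checking whether a 1 Gbps NIC can sustain 10,000 req/s (the response size of 50 KB is an illustrative assumption):

```python
req_per_s = 10_000
resp_kb = 50                     # assumed average response size
needed_mb_s = req_per_s * resp_kb / 1024   # ~488 MB/s required
nic_mb_s = 125                   # 1 Gbps from the throughput table

# True: network bandwidth, not request handling, is the bottleneck here.
print(needed_mb_s > nic_mb_s)
```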