Scaling Fundamentals

This guide covers the building blocks used in system design for scaling applications.

Scaling Stages

Systems evolve through stages, each introducing new challenges and solutions.

Stage 1: Single Server

All components run on one machine: web server, application, and database.

Scale: Prototypes, small projects, < 1000 users

Bottleneck: CPU, memory, or disk saturation

Stage 2: Separate Database

Database moves to its own server.

Rationale: Different resource profiles. Web servers are CPU-bound; databases are I/O-bound.

Benefit: Each tier scales independently

Stage 3: Add Caching

In-memory cache reduces database load.

Cache-aside pattern:

  1. Check cache first
  2. On miss, query database
  3. Store result in cache
  4. Return to user
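
The four steps above can be sketched in a few lines of Python. This is a minimal illustration: `cache` and `db` are stand-in dictionaries, and `query_database` is a hypothetical helper; a real system would use something like Redis and a SQL client.

```python
cache = {}
db = {"user:1": {"name": "Ada"}}  # stand-in for a real database

def query_database(key):
    return db.get(key)

def get(key):
    value = cache.get(key)           # 1. check cache first
    if value is None:
        value = query_database(key)  # 2. on miss, query the database
        if value is not None:
            cache[key] = value       # 3. store the result in the cache
    return value                     # 4. return to the user
```

The first `get("user:1")` misses and populates the cache; every later call for the same key is served from memory.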

Cache appropriate for:

  • Data read frequently, written infrequently
  • Stale data acceptable for short periods
  • Expensive computations

Stage 4: Multiple Web Servers

Load balancing handles more concurrent users.

Load balancing strategies:

  • Round robin: Distribute requests evenly
  • Least connections: Send to least busy server
  • IP hash: Same user hits same server (session affinity)

Requirement: Sessions must be stateless or externalized (Redis, database)
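
The three strategies can be sketched as follows. The server names and connection counts are illustrative; `zlib.crc32` is used for IP hashing because, unlike Python's built-in `hash()`, it is stable across processes.

```python
import zlib
from itertools import cycle

servers = ["web-1", "web-2", "web-3"]

# Round robin: rotate through the server list in order.
_rotation = cycle(servers)
def round_robin():
    return next(_rotation)

# Least connections: pick the server with the fewest active connections.
active_connections = {"web-1": 5, "web-2": 2, "web-3": 7}
def least_connections():
    return min(active_connections, key=active_connections.get)

# IP hash: a stable checksum maps the same client IP to the same server,
# giving session affinity.
def ip_hash(client_ip):
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]
```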

Stage 5: Database Replication

Separate read and write traffic.

Benefits:

  • Higher read throughput
  • Redundancy for failover
  • Geographic distribution

Trade-off: Replication lag means reads may be slightly stale

Stage 6: Content Delivery Network

Static content served from edge locations close to users.

CDN content:

  • Images, videos, CSS, JavaScript
  • Any static content
  • Sometimes API responses

Benefits:

  • Reduced latency (geographic proximity)
  • Lower origin server load
  • DDoS protection

Database Scaling Strategies

Vertical Scaling (Scale Up)

Add more resources to existing machine: CPU, RAM, storage.

Advantages: Simple, no code changes

Disadvantages: Hardware limits, expensive, single point of failure

Horizontal Scaling (Scale Out)

Add more machines to distribute load.

Read Replicas

Route read queries to replicas, writes to primary. The application code directs SELECT operations to any read replica for load distribution, while UPDATE, INSERT, and DELETE operations always go to the primary database to maintain consistency.
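
A routing layer like this can be sketched in Python. The connection labels are placeholders; a real application would hold actual database connections and usually a smarter query classifier.

```python
import random

# Hypothetical connection labels standing in for real DB connections.
PRIMARY = "primary"
REPLICAS = ["replica-1", "replica-2"]

def pick_connection(sql):
    """Route SELECT statements to a replica; send all writes to the primary."""
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(REPLICAS)
    return PRIMARY
```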

Sharding

Split data across multiple databases.

Sharding strategies:

  1. Range-based: Users A-M on shard 1, N-Z on shard 2

    • Simple to implement
    • Risk of uneven distribution (hot spots)
  2. Hash-based: shard = hash(user_id) % num_shards

    • Even distribution
    • Difficult to add/remove shards
  3. Directory-based: Lookup table maps keys to shards

    • Flexible
    • Lookup service becomes bottleneck
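
Hash-based sharding, and why adding shards is hard, can be shown in a short sketch. The shard count is arbitrary; `zlib.crc32` stands in for whatever stable hash a real system would use.

```python
import zlib

NUM_SHARDS = 4

def shard_for(user_id):
    # Hash-based sharding: a stable checksum modulo the shard count
    # spreads keys roughly evenly across shards.
    return zlib.crc32(str(user_id).encode()) % NUM_SHARDS

# The resharding problem: going from 4 to 5 shards relocates most keys,
# because hash % 4 and hash % 5 rarely agree for the same key.
moved = sum(
    shard_for(uid) != zlib.crc32(str(uid).encode()) % 5
    for uid in range(1000)
)
```

With a uniform hash, only keys whose checksum agrees modulo both counts stay put, so the vast majority of the 1000 sample keys move. This is why techniques such as consistent hashing exist.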

Sharding challenges:

  • Cross-shard queries are expensive
  • Joins become difficult
  • Transactions span multiple databases
  • Resharding is complex

Caching Strategies

Cache Locations

Cache Patterns

Cache-aside (Lazy loading):

  • Application manages cache
  • Read: check cache, miss -> fetch from DB -> populate cache
  • Write: update DB, invalidate cache

Write-through:

  • Write to cache and DB simultaneously
  • Data always consistent
  • Higher write latency

Write-behind (Write-back):

  • Write to cache, async write to DB
  • Fast writes
  • Risk of data loss
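
The write paths of the first two patterns can be contrasted in a small sketch (write-behind is omitted because it needs async machinery). `cache` and `db` are stand-in dictionaries.

```python
cache, db = {}, {}

def write_through(key, value):
    # Write-through: update cache and database together, so reads after
    # the write always see consistent data, at the cost of write latency.
    db[key] = value
    cache[key] = value

def write_with_invalidation(key, value):
    # Cache-aside write path: update the database, then drop the cached
    # copy so the next read repopulates it from the DB.
    db[key] = value
    cache.pop(key, None)
```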

Cache Eviction

When cache is full, entries are removed based on policy:

  • LRU (Least Recently Used): Remove the item accessed least recently
  • LFU (Least Frequently Used): Remove the item accessed least often
  • TTL (Time To Live): Expire after set duration
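
An LRU policy is a minimal sketch away in Python, using `OrderedDict` to track access order (production caches like Redis implement approximations of this internally):

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```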

Cache Invalidation

Strategies:

  • TTL: Accept eventual consistency
  • Event-driven: Invalidate on writes
  • Version-based: Include version in cache key
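
Version-based keys are simple to illustrate. The version number would live alongside the entity in the database; bumping it on every write makes stale cache entries unreachable, so they age out via TTL instead of needing explicit deletion. The function name is hypothetical.

```python
def cache_key(entity, entity_id, version):
    # A new version yields a new key, so readers never hit stale entries.
    return f"{entity}:{entity_id}:v{version}"
```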

Stateless Architecture

For horizontal scaling, servers must be stateless.

Stateful (problematic): Session stored on server

Stateless (correct): Session stored externally

Externalize:

  • User sessions
  • Shopping carts
  • Any user-specific state
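
Externalized sessions can be sketched as follows. The dictionary stands in for an external store; production systems typically use Redis or a database so every web server sees the same sessions.

```python
import json
import uuid

# Stand-in for an external session store shared by all web servers.
session_store = {}

def create_session(user_data):
    session_id = str(uuid.uuid4())
    session_store[session_id] = json.dumps(user_data)  # serialized, portable
    return session_id  # returned to the client, e.g. in a cookie

def load_session(session_id):
    # Any server in the pool can handle the request: no server-local state.
    raw = session_store.get(session_id)
    return json.loads(raw) if raw else None
```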

Reference Numbers

Operation | Latency
L1 cache reference | 1 ns
L2 cache reference | 4 ns
RAM reference | 100 ns
SSD read | 100 us
HDD seek | 10 ms
Network round trip (same datacenter) | 500 us
Network round trip (cross-country) | 150 ms

Throughput | Value
SSD sequential read | 500 MB/s
HDD sequential read | 100 MB/s
1 Gbps network | 125 MB/s
Typical web server | 1000-10000 req/s
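
These numbers support quick back-of-envelope estimates. For example, the time to read 1 GB (decimal, 1000 MB) sequentially at the throughputs above:

```python
GB_IN_MB = 1000

ssd_seconds = GB_IN_MB / 500   # SSD at 500 MB/s -> 2 s
hdd_seconds = GB_IN_MB / 100   # HDD at 100 MB/s -> 10 s
net_seconds = GB_IN_MB / 125   # 1 Gbps network at 125 MB/s -> 8 s
```

Note that moving 1 GB across a 1 Gbps link takes longer than reading it from a local SSD, which is one reason data locality matters in system design.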