Scalability

Scalability refers to a system's ability to handle increased load by adding resources. This section covers scaling approaches, patterns, metrics, and trade-offs.

Types of Scaling

Vertical Scaling (Scale Up)

Vertical scaling increases capacity by upgrading to more powerful hardware (more CPU, RAM, faster storage).

| Aspect | Description |
| --- | --- |
| Advantages | No code changes required, no distributed system complexity |
| Disadvantages | Hardware limits, single point of failure, non-linear cost increase |

Vertical scaling is the simplest approach and appropriate for initial scaling needs.

Horizontal Scaling (Scale Out)

Horizontal scaling increases capacity by adding more machines and distributing load across them.

| Aspect | Description |
| --- | --- |
| Advantages | No practical ceiling, built-in redundancy, commodity hardware |
| Disadvantages | Distributed system complexity, load balancing requirements, data consistency challenges |

Horizontal scaling is required for large-scale systems but introduces significant complexity.

Scalability Patterns

1. Load Balancing

Load balancers distribute requests across multiple servers.

Load balancing algorithms:

  • Round Robin
  • Least Connections
  • IP Hash
  • Weighted Round Robin
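
Two of these algorithms can be sketched in a few lines. This is a minimal illustration, not production load-balancer code; the class and server names are hypothetical:

```python
import itertools

class RoundRobinBalancer:
    """Round robin: cycle through servers in order, one request each."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Least connections: send each request to the server currently
    handling the fewest active connections."""
    def __init__(self, servers):
        self._active = {s: 0 for s in servers}

    def pick(self):
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        # Call when a request completes so the count stays accurate.
        self._active[server] -= 1
```

Round robin assumes requests cost roughly the same; least connections adapts when some requests are much slower than others.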

2. Database Scaling

Read Replicas:

Read replicas distribute read traffic while a primary database handles writes. Most applications have higher read than write volume.

Consideration: Replicas may have replication lag. A write followed by an immediate read may not return the written data if the read hits a replica.
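
One common mitigation is read-your-writes routing: send a user's reads to the primary for a short window after that user writes. A minimal sketch, assuming a worst-case lag bound (the constant and class names are hypothetical):

```python
import time

REPLICATION_LAG_BOUND = 2.0  # assumed worst-case replication lag, in seconds

class ReadRouter:
    """Route a user's reads to the primary shortly after their own writes,
    so they see their own data despite replica lag."""
    def __init__(self):
        self._last_write = {}  # user_id -> monotonic timestamp of last write

    def record_write(self, user_id):
        self._last_write[user_id] = time.monotonic()

    def choose(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and time.monotonic() - last < REPLICATION_LAG_BOUND:
            return "primary"   # recent write: replica may not have it yet
        return "replica"       # safe to serve from a replica
```

Other users' reads still go to replicas, so the primary only absorbs the small fraction of reads that must be fresh.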

Sharding:

Sharding distributes data across multiple databases based on a shard key. For example, user IDs 1-1,000,000 might live on database 1 and 1,000,001-2,000,000 on database 2.

Considerations:

  • JOINs across shards are expensive or impossible
  • Shard key selection significantly impacts query patterns
  • Rebalancing shards requires data migration
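
A hash-based shard mapping can be sketched as follows. The shard count and function name are illustrative; a stable hash (rather than Python's built-in `hash()`, which varies between processes) keeps the mapping consistent everywhere:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(user_id: str) -> int:
    """Map a shard key to a shard number via a stable hash.

    Hash-based sharding spreads keys evenly; the range-based scheme
    described above instead assigns contiguous ID ranges per shard.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

Note that the modulo ties every key to `NUM_SHARDS`: changing the shard count remaps most keys, which is why rebalancing requires data migration (consistent hashing reduces, but does not eliminate, this cost).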

3. Caching

Caching stores frequently accessed data in memory to reduce database load.

Cache locations:

| Layer | Example |
| --- | --- |
| Browser | HTTP cache headers |
| CDN | Static assets |
| Application cache | Redis, Memcached |
| Database | Query result caching |

A well-configured cache can reduce database load by 90% or more.

4. Asynchronous Processing

Asynchronous processing defers non-critical work to background jobs.

Examples: sending emails, processing images, generating reports.
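
The pattern can be sketched with a queue and a background worker. In production the queue would be a broker such as RabbitMQ or a task system such as Celery; this in-process version (with hypothetical function names) shows the shape:

```python
import queue
import threading

jobs = queue.Queue()

def worker():
    """Background worker: drains deferred jobs off the request path."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel value signals shutdown
            break
        job()                    # e.g. send an email, resize an image
        jobs.task_done()

def handle_request(results):
    """Respond immediately, deferring the slow work to the queue."""
    results.append("response sent")
    jobs.put(lambda: results.append("email sent"))
```

The request handler returns as soon as the job is enqueued, so user-facing latency no longer includes the slow work.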

Scalability Metrics

| Metric | Description |
| --- | --- |
| Throughput | Requests per second the system can handle |
| Latency | Response time; p99 is the value that 99% of requests complete within, so it reflects the experience of the slowest 1% |
| Availability | Percentage of time the system is operational |
| Resource utilization | CPU, memory, disk usage; high utilization indicates limited headroom |
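
Percentile latency is computed from a sorted sample of response times; a nearest-rank sketch (function name is illustrative):

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the value at or below which
    p percent of the observed latencies fall."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

Averages hide tail behavior, which is why p99 (rather than mean latency) is the usual service-level target.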

Trade-offs

| Approach | Complexity | Cost | Performance Ceiling |
| --- | --- | --- | --- |
| Vertical scaling | Low | High | Limited by hardware |
| Horizontal scaling | High | Variable | Effectively unlimited |
| Caching | Medium | Low | High impact on reads |
| Async processing | Medium | Low | High impact on throughput |

Design Guidelines

| Guideline | Description |
| --- | --- |
| Design for horizontal scaling | Avoid patterns that prevent horizontal scaling (server-local session state) |
| Keep services stateless | Stateless services allow any server to handle any request |
| Cache aggressively | The fastest database query is one that does not execute |
| Measure before optimizing | Profile to identify actual bottlenecks |
| Plan for failure | At scale, component failures are continuous; design for resilience |