Scalability

Scalability refers to a system's ability to handle increased load by adding resources. This section covers scaling approaches, patterns, metrics, and trade-offs.

Types of Scaling

Vertical Scaling (Scale Up)

Vertical scaling increases capacity by upgrading to more powerful hardware (more CPU, RAM, faster storage).

| Aspect | Description |
| --- | --- |
| Advantages | No code changes required, no distributed system complexity |
| Disadvantages | Hardware limits, single point of failure, non-linear cost increase |

Vertical scaling is the simplest approach and appropriate for initial scaling needs.

Horizontal Scaling (Scale Out)

Horizontal scaling increases capacity by adding more machines and distributing load across them.

| Aspect | Description |
| --- | --- |
| Advantages | No practical ceiling, built-in redundancy, commodity hardware |
| Disadvantages | Distributed system complexity, load balancing requirements, data consistency challenges |

Horizontal scaling is required for large-scale systems but introduces significant complexity.

Scalability Patterns

1. Load Balancing

Load balancers distribute requests across multiple servers.

Load balancing algorithms:

  • Round Robin
  • Least Connections
  • IP Hash
  • Weighted Round Robin
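
Two of these algorithms can be sketched in a few lines. This is a minimal illustration, not production load-balancer code; the class and server names are hypothetical:

```python
import itertools

class RoundRobinBalancer:
    """Round robin: cycle through servers in order, one request each."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Least connections: send each request to the server currently
    handling the fewest active connections."""
    def __init__(self, servers):
        self._active = {s: 0 for s in servers}

    def pick(self):
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        # Call when a request completes so the count stays accurate.
        self._active[server] -= 1
```

Round robin assumes requests cost roughly the same; least connections adapts when some requests are much slower than others.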

2. Database Scaling

Read Replicas:

Read replicas distribute read traffic while a primary database handles writes. Most applications have higher read than write volume.

Consideration: Replicas may have replication lag. A write followed by an immediate read may not return the written data if the read hits a replica.
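
One common mitigation is read-your-writes routing: send a user's reads to the primary for a short window after that user writes. A minimal sketch, assuming a worst-case lag bound (the constant and class names are hypothetical):

```python
import time

REPLICATION_LAG_BOUND = 2.0  # assumed worst-case replication lag, in seconds

class ReadRouter:
    """Route a user's reads to the primary shortly after their own writes,
    so they see their own data despite replica lag."""
    def __init__(self):
        self._last_write = {}  # user_id -> monotonic timestamp of last write

    def record_write(self, user_id):
        self._last_write[user_id] = time.monotonic()

    def choose(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and time.monotonic() - last < REPLICATION_LAG_BOUND:
            return "primary"   # recent write: replica may not have it yet
        return "replica"       # safe to serve from a replica
```

Other users' reads still go to replicas, so the primary only absorbs the small fraction of reads that must be fresh.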

Sharding:

Sharding distributes data across multiple databases based on a shard key. For example, user IDs 1-1,000,000 might live on database 1 and 1,000,001-2,000,000 on database 2.

Considerations:

  • JOINs across shards are expensive or impossible
  • Shard key selection significantly impacts query patterns
  • Rebalancing shards requires data migration
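
A hash-based shard mapping can be sketched as follows. The shard count and function name are illustrative; a stable hash (rather than Python's built-in `hash()`, which varies between processes) keeps the mapping consistent everywhere:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(user_id: str) -> int:
    """Map a shard key to a shard number via a stable hash.

    Hash-based sharding spreads keys evenly; the range-based scheme
    described above instead assigns contiguous ID ranges per shard.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

Note that the modulo ties every key to `NUM_SHARDS`: changing the shard count remaps most keys, which is why rebalancing requires data migration (consistent hashing reduces, but does not eliminate, this cost).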

3. Caching

Caching stores frequently accessed data in memory to reduce database load.

Cache locations:

| Layer | Example |
| --- | --- |
| Browser | HTTP cache headers |
| CDN | Static assets |
| Application cache | Redis, Memcached |
| Database | Query result caching |

A well-configured cache can reduce database load by 90% or more.

4. Asynchronous Processing

Asynchronous processing defers non-critical work to background jobs.

Examples: sending emails, processing images, generating reports.
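
The pattern can be sketched with a queue and a background worker. In production the queue would be a broker such as RabbitMQ or a task system such as Celery; this in-process version (with hypothetical function names) shows the shape:

```python
import queue
import threading

jobs = queue.Queue()

def worker():
    """Background worker: drains deferred jobs off the request path."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel value signals shutdown
            break
        job()                    # e.g. send an email, resize an image
        jobs.task_done()

def handle_request(results):
    """Respond immediately, deferring the slow work to the queue."""
    results.append("response sent")
    jobs.put(lambda: results.append("email sent"))
```

The request handler returns as soon as the job is enqueued, so user-facing latency no longer includes the slow work.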

Scalability Metrics

| Metric | Description |
| --- | --- |
| Throughput | Requests per second the system can handle |
| Latency | Response time; p99 is the value that 99% of requests complete within, so it reflects the experience of the slowest 1% |
| Availability | Percentage of time the system is operational |
| Resource utilization | CPU, memory, disk usage; high utilization indicates limited headroom |
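
Percentile latency is computed from a sorted sample of response times; a nearest-rank sketch (function name is illustrative):

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the value at or below which
    p percent of the observed latencies fall."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

Averages hide tail behavior, which is why p99 (rather than mean latency) is the usual service-level target.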

Trade-offs

| Approach | Complexity | Cost | Performance Ceiling |
| --- | --- | --- | --- |
| Vertical scaling | Low | High | Limited by hardware |
| Horizontal scaling | High | Variable | Effectively unlimited |
| Caching | Medium | Low | High impact on reads |
| Async processing | Medium | Low | High impact on throughput |

Design Guidelines

| Guideline | Description |
| --- | --- |
| Design for horizontal scaling | Avoid patterns that prevent horizontal scaling (server-local session state) |
| Keep services stateless | Stateless services allow any server to handle any request |
| Cache aggressively | The fastest database query is one that does not execute |
| Measure before optimizing | Profile to identify actual bottlenecks |
| Plan for failure | At scale, component failures are continuous; design for resilience |