Scalability
Scalability refers to a system's ability to handle increased load by adding resources. This section covers scaling approaches, patterns, metrics, and trade-offs.
Types of Scaling
Vertical Scaling (Scale Up)
Vertical scaling increases capacity by upgrading to more powerful hardware (more CPU, RAM, faster storage).
| Aspect | Description |
|---|---|
| Advantages | No code changes required, no distributed system complexity |
| Disadvantages | Hardware limits, single point of failure, non-linear cost increase |
Vertical scaling is the simplest approach and appropriate for initial scaling needs.
Horizontal Scaling (Scale Out)
Horizontal scaling increases capacity by adding more machines and distributing load across them.
| Aspect | Description |
|---|---|
| Advantages | No hard capacity ceiling (add machines as load grows), built-in redundancy, commodity hardware |
| Disadvantages | Distributed system complexity, load balancing requirements, data consistency challenges |
Horizontal scaling is required for large-scale systems but introduces significant complexity.
Scalability Patterns
1. Load Balancing
Load balancers distribute requests across multiple servers.
Load balancing algorithms:
- Round Robin
- Least Connections
- IP Hash
- Weighted Round Robin
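As a rough illustration, the sketch below implements round robin and least connections in Python. The server names are placeholders, and production systems use a dedicated load balancer (NGINX, HAProxy, or a cloud load balancer) rather than application-level code like this.

```python
import itertools

class RoundRobinBalancer:
    """Cycles through servers in order, one request at a time."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Sends each request to the server with the fewest active connections."""
    def __init__(self, servers):
        self._active = {server: 0 for server in servers}

    def pick(self):
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        self._active[server] -= 1

rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([rr.pick() for _ in range(4)])  # app-1, app-2, app-3, app-1
```

Round robin assumes roughly uniform work per request; least connections adapts better when request durations vary widely.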
2. Database Scaling
Read Replicas:
Read replicas distribute read traffic while a primary database handles writes. Most applications have higher read than write volume.
Consideration: Replicas may have replication lag. A write followed by an immediate read may not return the written data if the read hits a replica.
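One common mitigation is read-your-writes routing: send a client's reads to the primary for a short window after it writes. The sketch below assumes generic connection objects exposing an execute method and an illustrative lag window; it is not any particular driver's API.

```python
import random
import time

REPLICATION_LAG_WINDOW = 1.0  # seconds; tune to the replica lag you observe

class RoutingSession:
    """Routes writes to the primary and reads to replicas, with a
    read-your-writes guard for recently written sessions."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._last_write_at = 0.0

    def execute_write(self, sql, params=()):
        self._last_write_at = time.monotonic()
        return self.primary.execute(sql, params)

    def execute_read(self, sql, params=()):
        # A session that just wrote reads from the primary so it sees its
        # own data; everything else goes to a randomly chosen replica.
        if time.monotonic() - self._last_write_at < REPLICATION_LAG_WINDOW:
            return self.primary.execute(sql, params)
        return random.choice(self.replicas).execute(sql, params)
```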
Sharding:
Sharding distributes data across multiple databases (shards) based on a shard key: for example, user IDs 1-1,000,000 on shard 1 and 1,000,001-2,000,000 on shard 2. A routing sketch follows the considerations below.
Considerations:
- JOINs across shards are expensive or impossible
- Shard key selection significantly impacts query patterns
- Rebalancing shards requires data migration
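A minimal sketch of shard routing, assuming hypothetical shard boundaries and connection strings. Real systems often use consistent hashing or a lookup service so that rebalancing does not require reconfiguring every client.

```python
# Range-based routing keyed on user ID; boundaries and DSNs are hypothetical.
SHARDS = [
    (1, 1_000_000, "postgres://db-shard-1/users"),
    (1_000_001, 2_000_000, "postgres://db-shard-2/users"),
]

def shard_for_user(user_id: int) -> str:
    """Return the connection string for the shard that owns this user."""
    for low, high, dsn in SHARDS:
        if low <= user_id <= high:
            return dsn
    raise ValueError(f"no shard configured for user_id={user_id}")

# Hash-based routing spreads hot ranges more evenly, at the cost of
# making range scans across users harder.
def hash_shard_for_user(user_id: int, shard_count: int = 4) -> int:
    return user_id % shard_count

print(shard_for_user(1_500_000))  # postgres://db-shard-2/users
```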
3. Caching
Caching stores frequently accessed data in memory to reduce database load.
Cache locations:
| Layer | Example |
|---|---|
| Browser | HTTP cache headers |
| CDN | Static assets |
| Application cache | Redis, Memcached |
| Database | Query result caching |
For read-heavy workloads, a well-configured cache can cut database load dramatically, often by 90% or more.
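The most common access pattern is cache-aside: check the cache, fall back to the database on a miss, then populate the cache with a TTL. The sketch below assumes the redis-py client, a reachable Redis instance, and a hypothetical load_user_from_db accessor.

```python
import json
import redis  # assumes the redis-py client and a local Redis instance

cache = redis.Redis(host="localhost", port=6379)
CACHE_TTL_SECONDS = 300

def load_user_from_db(user_id: int) -> dict:
    # Hypothetical database accessor; stands in for a real query.
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: no database query
    user = load_user_from_db(user_id)      # cache miss: query the database
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(user))
    return user
```

The TTL bounds staleness; when stale reads are unacceptable, writes must also invalidate or update the cached entry explicitly.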
4. Asynchronous Processing
Asynchronous processing defers non-critical work to background jobs.
Examples: sending emails, processing images, generating reports.
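A minimal sketch of the pattern using an in-process queue and a worker thread. Production systems typically use a durable broker (RabbitMQ, SQS, Kafka) and a task framework so jobs survive process restarts.

```python
import queue
import threading

job_queue: queue.Queue = queue.Queue()

def send_email(recipient: str, body: str) -> None:
    # Stand-in for a slow operation (SMTP call, template rendering, etc.).
    print(f"sent '{body}' to {recipient}")

def worker() -> None:
    # Background worker drains the queue independently of request handling.
    while True:
        recipient, body = job_queue.get()
        send_email(recipient, body)
        job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_signup(email: str) -> None:
    # The request handler returns immediately after enqueueing the job.
    job_queue.put((email, "Welcome!"))

handle_signup("new-user@example.com")
job_queue.join()  # in a real service the worker runs for the process lifetime
```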
Scalability Metrics
| Metric | Description |
|---|---|
| Throughput | Requests per second the system can handle |
| Latency | Response time; p99 is the latency that 99% of requests fall under, marking the tail experienced by the slowest 1% |
| Availability | Percentage of time the system is operational |
| Resource utilization | CPU, memory, disk usage; high utilization indicates limited headroom |
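As a concrete illustration, the sketch below derives throughput, p99 latency, and availability from a small, fabricated batch of request measurements, using the nearest-rank percentile definition.

```python
import math

def percentile(latencies_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: the value pct% of samples fall at or below."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# Fabricated sample: ten requests observed over a one-second window.
latencies_ms = [12, 15, 14, 20, 18, 250, 16, 13, 17, 19]
window_seconds = 1.0
failed_requests = 1

throughput = len(latencies_ms) / window_seconds          # requests per second
availability = 1 - failed_requests / len(latencies_ms)   # success fraction
print(f"throughput={throughput:.0f} rps, "
      f"p99={percentile(latencies_ms, 99):.0f} ms, "
      f"availability={availability:.1%}")
```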
Trade-offs
| Approach | Complexity | Cost | Scaling Effect |
|---|---|---|---|
| Vertical scaling | Low | High | Ceiling set by available hardware |
| Horizontal scaling | High | Variable | Effectively unlimited; bounded in practice by coordination overhead |
| Caching | Medium | Low | Large gains for read-heavy workloads |
| Async processing | Medium | Low | Higher request throughput by deferring non-critical work |
Design Guidelines
| Guideline | Description |
|---|---|
| Design for horizontal scaling | Avoid patterns that prevent horizontal scaling (server-local session state) |
| Keep services stateless | Stateless services allow any server to handle any request |
| Cache aggressively | The fastest database query is one that does not execute |
| Measure before optimizing | Profile to identify actual bottlenecks |
| Plan for failure | At scale, component failures are continuous; design for resilience |
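To make the first two guidelines concrete, the sketch below externalizes session state to a shared store so that any instance behind the load balancer can serve any request. The redis-py client is assumed here; any shared store (a database, Memcached, a managed session service) works the same way.

```python
import json
import uuid
import redis  # assumes the redis-py client and a shared Redis instance

store = redis.Redis(host="localhost", port=6379)
SESSION_TTL_SECONDS = 3600

def create_session(user_id: int) -> str:
    # Session data lives in the shared store, not in process memory,
    # so the next request can land on any server.
    session_id = uuid.uuid4().hex
    store.setex(f"session:{session_id}", SESSION_TTL_SECONDS,
                json.dumps({"user_id": user_id}))
    return session_id

def load_session(session_id: str) -> dict | None:
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```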