Design a Rate Limiter
Related Concepts: Token Bucket Algorithm, Sliding Window Counter, Redis/In-Memory Storage, Distributed Coordination, Eventual Consistency, API Gateway Integration, Fault Tolerance, Graceful Degradation
Design a rate limiter that controls the rate of requests to protect APIs from abuse and ensure fair resource allocation.
Step 1: Requirements and Scope
Functional Requirements
- Limit requests based on user ID, IP address, or API key
- Support different rate limits for different API endpoints
- Work in a distributed environment (multiple servers)
- Return informative responses when rate limited
Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Latency | < 1ms overhead | Rate limiting should not slow down requests |
| Availability | 99.99% | A failed limiter either blocks all requests or lets everything through |
| Accuracy | +/- 1% of limit | Must protect resources reliably |
| Memory | Minimal per user | Millions of users across multiple APIs |
| Fault Tolerance | Graceful degradation | System should work even if limiter fails |
Scale Estimation
- 1 million daily active users
- 10 requests/second per user at peak
- ~1,000 requests per user per day on average
- 10 million requests/second worst-case total (all users bursting at once)
- 100 different API endpoints with different limits
Step 2: High-Level Design
Where to Place the Rate Limiter
| Placement | Advantages | Disadvantages | Use Case |
|---|---|---|---|
| Client-side | Reduces server load | Easily bypassed, unreliable | Trusted internal services |
| Server-side middleware | Full control, accurate | Additional hop in request path | Most applications |
| API Gateway | Managed, centralized rules | Vendor lock-in, less customizable | Microservices architecture |
| Reverse proxy (Nginx/HAProxy) | Efficient, battle-tested | Limited algorithm options | Simple use cases |
Recommendation: Server-side middleware or API Gateway for most production systems.
Step 3: Rate Limiting Algorithms
Algorithm Comparison
| Algorithm | Mechanism | Memory | Accuracy | Burst Handling | Complexity |
|---|---|---|---|---|---|
| Token Bucket | Tokens refill at fixed rate, requests consume tokens | O(1) per user | High | Allows controlled bursts | Low |
| Leaky Bucket | Requests queue and process at fixed rate | O(n) for queue | High | No bursts (smooth) | Low |
| Fixed Window | Count requests in fixed time intervals | O(1) per user | Medium | Edge case bursts | Very Low |
| Sliding Window Log | Track timestamp of every request | O(n) per user | Very High | Good | Medium |
| Sliding Window Counter | Weighted combination of windows | O(1) per user | High | Good | Low |
Algorithm 1: Token Bucket
Mechanism:
- Bucket has maximum capacity (e.g., 10 tokens)
- Tokens added at fixed rate (e.g., 1/second)
- Each request consumes one token
- Request denied if bucket empty
Use case: APIs that allow controlled bursts (AWS, Stripe)
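A minimal single-process sketch of the token bucket (class name and parameters are illustrative; in production the state would live in Redis or the gateway):

```python
import time

class TokenBucket:
    """capacity caps the burst size; refill_rate sets the steady-state rate."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill lazily from elapsed time; no background timer needed.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A full bucket of 10 tokens lets a client burst 10 requests at once, then sustain 1 request/second afterward.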
Algorithm 2: Leaky Bucket
Mechanism:
- Requests enter a FIFO queue
- Queue processed at constant rate
- New requests dropped if queue full
Use case: When smooth, predictable request flow is needed (payment processing)
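A sketch of the leaky bucket under the same illustrative assumptions (a real implementation would queue the requests themselves and process them asynchronously, not just timestamps):

```python
import time
from collections import deque

class LeakyBucket:
    """Requests join a bounded FIFO queue that drains at a constant rate."""

    def __init__(self, queue_size: int, leak_rate: float):
        self.queue_size = queue_size  # maximum queued requests
        self.leak_rate = leak_rate    # requests drained per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain whatever has leaked out since the last call.
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
        if len(self.queue) < self.queue_size:
            self.queue.append(now)
            return True
        return False  # queue full: request rejected
```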
Algorithm 3: Fixed Window Counter
Mechanism:
- Divide time into fixed intervals (e.g., one minute)
- Count requests in the current interval; reset the counter at each boundary
Problem: with a limit of 3 requests/minute, 3 requests at 0:58 and 3 more at 1:02 are all allowed: 6 requests within 4 seconds, double the intended rate
Use case: Simple use cases where edge-case bursts are acceptable
Algorithm 4: Sliding Window Counter (Recommended)
Mechanism:
- Maintain counters for current and previous windows
- Weight previous window by overlap percentage
- Effective count = (prev_count × prev_weight) + curr_count
Use case: Most production systems - good balance of accuracy and efficiency
Step 4: Distributed Rate Limiting
Challenge: Multiple Servers
If each server keeps its own counters, a user limited to 100 requests/second can reach 100 × N requests/second by spreading traffic across N servers, so counters must be shared or synchronized.
Synchronization Strategies
| Strategy | Mechanism | Advantages | Disadvantages |
|---|---|---|---|
| Centralized Redis | All servers check a single Redis cluster | Accurate, consistent | Single point of failure, adds a network round trip |
| Sticky Sessions | Route user to same server | No distributed state | Uneven load, session loss issues |
| Gossip Protocol | Servers periodically share counts | No central dependency | Eventually consistent, complex |
| Hybrid | Local counters + periodic sync | Low latency, good accuracy | Some over-counting possible |
Race Condition Prevention
Without atomic operations, two concurrent requests can both read the same count, both pass the check, and both increment it, exceeding the limit.
Solution: Use atomic operations (Redis Lua scripts or INCR with conditional logic)
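Two sketches of the fix: a lock-based check-and-increment showing the atomicity requirement in a single process, and an illustrative Redis Lua script (key and argument layout are assumptions) that Redis executes as one atomic unit:

```python
import threading

# Illustrative Lua script: INCR and the limit check run as a single atomic
# unit inside Redis, so no two clients can interleave between read and write.
# KEYS[1] = counter key, ARGV[1] = window TTL in seconds, ARGV[2] = limit.
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current <= tonumber(ARGV[2]) and 1 or 0
"""

class AtomicCounter:
    """Single-process equivalent: read-check-increment happens under a lock,
    so two concurrent requests cannot both pass at count == limit - 1."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def allow(self) -> bool:
        with self._lock:
            if self.count < self.limit:
                self.count += 1
                return True
            return False
```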
Step 5: Response Design
Rate Limit Headers
| Header | Purpose | Example |
|---|---|---|
| X-RateLimit-Limit | Maximum requests allowed | 100 |
| X-RateLimit-Remaining | Requests remaining in window | 23 |
| X-RateLimit-Reset | Unix timestamp when limit resets | 1640000000 |
| Retry-After | Seconds to wait before retrying | 60 |
HTTP Response Codes
| Code | When to Use |
|---|---|
| 429 Too Many Requests | Rate limit exceeded |
| 503 Service Unavailable | Rate limiter itself is overloaded |
Step 6: Multi-Tier Rate Limiting
Production systems often need multiple rate limits:
Example Multi-Tier Configuration
| API Tier | Per Second | Per Minute | Per Day |
|---|---|---|---|
| Free | 1 | 20 | 1,000 |
| Basic | 10 | 500 | 50,000 |
| Pro | 50 | 2,000 | 500,000 |
| Enterprise | 200 | 10,000 | Unlimited |
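Checking a request against every tier window at once can be sketched as follows (the `counts` dict is assumed to come from per-window counters like those above):

```python
# Per-tier limits from the table: (per_second, per_minute, per_day).
# None means unlimited.
TIER_LIMITS = {
    "free":       (1, 20, 1_000),
    "basic":      (10, 500, 50_000),
    "pro":        (50, 2_000, 500_000),
    "enterprise": (200, 10_000, None),
}

def check_all_tiers(tier: str, counts: dict[str, int]) -> bool:
    """A request passes only if it is under every applicable window's limit."""
    per_second, per_minute, per_day = TIER_LIMITS[tier]
    for count, limit in ((counts["second"], per_second),
                        (counts["minute"], per_minute),
                        (counts["day"], per_day)):
        if limit is not None and count >= limit:
            return False
    return True
```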
Step 7: Failure Handling
What Happens When Redis Fails?
| Strategy | Behavior | When to Use |
|---|---|---|
| Fail Open | Allow all requests | User-facing APIs where availability > protection |
| Fail Closed | Block all requests | Security-critical APIs |
| Local Fallback | Use in-memory cache | Graceful degradation |
| Circuit Breaker | Skip rate limiting temporarily | High availability systems |
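Fail open vs. fail closed can be captured in a thin wrapper around whatever backend limiter is in use (the `allow(key)` backend interface here is an assumption):

```python
class FailOpenLimiter:
    """Wrap a (possibly remote) limiter so backend failures degrade
    gracefully instead of taking the API down with them."""

    def __init__(self, backend, fail_open: bool = True):
        self.backend = backend      # e.g. a Redis-backed limiter
        self.fail_open = fail_open  # True: allow on failure; False: block

    def allow(self, key: str) -> bool:
        try:
            return self.backend.allow(key)
        except Exception:
            # Backend unreachable: choose availability (open)
            # or protection (closed).
            return self.fail_open
```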
Redis High Availability
Run Redis with Sentinel or in Cluster mode with replicas so a failed primary is replaced by a promoted replica; rate-limit counters are approximate enough that brief replication lag during failover is acceptable.
Production Examples
| Company | Algorithm | Storage | Notable Features |
|---|---|---|---|
| Stripe | Token Bucket | Redis | Cascading limits (sec/min/hour) |
| GitHub | Sliding Window | Redis | Different limits by auth type |
| Cloudflare | Leaky Bucket | Distributed | Edge-based, geo-aware |
| AWS API Gateway | Token Bucket | DynamoDB | Managed, auto-scaling |
| Kong | Multiple | Redis/Cassandra | Plugin-based, configurable |
Summary: Key Design Decisions
| Decision | Options | Recommendation |
|---|---|---|
| Algorithm | Token bucket, Sliding window, etc. | Sliding window counter for most cases |
| Storage | Local, Redis, Database | Redis cluster for distributed systems |
| Placement | Client, Server, Gateway | API Gateway or server middleware |
| Failure mode | Fail open vs closed | Depends on API criticality |
| Granularity | User, IP, API key | Multiple levels for defense in depth |