
Design a Rate Limiter

Related Concepts: Token Bucket Algorithm, Sliding Window Counter, Redis/In-Memory Storage, Distributed Coordination, Eventual Consistency, API Gateway Integration, Fault Tolerance, Graceful Degradation

Design a rate limiter that controls the rate of requests to protect APIs from abuse and ensure fair resource allocation.

Step 1: Requirements and Scope

Functional Requirements

  • Limit requests based on user ID, IP address, or API key
  • Support different rate limits for different API endpoints
  • Work in a distributed environment (multiple servers)
  • Return informative responses when rate limited

Non-Functional Requirements

| Requirement | Target | Rationale |
| --- | --- | --- |
| Latency | < 1 ms overhead | Rate limiting should not noticeably slow down requests |
| Availability | 99.99% | A failed limiter either blocks all traffic or lets all traffic through |
| Accuracy | ±1% of the limit | Must protect resources reliably |
| Memory | Minimal per user | Millions of users across multiple APIs |
| Fault tolerance | Graceful degradation | The system should keep working even if the limiter fails |

Scale Estimation

  • 1 million daily active users
  • Peak of 10 requests/second per user
  • Average of roughly 1 request/second per user
  • Up to 10 million requests/second total at peak (1M users × 10 req/s)
  • 100 different API endpoints, each with its own limits

Step 2: High-Level Design


Where to Place the Rate Limiter

| Placement | Advantages | Disadvantages | Use Case |
| --- | --- | --- | --- |
| Client-side | Reduces server load | Easily bypassed, unreliable | Trusted internal services |
| Server-side middleware | Full control, accurate | Additional hop in the request path | Most applications |
| API Gateway | Managed, centralized rules | Vendor lock-in, less customizable | Microservices architectures |
| Reverse proxy (Nginx/HAProxy) | Efficient, battle-tested | Limited algorithm options | Simple use cases |

Recommendation: Server-side middleware or API Gateway for most production systems.

Step 3: Rate Limiting Algorithms

Algorithm Comparison

| Algorithm | Mechanism | Memory | Accuracy | Burst Handling | Complexity |
| --- | --- | --- | --- | --- | --- |
| Token Bucket | Tokens refill at a fixed rate; requests consume tokens | O(1) per user | High | Allows controlled bursts | Low |
| Leaky Bucket | Requests queue and are processed at a fixed rate | O(n) for the queue | High | No bursts (smooth) | Low |
| Fixed Window | Count requests in fixed time intervals | O(1) per user | Medium | Edge-case bursts | Very Low |
| Sliding Window Log | Track the timestamp of every request | O(n) per user | Very High | Good | Medium |
| Sliding Window Counter | Weighted combination of adjacent windows | O(1) per user | High | Good | Low |

Algorithm 1: Token Bucket


Mechanism:

  1. Bucket has maximum capacity (e.g., 10 tokens)
  2. Tokens added at fixed rate (e.g., 1/second)
  3. Each request consumes one token
  4. Request denied if bucket empty

Use case: APIs that allow controlled bursts (AWS, Stripe)
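
To make the mechanism concrete, here is a minimal single-process sketch in Python. The class and parameter names are illustrative; a production limiter would also need thread safety and shared storage.

```python
import time

class TokenBucket:
    """Minimal in-memory token bucket (single process, not thread-safe)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full so initial bursts pass
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False          # bucket empty: deny the request

# 10-token capacity refilled at 1 token/second, as in the steps above
bucket = TokenBucket(capacity=10, refill_rate=1.0)
```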

Algorithm 2: Leaky Bucket


Mechanism:

  1. Requests enter a FIFO queue
  2. Queue processed at constant rate
  3. New requests dropped if queue full

Use case: When smooth, predictable request flow is needed (payment processing)
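
A minimal in-memory sketch of the same idea, with illustrative names; real implementations usually drain the queue from a background worker rather than lazily on each call, as done here for brevity.

```python
import time
from collections import deque

class LeakyBucket:
    """Minimal in-memory leaky bucket: requests wait in a FIFO queue
    that drains at a fixed rate; a full queue drops new requests."""

    def __init__(self, queue_size: int, leak_rate: float):
        self.queue_size = queue_size  # maximum queued requests
        self.leak_rate = leak_rate    # requests processed per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the whole requests that would have been processed by now.
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            # Advance the clock only by the whole requests drained so
            # that fractional leak time is never lost.
            self.last_leak += leaked / self.leak_rate
        if len(self.queue) < self.queue_size:
            self.queue.append(now)  # accepted: will be processed in turn
            return True
        return False                # queue full: drop the request

bucket = LeakyBucket(queue_size=10, leak_rate=1.0)  # smooth 1 request/second
```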

Algorithm 3: Fixed Window Counter

Mechanism:

  1. Count requests within a fixed time interval (e.g., per minute)
  2. Reset the counter at each window boundary
  3. Deny requests once the count exceeds the limit

Problem: with a limit of 3 requests per minute, 3 requests at 0:58 plus 3 more at 1:02 is 6 requests within 4 seconds, double the intended rate at the window boundary.

Use case: Simple use cases where edge-case bursts are acceptable
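
A minimal Redis-backed sketch (key and function names are illustrative). Note that the INCR/EXPIRE pair below is two separate commands; Step 4 shows how to make the whole check atomic.

```python
import time
import redis  # assumes the redis-py client is installed

r = redis.Redis()

def fixed_window_allow(user_id: str, limit: int, window_seconds: int = 60) -> bool:
    """Fixed window counter: one Redis counter per user per window."""
    window = int(time.time()) // window_seconds
    key = f"rate:{user_id}:{window}"
    count = r.incr(key)  # atomically increments and returns the new value
    if count == 1:
        # First request in this window: let the key clean itself up later.
        r.expire(key, window_seconds * 2)
    return count <= limit
```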

Algorithm 4: Sliding Window Counter

Mechanism:

  1. Maintain counters for current and previous windows
  2. Weight previous window by overlap percentage
  3. Effective count = (prev_count × prev_weight) + curr_count

Use case: Most production systems (good balance of accuracy and efficiency)
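
A minimal in-memory sketch of the weighted calculation, with illustrative names; a production version would keep the counters in Redis and evict stale windows.

```python
import time

# In-memory counters keyed by (user_id, window_index); a production
# version would store these in Redis and expire old windows.
counters: dict = {}

def sliding_window_allow(user_id: str, limit: int, window_seconds: int = 60) -> bool:
    now = time.time()
    curr_window = int(now // window_seconds)
    prev_window = curr_window - 1
    # Fraction of the current window that has elapsed.
    elapsed = (now % window_seconds) / window_seconds
    prev_count = counters.get((user_id, prev_window), 0)
    curr_count = counters.get((user_id, curr_window), 0)
    # Effective count = (prev_count x prev_weight) + curr_count, where the
    # previous window's weight is its remaining overlap with the last window.
    effective = prev_count * (1 - elapsed) + curr_count
    if effective < limit:
        counters[(user_id, curr_window)] = curr_count + 1
        return True
    return False

allowed = sliding_window_allow("user-42", limit=100)
```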

Step 4: Distributed Rate Limiting

Challenge: Multiple Servers

With independent per-server counters, a user whose requests are load-balanced across N servers can receive up to N times the intended limit, so the servers must share or synchronize their counts.

Synchronization Strategies

| Strategy | Mechanism | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Centralized Redis | All servers check a single Redis cluster | Accurate, consistent | Redis becomes a single point of failure |
| Sticky Sessions | Route each user to the same server | No distributed state | Uneven load, session-loss issues |
| Gossip Protocol | Servers periodically share counts | No central dependency | Eventually consistent, complex |
| Hybrid | Local counters + periodic sync | Low latency, good accuracy | Some over-counting possible |

Race Condition Prevention

Without atomic operations, concurrent requests can exceed the limit: two servers both read a count of 99 against a limit of 100, both decide the request is allowed, and both increment, leaving the user at 101.

Solution: Use atomic operations (Redis Lua scripts or INCR with conditional logic)
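
A sketch of the Lua-script approach using the redis-py client; the key scheme and limits are illustrative. Because Redis executes the script as a single atomic operation, two concurrent requests cannot both slip past a nearly full counter.

```python
import redis  # assumes the redis-py client is installed

r = redis.Redis()

# The whole script runs as one atomic operation inside Redis.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if count > tonumber(ARGV[1]) then
    return 0  -- over the limit
end
return 1
"""

check_limit = r.register_script(FIXED_WINDOW_LUA)

def allow(user_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    return check_limit(keys=[f"rate:{user_id}"], args=[limit, window_seconds]) == 1
```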

Step 5: Response Design

Rate Limit Headers

| Header | Purpose | Example |
| --- | --- | --- |
| X-RateLimit-Limit | Maximum requests allowed | 100 |
| X-RateLimit-Remaining | Requests remaining in the window | 23 |
| X-RateLimit-Reset | Unix timestamp when the limit resets | 1640000000 |
| Retry-After | Seconds to wait before retrying | 60 |

HTTP Response Codes

| Code | When to Use |
| --- | --- |
| 429 Too Many Requests | Rate limit exceeded |
| 503 Service Unavailable | The rate limiter itself is overloaded |
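
A framework-agnostic sketch of building the 429 response with the headers above; the function shape and body format are assumptions, not any specific framework's API.

```python
import time

def rate_limit_response(limit: int, reset_epoch: int):
    """Build a 429 response carrying the headers from the tables above.
    Framework-agnostic: returns (status, headers, body)."""
    retry_after = max(0, reset_epoch - int(time.time()))
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_epoch),
        "Retry-After": str(retry_after),
    }
    body = f'{{"error": "rate_limited", "retry_after": {retry_after}}}'
    return 429, headers, body
```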

Step 6: Multi-Tier Rate Limiting

Production systems often layer multiple rate limits, checking each request against per-second, per-minute, and per-day windows and rejecting it if any tier is exceeded.


Example Multi-Tier Configuration

| API Tier | Per Second | Per Minute | Per Day |
| --- | --- | --- | --- |
| Free | 1 | 20 | 1,000 |
| Basic | 10 | 500 | 50,000 |
| Pro | 50 | 2,000 | 500,000 |
| Enterprise | 200 | 10,000 | Unlimited |
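
One way to encode the tier table and enforce all windows at once, as a sketch; the dictionary shape and names are assumptions rather than any specific product's configuration format.

```python
TIER_LIMITS = {
    "free":       {"per_second": 1,   "per_minute": 20,     "per_day": 1_000},
    "basic":      {"per_second": 10,  "per_minute": 500,    "per_day": 50_000},
    "pro":        {"per_second": 50,  "per_minute": 2_000,  "per_day": 500_000},
    "enterprise": {"per_second": 200, "per_minute": 10_000, "per_day": None},  # None = unlimited
}

def check_all_tiers(tier: str, counts: dict) -> bool:
    """A request passes only if it is within every window for its tier."""
    return all(limit is None or counts[window] < limit
               for window, limit in TIER_LIMITS[tier].items())

# A Basic user who has made 3 requests this second, 120 this minute, 900 today:
check_all_tiers("basic", {"per_second": 3, "per_minute": 120, "per_day": 900})  # True
```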

Step 7: Failure Handling

What Happens When Redis Fails?

| Strategy | Behavior | When to Use |
| --- | --- | --- |
| Fail Open | Allow all requests | User-facing APIs where availability > protection |
| Fail Closed | Block all requests | Security-critical APIs |
| Local Fallback | Use an in-memory cache | Graceful degradation |
| Circuit Breaker | Skip rate limiting temporarily | High-availability systems |
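
A minimal fail-open wrapper around a Redis-backed check (names are illustrative); the short socket timeout bounds how much latency a failing Redis can add, and changing the `except` branch to return False turns it into fail-closed.

```python
import redis

r = redis.Redis(socket_timeout=0.05)  # bound the latency the limiter can add

def redis_allow(user_id: str, limit: int) -> bool:
    """Stand-in for any Redis-backed check, e.g. the Lua script in Step 4."""
    key = f"rate:{user_id}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 60)
    return count <= limit

def allow(user_id: str, limit: int = 100) -> bool:
    try:
        return redis_allow(user_id, limit)
    except redis.RedisError:
        # Fail open: availability over protection. A security-critical API
        # would return False here instead (fail closed).
        return True
```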

Redis High Availability

A primary/replica setup with automatic failover (e.g., Redis Sentinel or Redis Cluster) keeps the limiter available when a single node fails.

Production Examples

| Company | Algorithm | Storage | Notable Features |
| --- | --- | --- | --- |
| Stripe | Token Bucket | Redis | Cascading limits (sec/min/hour) |
| GitHub | Sliding Window | Redis | Different limits by auth type |
| Cloudflare | Leaky Bucket | Distributed | Edge-based, geo-aware |
| AWS API Gateway | Token Bucket | DynamoDB | Managed, auto-scaling |
| Kong | Multiple | Redis/Cassandra | Plugin-based, configurable |

Summary: Key Design Decisions

| Decision | Options | Recommendation |
| --- | --- | --- |
| Algorithm | Token bucket, sliding window, etc. | Sliding window counter for most cases |
| Storage | Local, Redis, database | Redis cluster for distributed systems |
| Placement | Client, server, gateway | API Gateway or server middleware |
| Failure mode | Fail open vs. fail closed | Depends on API criticality |
| Granularity | User, IP, API key | Multiple levels for defense in depth |