Design a URL Shortener

Design a URL shortening service that converts long URLs into short, shareable links and redirects users to the original URLs.

Related Concepts: Base62 Encoding, Database Sharding, Caching (Redis/Memcached), Load Balancing, Unique ID Generation, CAP Theorem, Read-Heavy Optimization, TTL/Expiration

Step 1: Requirements and Scope

Functional Requirements

Given a long URL, generate a unique short URL
Given a short URL, redirect to the original long URL
Support custom short aliases (optional)
Support URL expiration (optional)
Track click analytics (optional)

Non-Functional Requirements

Requirement	Target	Rationale
Availability	99.99%	Broken links = lost users and revenue
Latency	< 100ms redirect	Users expect instant redirects
Durability	Never lose URLs	Links shared widely must always work
Scalability	Billions of URLs	Service grows over time
Security	Non-guessable URLs	Prevent enumeration attacks

Scale Estimation

100 million new URLs per month (~40 URLs/second)
10:1 read to write ratio -> 1 billion redirects/month (~400 redirects/second)
Average URL length: 200 bytes
Storage for 5 years: 100M x 12 x 5 x 200 bytes = 1.2 TB (excluding metadata)

How Many Characters for Short Code?

Characters	Base62 Combinations	Years to Exhaust (at 100M/month)
6	56.8 billion	47 years
7	3.5 trillion	2,900 years
8	218 trillion	181,000 years

Recommendation: 7 characters provides ample headroom

Step 2: High-Level Design

Loading diagram...

Step 3: API Design

Create Short URL

Field	Type	Required	Description
`long_url`	string	Yes	Original URL to shorten
`custom_alias`	string	No	User-specified short code
`expires_at`	timestamp	No	When the link should expire

Response: Short URL with metadata

Redirect Flow

Loading diagram...

301 vs 302 Redirects

Redirect Type	Browser Behavior	Analytics	SEO
301 Permanent	Browser caches, skips server	Undercounts (cached)	Passes link juice
302 Temporary	Always hits server	Accurate tracking	Link juice may not pass

Recommendation: Use 301 for performance, 302 if analytics are critical

Step 4: Short Code Generation Strategies

Strategy Comparison

Strategy	Uniqueness	Predictability	Coordination	Complexity
MD5/SHA Hash	Needs collision handling	Deterministic	None	Low
Counter (Auto-increment)	Guaranteed unique	Very predictable	Central counter	Low
Random Generation	Needs collision check	Unpredictable	None	Low
UUID to Base62	Very likely unique	Unpredictable	None	Low
Snowflake ID	Guaranteed unique	Semi-predictable	Minimal	Medium
Pre-generated Pool	Guaranteed unique	Unpredictable	Pool management	Medium

Strategy 1: Hash-Based

Loading diagram...

Pros: Same URL produces same code (deduplication possible) Cons: Collisions require handling, predictable pattern

Strategy 2: Counter-Based (Distributed)

Loading diagram...

Pros: No collisions, no DB lookup for uniqueness Cons: Sequential patterns reveal traffic volume

Strategy 3: Snowflake ID (Recommended)

Loading diagram...

Component	Bits	Purpose
Sign bit	1	Always 0 (positive number)
Timestamp	41	Milliseconds since epoch (69 years)
Machine ID	10	1024 unique machines
Sequence	12	4096 IDs per millisecond per machine

Result: ~4 million IDs/second per machine, globally unique, no coordination

Step 5: Database Design

Schema Design

Column	Type	Index	Notes
`id`	BIGINT	Primary Key	Snowflake ID
`short_code`	VARCHAR(10)	Unique Index	The short URL code
`long_url`	TEXT	None	Original URL (could be very long)
`user_id`	BIGINT	Index	Owner of the URL (if authenticated)
`created_at`	TIMESTAMP	None	For analytics
`expires_at`	TIMESTAMP	Index	For cleanup jobs
`click_count`	BIGINT	None	Denormalized for quick access

Database Choice

Database	Pros	Cons	Best For
PostgreSQL	ACID, mature, indexes	Vertical scaling limits	Moderate scale
MySQL	Well-understood, replication	Similar scaling limits	Moderate scale
DynamoDB	Auto-scaling, managed	Eventually consistent, costly	High scale, AWS
Cassandra	Linear scaling, high write	Eventual consistency	Very high scale

Recommendation: Start with PostgreSQL with read replicas, migrate to Cassandra/DynamoDB at scale

Sharding Strategy

Loading diagram...

AWS Alternative: Use Amazon DynamoDB which handles partitioning automatically based on partition key.

Sharding Key	Pros	Cons
short_code	Even distribution, direct lookups	Requires consistent hashing for scaling
user_id	Good for user-specific queries	Hot users cause hot shards
created_at	Good for time-based cleanup	Recent shards are hot

Step 6: Caching Strategy

Cache Architecture

Loading diagram...

Cache Configuration

Cache Tier	TTL	Size	Hit Rate
CDN	24 hours	Unlimited	60-70%
Local memory	5 minutes	10K URLs	20-30%
Redis	30 days	100M URLs	95%+

URL access follows Zipf distribution - a small percentage of URLs get most traffic.

Cache Invalidation

Strategy	When to Use
TTL-based	Default for all entries
Write-through	Update cache on every write
Delete on expire	When URL expires or is deleted

Step 7: Analytics (Optional)

Real-Time Click Tracking

Loading diagram...

Tracked Metrics

Metric	Storage	Use Case
Total clicks	Counter in main DB	Quick access
Clicks over time	Time-series DB	Graphs
Geographic data	Analytics DB	Country/city breakdown
Referrer	Analytics DB	Traffic sources
Device/browser	Analytics DB	User agent analysis

Step 8: Security Considerations

Preventing Abuse

Threat	Mitigation
Enumeration attacks	Random codes, not sequential
Malicious URLs	URL scanning, blocklist checking
Spam creation	Rate limiting, CAPTCHA, authentication
URL hijacking	Ownership verification for custom aliases

Rate Limiting

Operation	Limit	Scope
URL creation	100/hour	Per user/IP
Redirects	1000/minute	Per short code
Custom alias	10/day	Per user

Real-World Systems

Service	Scale	Notable Features
bit.ly	600M clicks/day	Enterprise analytics, branded links
TinyURL	Original service	Simple, no account needed
Rebrandly	Focused on branding	Custom domains, link management
short.io	API-first	White-label solution
T.co (Twitter)	Billions/day	Integrated malware scanning

Summary: Key Design Decisions

Decision	Options	Recommendation
Code generation	Hash, Counter, Random, Snowflake	Snowflake ID for uniqueness + time ordering
Code length	6-8 characters	7 characters (3.5T combinations)
Database	SQL, NoSQL	PostgreSQL to DynamoDB at scale
Caching	Redis, Memcached	Redis Cluster with 30-day TTL
Redirect type	301 vs 302	301 for performance, 302 for analytics
Analytics	Sync vs Async	Async via message queue

Step 1: Requirements and Scope​

Functional Requirements​

Non-Functional Requirements​

Scale Estimation​

How Many Characters for Short Code?​

Step 2: High-Level Design​

Step 3: API Design​

Create Short URL​

Redirect Flow​

301 vs 302 Redirects​

Step 4: Short Code Generation Strategies​

Strategy Comparison​

Strategy 1: Hash-Based​

Strategy 2: Counter-Based (Distributed)​

Strategy 3: Snowflake ID (Recommended)​

Step 5: Database Design​

Schema Design​

Database Choice​

Sharding Strategy​

Step 6: Caching Strategy​

Cache Architecture​

Cache Configuration​

Cache Invalidation​

Step 7: Analytics (Optional)​

Real-Time Click Tracking​

Tracked Metrics​

Step 8: Security Considerations​

Preventing Abuse​

Rate Limiting​

Real-World Systems​

Summary: Key Design Decisions​

Table of Contents

Step 1: Requirements and Scope

Functional Requirements

Non-Functional Requirements

Scale Estimation

How Many Characters for Short Code?

Step 2: High-Level Design

Step 3: API Design

Create Short URL

Redirect Flow

301 vs 302 Redirects

Step 4: Short Code Generation Strategies

Strategy Comparison

Strategy 1: Hash-Based

Strategy 2: Counter-Based (Distributed)

Strategy 3: Snowflake ID (Recommended)

Step 5: Database Design

Schema Design

Database Choice

Sharding Strategy

Step 6: Caching Strategy

Cache Architecture

Cache Configuration

Cache Invalidation

Step 7: Analytics (Optional)

Real-Time Click Tracking

Tracked Metrics

Step 8: Security Considerations

Preventing Abuse

Rate Limiting

Real-World Systems

Summary: Key Design Decisions