Design a URL shortening service that converts long URLs into short, shareable links and redirects users to the original URLs.
Related Concepts: Base62 Encoding, Database Sharding, Caching (Redis/Memcached), Load Balancing, Unique ID Generation, CAP Theorem, Read-Heavy Optimization, TTL/Expiration
- Given a long URL, generate a unique short URL
- Given a short URL, redirect to the original long URL
- Support custom short aliases (optional)
- Support URL expiration (optional)
- Track click analytics (optional)
| Requirement | Target | Rationale |
|---|
| Availability | 99.99% | Broken links = lost users and revenue |
| Latency | < 100ms redirect | Users expect instant redirects |
| Durability | Never lose URLs | Links shared widely must always work |
| Scalability | Billions of URLs | Service grows over time |
| Security | Non-guessable URLs | Prevent enumeration attacks |
- 100 million new URLs per month (~40 URLs/second)
- 10:1 read to write ratio -> 1 billion redirects/month (~400 redirects/second)
- Average URL length: 200 bytes
- Storage for 5 years: 100M x 12 x 5 x 200 bytes = 1.2 TB (excluding metadata)
| Characters | Base62 Combinations | Years to Exhaust (at 100M/month) |
|---|
| 6 | 56.8 billion | 47 years |
| 7 | 3.5 trillion | 2,900 years |
| 8 | 218 trillion | 181,000 years |
Recommendation: 7 characters provides ample headroom
Loading diagram...
| Field | Type | Required | Description |
|---|
long_url | string | Yes | Original URL to shorten |
custom_alias | string | No | User-specified short code |
expires_at | timestamp | No | When the link should expire |
Response: Short URL with metadata
Loading diagram...
| Redirect Type | Browser Behavior | Analytics | SEO |
|---|
| 301 Permanent | Browser caches, skips server | Undercounts (cached) | Passes link juice |
| 302 Temporary | Always hits server | Accurate tracking | Link juice may not pass |
Recommendation: Use 301 for performance, 302 if analytics are critical
| Strategy | Uniqueness | Predictability | Coordination | Complexity |
|---|
| MD5/SHA Hash | Needs collision handling | Deterministic | None | Low |
| Counter (Auto-increment) | Guaranteed unique | Very predictable | Central counter | Low |
| Random Generation | Needs collision check | Unpredictable | None | Low |
| UUID to Base62 | Very likely unique | Unpredictable | None | Low |
| Snowflake ID | Guaranteed unique | Semi-predictable | Minimal | Medium |
| Pre-generated Pool | Guaranteed unique | Unpredictable | Pool management | Medium |
Loading diagram...
Pros: Same URL produces same code (deduplication possible)
Cons: Collisions require handling, predictable pattern
Loading diagram...
Pros: No collisions, no DB lookup for uniqueness
Cons: Sequential patterns reveal traffic volume
Loading diagram...
| Component | Bits | Purpose |
|---|
| Sign bit | 1 | Always 0 (positive number) |
| Timestamp | 41 | Milliseconds since epoch (69 years) |
| Machine ID | 10 | 1024 unique machines |
| Sequence | 12 | 4096 IDs per millisecond per machine |
Result: ~4 million IDs/second per machine, globally unique, no coordination
| Column | Type | Index | Notes |
|---|
id | BIGINT | Primary Key | Snowflake ID |
short_code | VARCHAR(10) | Unique Index | The short URL code |
long_url | TEXT | None | Original URL (could be very long) |
user_id | BIGINT | Index | Owner of the URL (if authenticated) |
created_at | TIMESTAMP | None | For analytics |
expires_at | TIMESTAMP | Index | For cleanup jobs |
click_count | BIGINT | None | Denormalized for quick access |
| Database | Pros | Cons | Best For |
|---|
| PostgreSQL | ACID, mature, indexes | Vertical scaling limits | Moderate scale |
| MySQL | Well-understood, replication | Similar scaling limits | Moderate scale |
| DynamoDB | Auto-scaling, managed | Eventually consistent, costly | High scale, AWS |
| Cassandra | Linear scaling, high write | Eventual consistency | Very high scale |
Recommendation: Start with PostgreSQL with read replicas, migrate to Cassandra/DynamoDB at scale
Loading diagram...
AWS Alternative: Use Amazon DynamoDB which handles partitioning automatically based on partition key.
| Sharding Key | Pros | Cons |
|---|
| short_code | Even distribution, direct lookups | Requires consistent hashing for scaling |
| user_id | Good for user-specific queries | Hot users cause hot shards |
| created_at | Good for time-based cleanup | Recent shards are hot |
Loading diagram...
| Cache Tier | TTL | Size | Hit Rate |
|---|
| CDN | 24 hours | Unlimited | 60-70% |
| Local memory | 5 minutes | 10K URLs | 20-30% |
| Redis | 30 days | 100M URLs | 95%+ |
URL access follows Zipf distribution - a small percentage of URLs get most traffic.
| Strategy | When to Use |
|---|
| TTL-based | Default for all entries |
| Write-through | Update cache on every write |
| Delete on expire | When URL expires or is deleted |
Loading diagram...
| Metric | Storage | Use Case |
|---|
| Total clicks | Counter in main DB | Quick access |
| Clicks over time | Time-series DB | Graphs |
| Geographic data | Analytics DB | Country/city breakdown |
| Referrer | Analytics DB | Traffic sources |
| Device/browser | Analytics DB | User agent analysis |
| Threat | Mitigation |
|---|
| Enumeration attacks | Random codes, not sequential |
| Malicious URLs | URL scanning, blocklist checking |
| Spam creation | Rate limiting, CAPTCHA, authentication |
| URL hijacking | Ownership verification for custom aliases |
| Operation | Limit | Scope |
|---|
| URL creation | 100/hour | Per user/IP |
| Redirects | 1000/minute | Per short code |
| Custom alias | 10/day | Per user |
| Service | Scale | Notable Features |
|---|
| bit.ly | 600M clicks/day | Enterprise analytics, branded links |
| TinyURL | Original service | Simple, no account needed |
| Rebrandly | Focused on branding | Custom domains, link management |
| short.io | API-first | White-label solution |
| T.co (Twitter) | Billions/day | Integrated malware scanning |
| Decision | Options | Recommendation |
|---|
| Code generation | Hash, Counter, Random, Snowflake | Snowflake ID for uniqueness + time ordering |
| Code length | 6-8 characters | 7 characters (3.5T combinations) |
| Database | SQL, NoSQL | PostgreSQL to DynamoDB at scale |
| Caching | Redis, Memcached | Redis Cluster with 30-day TTL |
| Redirect type | 301 vs 302 | 301 for performance, 302 for analytics |
| Analytics | Sync vs Async | Async via message queue |