Design a News Feed System
Related Concepts: Fan-Out on Write vs. Fan-Out on Read, Caching (Timeline Cache), Ranking Algorithm, Graph Database, Denormalization, CDN for Media, Pagination (Cursor-Based), Hybrid Push-Pull Model
Design a news feed system that displays posts from friends and followed accounts, similar to Facebook's feed or Twitter's timeline.
- Users can create posts (text, images, videos)
- Users see a feed of posts from people they follow
- Feed is sorted by relevance/recency (ranked)
- Support pagination (infinite scroll)
- Near real-time updates for new posts
| Requirement | Target | Rationale |
|---|
| Latency | < 200ms p99 | Feed must feel instant |
| Availability | 99.99% | Core feature, always needed |
| Consistency | Eventual | Missing a post briefly is acceptable |
| Scalability | 5B requests/day | Massive user base |
- 500 million daily active users
- Average user follows 200 accounts
- Average user views feed 10 times per day
- Peak: 5 billion feed requests per day (~60K/second)
- Average post has 500 followers to notify
When User A creates a post, all of A's followers should see it in their feeds. With millions of users and complex follow graphs, efficiency is critical.
Loading diagram...
Loading diagram...
| Advantages | Disadvantages |
|---|
| Simple to implement | Slow for users following many accounts |
| New posts immediately available | Heavy database load at read time |
| No wasted work for inactive users | Cold start problem (cache misses) |
Loading diagram...
| Advantages | Disadvantages |
|---|
| Fast reads (pre-computed) | High write amplification |
| Consistent read latency | Celebrity problem (millions of followers) |
| Better cache utilization | Wasted work for inactive users |
Loading diagram...
| User Type | Strategy | Rationale |
|---|
| Normal users (< 10K followers) | Push | Fast reads, manageable write load |
| Celebrities (>= 10K followers) | Pull | Avoid millions of writes per post |
Loading diagram...
| Component | Storage | Purpose |
|---|
| Feed cache | Redis Sorted Set | Pre-computed feed per user |
| Score | Post timestamp | Ordering |
| Value | Post ID | Reference to full post |
| Size limit | 500-1000 posts | Balance memory vs coverage |
| Operation | Command | Complexity |
|---|
| Add post to feed | ZADD feed:{user_id} {timestamp} {post_id} | O(log N) |
| Get top posts | ZREVRANGE feed:{user_id} 0 19 | O(log N + K) |
| Trim old posts | ZREMRANGEBYRANK feed:{user_id} 0 -1001 | O(log N + M) |
| Remove unfollowed | ZREM feed:{user_id} {post_ids...} | O(M log N) |
| Factor | Value |
|---|
| Active users | 100M (subset of 500M DAU) |
| Posts per feed | 500 post IDs |
| Post ID size | 8 bytes |
| Per user | 500 x 8 = 4 KB |
| Total | 100M x 4 KB = 400 GB |
Loading diagram...
| Metric | Value |
|---|
| Average followers | 500 |
| Batch size | 1000 |
| Redis ZADD latency | 1ms |
| Fan-out time for 500 followers | 500ms total |
| Fan-out time for 1M followers | Too slow (use pull) |
Raw chronological feeds are noisy. Modern feeds rank by relevance.
| Signal | Weight | Description |
|---|
| Recency | 30% | Newer posts score higher |
| Engagement | 40% | Likes, comments, shares |
| Affinity | 20% | How often user interacts with author |
| Content type | 10% | User's preferred content types |
Loading diagram...
| Phase | Purpose | Posts | Latency Budget |
|---|
| Candidate generation | Get potential posts | 500-1000 | 50ms |
| Ranking | Score and sort | Top 100 | 100ms |
Loading diagram...
| Problem | With Offset | With Cursor |
|---|
| New posts arrive | Skip or duplicate posts | Stable position |
| Deleted posts | Page numbers shift | Unaffected |
| Performance | O(offset) scan | O(1) lookup |
| Method | Latency | Server Load | Complexity |
|---|
| Polling | Seconds | High (many requests) | Low |
| Long polling | Sub-second | Medium | Medium |
| WebSocket | Real-time | Low (persistent) | High |
| Server-Sent Events | Real-time | Low | Medium |
Loading diagram...
| Scenario | Solution |
|---|
| No feed cache exists | Build from scratch using pull |
| No following history | Show trending/suggested content |
| Cache warming | Pre-build on first login |
| Scenario | Solution |
|---|
| Feed cache expired | Rebuild on demand |
| Many missed posts | Show highlights, not all posts |
| Catch-up period | Gradually backfill cache |
Viral Post
| Challenge | Solution |
|---|
| Engagement spikes | Do not re-rank entire feed |
| Celebrity post | Pull model avoids write storm |
| Denormalized counts | Update lazily, not in real-time |
| Company | Approach | Notable Features |
|---|
| Facebook | Hybrid | ML-based ranking, EdgeRank algorithm |
| Twitter | Pull-heavy | Timeline mixing (home + algorithmic) |
| Instagram | Hybrid | Interest-based ranking |
| LinkedIn | Push for feed | Heavy use of Kafka |
| TikTok | Pull + ML | "For You" is all ML-ranked |
| Decision | Options | Recommendation |
|---|
| Fan-out model | Push, Pull, Hybrid | Hybrid (push for normal, pull for celebrities) |
| Feed storage | Database, Cache | Redis sorted sets |
| Ranking | Chronological, ML-ranked | ML-ranked for engagement |
| Real-time updates | Polling, WebSocket | WebSocket for active users |
| Pagination | Offset, Cursor | Cursor-based |