Design a News Feed System

Related Concepts: Fan-Out on Write vs. Fan-Out on Read, Caching (Timeline Cache), Ranking Algorithm, Graph Database, Denormalization, CDN for Media, Pagination (Cursor-Based), Hybrid Push-Pull Model

Design a news feed system that displays posts from friends and followed accounts, similar to Facebook's feed or Twitter's timeline.

Step 1: Requirements and Scope

Functional Requirements

  • Users can create posts (text, images, videos)
  • Users see a feed of posts from people they follow
  • Feed is sorted by relevance/recency (ranked)
  • Support pagination (infinite scroll)
  • Near real-time updates for new posts

Non-Functional Requirements

Requirement | Target | Rationale
Latency | < 200 ms at p99 | Feed must feel instant
Availability | 99.99% | Core feature, always needed
Consistency | Eventual | Missing a post briefly is acceptable
Scalability | 5B feed requests/day | Massive user base

Scale Estimation

  • 500 million daily active users (DAU)
  • Average user follows 200 accounts
  • Average user views feed 10 times per day
  • Feed reads: 500M x 10 = 5 billion requests per day, about 58K requests/second on average (peak traffic is higher)
  • Average post has 500 followers to notify
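The request-rate figure follows directly from the DAU and views-per-user numbers. The short calculation below reproduces it. The peak-to-average factor of 3 is an assumption made only for illustration; the text does not give one.

```python
# Back-of-envelope check of the feed-read numbers above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

dau = 500_000_000      # daily active users
views_per_user = 10    # feed loads per user per day
peak_factor = 3        # ASSUMPTION: peak-to-average ratio, not given in the text

requests_per_day = dau * views_per_user
avg_qps = requests_per_day / SECONDS_PER_DAY
peak_qps = avg_qps * peak_factor

print(f"{requests_per_day:,} requests/day")       # 5,000,000,000 requests/day
print(f"~{avg_qps:,.0f} requests/s average")      # ~57,870 requests/s average
print(f"~{peak_qps:,.0f} requests/s at 3x peak")  # ~173,611 requests/s at 3x peak
```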

Step 2: The Core Challenge

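When User A creates a post, all of A's followers should see it in their feeds. With 500 million daily users and a dense follow graph, the central design question is when to do the work of assembling each feed: when a post is written, or when a feed is read.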

Step 3: Push vs Pull Architecture

Option 1: Pull Model (Fan-Out on Read)

Advantages | Disadvantages
Simple to implement | Slow for users following many accounts
New posts immediately available | Heavy database load at read time
No wasted work for inactive users | Cold start problem (cache misses)
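The sketch below shows fan-out on read. At request time, it merges the newest-first post lists of every followed author. The in-memory `posts_by_author` and `following` dicts are hypothetical stand-ins for the post store and the follow graph. The sketch makes the main cost visible: each read touches one list per followed account.

```python
import heapq
from itertools import islice

# Hypothetical stores: author_id -> [(timestamp, post_id)] newest first, and the follow graph.
posts_by_author: dict[str, list[tuple[int, str]]] = {
    "alice": [(300, "a3"), (100, "a1")],
    "bob": [(250, "b2"), (50, "b1")],
    "carol": [(400, "c4")],
}
following: dict[str, list[str]] = {"dave": ["alice", "bob", "carol"]}

def pull_feed(user_id: str, limit: int = 20) -> list[str]:
    """Fan-out on read: merge every followed author's recent posts at request time."""
    streams = [posts_by_author.get(a, []) for a in following.get(user_id, [])]
    # reverse=True merges streams that are already sorted largest-first, lazily.
    merged = heapq.merge(*streams, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]

print(pull_feed("dave"))  # ['c4', 'a3', 'b2', 'a1', 'b1']
```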

Option 2: Push Model (Fan-Out on Write)

Advantages | Disadvantages
Fast reads (pre-computed) | High write amplification
Consistent read latency | Celebrity problem (millions of followers)
Better cache utilization | Wasted work for inactive users
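Here is the push model as a minimal sketch. Publishing a post writes it into every follower's precomputed feed, and reading is a cheap slice. The `followers` dict and the deque-per-user store are hypothetical stand-ins for the follow graph and the feed cache. The sketch also assumes posts arrive in time order, since it prepends rather than sorting by timestamp.

```python
from collections import defaultdict, deque
from itertools import islice

MAX_FEED_LEN = 1000  # cap similar to the 500-1000 entry feed cache in Step 5

# Hypothetical follow graph: author_id -> follower IDs.
followers: dict[str, list[str]] = {"alice": ["dave", "erin"], "bob": ["dave"]}

# user_id -> post IDs, newest at the left; maxlen silently drops the oldest entry.
feeds: dict[str, deque] = defaultdict(lambda: deque(maxlen=MAX_FEED_LEN))

def publish(author_id: str, post_id: str) -> int:
    """Fan-out on write: prepend the post to every follower's precomputed feed."""
    targets = followers.get(author_id, [])
    for follower_id in targets:
        feeds[follower_id].appendleft(post_id)
    return len(targets)  # write amplification: one feed write per follower

def read_feed(user_id: str, limit: int = 20) -> list[str]:
    """A read is just a slice of the precomputed feed; no per-author queries."""
    return list(islice(feeds[user_id], limit))

publish("alice", "a1")
publish("bob", "b1")
publish("alice", "a2")
print(read_feed("dave"))  # ['a2', 'b1', 'a1']
print(read_feed("erin"))  # ['a2', 'a1']
```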
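Option 3: Hybrid Model (Push + Pull)

Most large systems combine both: push for ordinary accounts, and pull for high-follower accounts, whose posts are merged into the precomputed feed at read time.

User Type | Strategy | Rationale
Normal users (< 10K followers) | Push | Fast reads, manageable write load
Celebrities (>= 10K followers) | Pull | Avoid millions of writes per post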
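The next sketch routes each post by the 10K-follower threshold from the table. Normal authors are pushed into follower feeds. Celebrity posts are written once and merged in at read time. All data structures are hypothetical in-memory stand-ins, and publishing assumes posts arrive in timestamp order.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # follower cutoff from the table above

# Hypothetical data: follower counts, follow graph, and the two feed stores.
follower_count = {"alice": 300, "popstar": 5_000_000}
followers = {"alice": ["dave"], "popstar": ["dave"]}  # popstar's list truncated for the sketch
following = {"dave": ["alice", "popstar"]}
pushed_feeds: dict[str, list[tuple[int, str]]] = {}     # user -> [(ts, post_id)], newest first
celebrity_posts: dict[str, list[tuple[int, str]]] = {}  # author -> [(ts, post_id)], newest first

def is_celebrity(author_id: str) -> bool:
    return follower_count.get(author_id, 0) >= CELEBRITY_THRESHOLD

def publish(author_id: str, ts: int, post_id: str) -> None:
    """Assumes posts arrive in timestamp order, so prepending keeps lists sorted."""
    entry = (ts, post_id)
    if is_celebrity(author_id):
        # Pull side: a single write to the author's own list.
        celebrity_posts.setdefault(author_id, []).insert(0, entry)
    else:
        # Push side: one write per follower.
        for follower_id in followers.get(author_id, []):
            pushed_feeds.setdefault(follower_id, []).insert(0, entry)

def get_feed(user_id: str, limit: int = 20) -> list[str]:
    """Merge the precomputed feed with recent posts pulled from followed celebrities."""
    pulled = [celebrity_posts.get(a, []) for a in following.get(user_id, []) if is_celebrity(a)]
    merged = heapq.merge(pushed_feeds.get(user_id, []), *pulled, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]

publish("alice", 100, "a1")
publish("popstar", 200, "p1")
publish("alice", 300, "a2")
print(get_feed("dave"))  # ['a2', 'p1', 'a1']
```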

Step 4: High-Level Architecture

Step 5: Feed Cache Design

Cache Structure

Component | Storage | Purpose
Feed cache | Redis sorted set | Pre-computed feed per user
Score | Post timestamp | Ordering
Value (member) | Post ID | Reference to full post
Size limit | 500-1000 posts | Balance memory vs. coverage

Cache Operations

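Operation | Command | Complexity
Add post to feed | ZADD feed:{user_id} {timestamp} {post_id} | O(log N)
Get top 20 posts | ZREVRANGE feed:{user_id} 0 19 | O(log N + K)
Trim to newest 1000 posts | ZREMRANGEBYRANK feed:{user_id} 0 -1001 | O(log N + M)
Remove an unfollowed author's posts | ZREM feed:{user_id} {post_ids...} | O(M log N)

Here N is the number of entries in the sorted set, K the number returned, and M the number removed. Since Redis 6.2, ZREVRANGE is deprecated in favor of ZRANGE feed:{user_id} 0 19 REV.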
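To make the command semantics concrete, here is a small pure-Python stand-in for one feed sorted set. The `FeedCache` class and its methods are hypothetical. They mirror the behavior of ZADD, ZREVRANGE, ZREMRANGEBYRANK, and ZREM (ascending ranks, inclusive indices, negative indices counting from the end) but not Redis's internal complexity. The usage shows why `0 -1001` keeps the newest 1000 entries.

```python
import bisect

class FeedCache:
    """In-memory stand-in for one Redis sorted set such as feed:{user_id}.

    Members are post IDs and scores are timestamps. Entries are kept in
    ascending (score, member) order, the same order Redis uses for ranks.
    This sketch mimics the command semantics, not Redis's complexity.
    """

    def __init__(self) -> None:
        self._entries: list[tuple[float, str]] = []  # ascending by (score, member)
        self._scores: dict[str, float] = {}

    def zadd(self, score: float, member: str) -> None:
        """Insert a member, or move it if it already exists (ZADD updates the score)."""
        self.zrem(member)
        bisect.insort(self._entries, (score, member))
        self._scores[member] = score

    def zrevrange(self, start: int, stop: int) -> list[str]:
        """Members from highest to lowest score; inclusive indices, negatives count from the end."""
        rev = self._entries[::-1]
        n = len(rev)
        start = max(start + n if start < 0 else start, 0)
        stop = stop + n if stop < 0 else stop
        if stop < start:
            return []
        return [member for _, member in rev[start:stop + 1]]

    def zremrangebyrank(self, start: int, stop: int) -> int:
        """Remove ranks start..stop (lowest score = rank 0); returns how many were removed."""
        n = len(self._entries)
        start = max(start + n if start < 0 else start, 0)
        stop = stop + n if stop < 0 else stop
        if stop < start:
            return 0
        removed = self._entries[start:stop + 1]
        del self._entries[start:stop + 1]
        for _, member in removed:
            del self._scores[member]
        return len(removed)

    def zrem(self, *members: str) -> int:
        """Remove the given members if present; returns how many were removed."""
        removed = 0
        for member in members:
            score = self._scores.pop(member, None)
            if score is not None:
                self._entries.remove((score, member))
                removed += 1
        return removed

    def __len__(self) -> int:
        return len(self._entries)

cache = FeedCache()
for ts in range(1, 1006):                # 1,005 posts arrive
    cache.zadd(ts, f"p{ts}")
print(cache.zremrangebyrank(0, -1001))   # 5  (keeps the newest 1,000)
print(cache.zrevrange(0, 2))             # ['p1005', 'p1004', 'p1003']
print(cache.zrem("p1005", "missing"))    # 1
```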

Cache Size Estimation

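Factor | Value
Active users | 100M (subset of 500M DAU)
Posts per feed | 500 post IDs
Post ID size | 8 bytes
Per user | 500 x 8 B = 4 KB
Total | 100M x 4 KB = 400 GB

This estimate counts post IDs only. Each entry also carries an 8-byte score, and Redis adds per-entry bookkeeping for sorted sets, so the real memory footprint is several times larger and is normally spread across a sharded cluster.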
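The arithmetic behind the table is shown below, together with a second line that adds the 8-byte score stored with each entry. Both figures are raw payload only. Neither includes Redis's per-entry data-structure overhead.

```python
ACTIVE_USERS = 100_000_000
POSTS_PER_FEED = 500
POST_ID_BYTES = 8
SCORE_BYTES = 8  # sorted-set scores are 64-bit doubles; not counted in the table

per_user_ids = POSTS_PER_FEED * POST_ID_BYTES  # 4,000 B, about 4 KB
total_ids = ACTIVE_USERS * per_user_ids        # 4 x 10^11 B
total_with_scores = ACTIVE_USERS * POSTS_PER_FEED * (POST_ID_BYTES + SCORE_BYTES)

print(total_ids / 1e9, "GB")          # 400.0 GB
print(total_with_scores / 1e9, "GB")  # 800.0 GB
```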

Step 6: Fan-Out Process

Write Flow Detail

Fan-Out Performance

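Metric | Value
Average followers | 500
Batch size | 1,000 commands per pipeline
Redis round-trip latency | ~1 ms
500 followers, one ZADD per round trip | 500 x 1 ms = 500 ms
500 followers, pipelined in one batch | About one round trip plus server time (a few ms)
1M followers | 1M x 1 ms ≈ 17 minutes sequentially; even pipelined, that is 1M cache writes per post (use pull)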
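Batching is the reason the 1 ms figure need not be paid once per follower. The sketch below splits a follower list into pipeline-sized chunks and gives a lower bound on network wait for sequential vs. pipelined fan-out. It ignores server-side execution time, so real numbers are higher. With the redis-py client, each chunk would roughly correspond to one `r.pipeline(transaction=False)` containing a `zadd(f"feed:{uid}", {post_id: ts})` per follower, sent with `execute()`. The helper names here are hypothetical.

```python
import math
from itertools import islice
from typing import Iterable, Iterator

RTT_MS = 1.0       # per-round-trip latency, the 1 ms figure from the table
BATCH_SIZE = 1000  # commands sent per pipeline, from the table

def batches(follower_ids: Iterable[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Split a follower list into chunks; each chunk would become one pipelined request."""
    it = iter(follower_ids)
    while chunk := list(islice(it, size)):
        yield chunk

def network_time_ms(num_followers: int, pipelined: bool) -> float:
    """Lower bound on network wait: one RTT per command, or one RTT per batch.
    Ignores server-side execution time, so real numbers are higher."""
    trips = math.ceil(num_followers / BATCH_SIZE) if pipelined else num_followers
    return trips * RTT_MS

for n in (500, 1_000_000):
    print(n, network_time_ms(n, pipelined=False), network_time_ms(n, pipelined=True))
# 500 500.0 1.0
# 1000000 1000000.0 1000.0
```

Even when pipelining hides most of the latency, a 1M-follower post still generates 1M cache writes, which is why the table routes such accounts to the pull path.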

Step 7: Ranking System

Purely chronological feeds are noisy: posts a user cares about get buried under posts they would skip. Modern feeds rank posts by predicted relevance.

Ranking Signals

Signal | Weight (illustrative) | Description
Recency | 30% | Newer posts score higher
Engagement | 40% | Likes, comments, shares
Affinity | 20% | How often the user interacts with the author
Content type | 10% | User's preferred content types
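A weighted-sum scorer using the table's weights might look like the following. The exponential recency decay (`HALF_LIFE_HOURS`), the per-action engagement weights, and the log scaling are hypothetical choices that keep each signal in [0, 1]. Production systems typically replace the hand-set weights with a learned model.

```python
import math

# Weights from the table above; treat them as illustrative, not tuned values.
WEIGHTS = {"recency": 0.30, "engagement": 0.40, "affinity": 0.20, "content_type": 0.10}
HALF_LIFE_HOURS = 6.0  # HYPOTHETICAL decay constant for the recency signal

def recency_signal(age_hours: float) -> float:
    """1.0 for a brand-new post, halving every HALF_LIFE_HOURS."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)

def engagement_signal(likes: int, comments: int, shares: int) -> float:
    """Log-damped, capped at 1.0; the 1/2/3 action weights and /10 scale are hypothetical."""
    return min(1.0, math.log1p(likes + 2 * comments + 3 * shares) / 10)

def score(post: dict, affinity: float, content_pref: float) -> float:
    """Weighted sum of four signals, each expected in [0, 1]."""
    signals = {
        "recency": recency_signal(post["age_hours"]),
        "engagement": engagement_signal(post["likes"], post["comments"], post["shares"]),
        "affinity": affinity,          # viewer-author interaction strength, precomputed upstream
        "content_type": content_pref,  # viewer's preference for this media type, precomputed upstream
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())

fresh = {"age_hours": 0, "likes": 120, "comments": 10, "shares": 4}
stale = dict(fresh, age_hours=24)
print(round(score(fresh, 0.5, 0.8), 3), round(score(stale, 0.5, 0.8), 3))
```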

Ranking Architecture

Two-Phase Ranking

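Phase | Purpose | Output size | Latency budget
Candidate generation | Fetch potential posts | 500-1000 posts | 50 ms
Ranking | Score candidates and sort | Top 100 | 100 ms

Together the two phases use 150 ms, leaving about 50 ms of the 200 ms p99 target for network, hydration of post content, and serialization.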
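The sketch below shows the two-phase structure. A cheap first pass narrows the inventory to recent candidates, and the expensive scorer runs only on that smaller set. The `generate_candidates` and `rank` functions, the post dicts, and the `quality` field are hypothetical. In practice the scorer would be an ML model rather than a field lookup.

```python
import heapq
from typing import Callable

def generate_candidates(posts: list[dict], limit: int = 1000) -> list[dict]:
    """Phase 1 (cheap): keep only the `limit` most recent posts, e.g. read from the feed cache."""
    return heapq.nlargest(limit, posts, key=lambda p: p["ts"])

def rank(candidates: list[dict], scorer: Callable[[dict], float], top_k: int = 100) -> list[dict]:
    """Phase 2 (expensive): score only the candidates and return the best `top_k`, best first."""
    return heapq.nlargest(top_k, candidates, key=scorer)

# Hypothetical inventory: 5,000 posts with a precomputed "quality" score.
posts = [{"id": i, "ts": i, "quality": (i * 37) % 101} for i in range(5000)]
candidates = generate_candidates(posts)
feed = rank(candidates, scorer=lambda p: p["quality"])
print(len(candidates), len(feed))  # 1000 100
```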

Step 8: Pagination

Why Not Offset Pagination?

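Problem | With Offset | With Cursor
New posts arrive | Later pages skip or duplicate posts | Position is stable
Deleted posts | Page boundaries shift | Unaffected
Performance | Database scans and discards offset rows: O(offset) | Index seek to the cursor: O(log N)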
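The duplicate-post problem can be reproduced in a few lines. The sketch below pages through the same ascending list of (timestamp, post_id) entries with a cursor and with an offset, and inserts a new post between page 1 and page 2. The cursor carries the post ID as well as the timestamp so that posts sharing a timestamp are not skipped. The function names are hypothetical. In Redis, a comparable score-only query is ZREVRANGEBYSCORE feed:{user_id} (cursor_ts -inf LIMIT 0 20, where the "(" prefix makes the bound exclusive.

```python
import bisect
from typing import Optional

Entry = tuple[int, str]  # (timestamp, post_id); post_id breaks timestamp ties

def cursor_page(entries: list[Entry], cursor: Optional[Entry], size: int = 3):
    """Newest-first page of entries strictly older than `cursor`.

    `entries` must be sorted ascending. Returns (post_ids, next_cursor);
    next_cursor is None when there is nothing older.
    """
    end = len(entries) if cursor is None else bisect.bisect_left(entries, cursor)
    start = max(0, end - size)
    items = entries[start:end][::-1]
    next_cursor = items[-1] if start > 0 else None
    return [post_id for _, post_id in items], next_cursor

def offset_page(entries: list[Entry], offset: int, size: int = 3) -> list[str]:
    """Offset pagination over the same data, for comparison."""
    newest_first = entries[::-1]
    return [post_id for _, post_id in newest_first[offset:offset + size]]

feed = [(t, f"p{t}") for t in range(1, 8)]  # p1 .. p7
page1, cur = cursor_page(feed, None)        # ['p7', 'p6', 'p5']
bisect.insort(feed, (8, "p8"))              # a new post arrives
page2, cur = cursor_page(feed, cur)         # ['p4', 'p3', 'p2']
print(page1, page2)
print(offset_page(feed, 3))                 # ['p5', 'p4', 'p3']  (p5 repeats)
```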

Step 9: Real-Time Updates

Options Comparison

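Method | Latency | Server Load | Complexity
Polling | Seconds | High (many empty requests) | Low
Long polling | Sub-second | Medium | Medium
WebSocket | Real-time | Low per update (one persistent connection per client) | High
Server-Sent Events | Real-time | Low (one persistent connection per client, server-to-client only) | Medium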
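With persistent connections, the server needs a table of who is currently connected, so that a new post can be pushed only to online followers. The asyncio sketch below models each open connection as an `asyncio.Queue`. `ConnectionRegistry` is a hypothetical name, and a real WebSocket gateway would write to the socket instead of a queue. Offline followers are skipped and pick up the post on their next feed read.

```python
import asyncio
from collections import defaultdict

class ConnectionRegistry:
    """Hypothetical stand-in for a WebSocket gateway's connection table.

    Each open connection is modeled as an asyncio.Queue; a real gateway
    would write the event to the socket instead.
    """

    def __init__(self) -> None:
        self._conns: dict[str, set] = defaultdict(set)

    def connect(self, user_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._conns[user_id].add(queue)
        return queue

    def disconnect(self, user_id: str, queue: asyncio.Queue) -> None:
        self._conns[user_id].discard(queue)

    def notify(self, follower_ids: list[str], post_id: str) -> int:
        """Send a 'new post' event to every open connection of the given followers.

        Followers without an open connection are skipped; they will see the
        post the next time they load their feed.
        """
        delivered = 0
        for user_id in follower_ids:
            for queue in self._conns.get(user_id, ()):
                queue.put_nowait({"type": "new_post", "post_id": post_id})
                delivered += 1
        return delivered

async def demo() -> tuple[int, dict]:
    registry = ConnectionRegistry()
    dave = registry.connect("dave")                  # dave has the app open
    sent = registry.notify(["dave", "erin"], "a42")  # erin is offline
    event = await asyncio.wait_for(dave.get(), timeout=1)
    return sent, event

sent, event = asyncio.run(demo())
print(sent, event)  # 1 {'type': 'new_post', 'post_id': 'a42'}
```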

WebSocket Architecture

Step 10: Handling Edge Cases

New User (Cold Start)

Scenario | Solution
No feed cache exists | Build from scratch using pull
No following history | Show trending/suggested content
Cache warming | Pre-build on first login
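The fallbacks in this table can be expressed as a short decision chain. The next sketch checks for a brand-new user first, then for a cached feed, then rebuilds with pull and stores the result so that the next request is warm. The function signature, the `build_with_pull` callback, and the sample data are hypothetical.

```python
from typing import Callable

def get_feed(
    user_id: str,
    feed_cache: dict[str, list[str]],
    following: dict[str, list[str]],
    build_with_pull: Callable[[str], list[str]],
    trending: list[str],
    limit: int = 20,
) -> list[str]:
    """Serve a feed using the cold-start fallbacks from the table above."""
    if not following.get(user_id):   # no follow history: show trending content
        return trending[:limit]
    cached = feed_cache.get(user_id)
    if cached is not None:           # warm path: precomputed feed exists
        return cached[:limit]
    feed = build_with_pull(user_id)  # cache miss: rebuild via fan-out on read
    feed_cache[user_id] = feed       # warm the cache for the next request
    return feed[:limit]

def rebuild(user_id: str) -> list[str]:
    """Hypothetical pull-based builder; a real one would merge followed authors' posts."""
    return ["a3", "a2", "a1"]

cache: dict[str, list[str]] = {}
follows = {"dave": ["alice"]}
trending = ["t1", "t2"]

print(get_feed("newbie", cache, follows, rebuild, trending))  # ['t1', 't2']
print(get_feed("dave", cache, follows, rebuild, trending))    # ['a3', 'a2', 'a1']
print("dave" in cache)                                        # True
```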

Inactive User Returns

Scenario | Solution
Feed cache expired | Rebuild on demand
Many missed posts | Show highlights, not all posts
Catch-up period | Gradually backfill cache
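"Show highlights, not all posts" can be sketched as follows. When the backlog exceeds a threshold, keep only the most-engaged posts, then present them newest first. The threshold and highlight count are hypothetical values, because the text does not specify them.

```python
import heapq

# HYPOTHETICAL tuning knobs; the text does not give values.
HIGHLIGHT_THRESHOLD = 50  # above this many missed posts, summarize instead of listing all
HIGHLIGHT_COUNT = 10

def catch_up(missed_posts: list[dict]) -> list[dict]:
    """Return what a returning user sees first: everything if the backlog is small,
    otherwise only the most-engaged posts. Either way, newest first."""
    if len(missed_posts) > HIGHLIGHT_THRESHOLD:
        missed_posts = heapq.nlargest(HIGHLIGHT_COUNT, missed_posts, key=lambda p: p["engagement"])
    return sorted(missed_posts, key=lambda p: p["ts"], reverse=True)

backlog = [{"id": i, "ts": i, "engagement": (i * 7) % 200} for i in range(200)]
print([p["id"] for p in catch_up(backlog)])
```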

Viral Post

Challenge | Solution
Engagement spikes | Do not re-rank the entire feed
Celebrity post | Pull model avoids a write storm
Denormalized counts | Update lazily, not in real time
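Lazy count updates usually mean buffering increments in memory and writing aggregated deltas in batches, so a burst of likes on one post becomes a few writes instead of thousands. The `LazyCounter` class and the dict standing in for the counts store are hypothetical. The trade-off is that displayed counts can lag by up to `flush_every` events. A timer-based flush is a common alternative trigger.

```python
import threading
from collections import Counter

class LazyCounter:
    """Buffers like/comment increments in memory and writes them to a
    (hypothetical) counts store in batches instead of once per event."""

    def __init__(self, store: dict, flush_every: int = 1000) -> None:
        self._store = store
        self._pending: Counter = Counter()
        self._events = 0
        self._flush_every = flush_every
        self._lock = threading.Lock()

    def incr(self, post_id: str, n: int = 1) -> None:
        with self._lock:
            self._pending[post_id] += n
            self._events += 1
            if self._events >= self._flush_every:
                self._flush_locked()

    def flush(self) -> None:
        with self._lock:
            self._flush_locked()

    def _flush_locked(self) -> None:
        # One store write per post with pending deltas, not one per event.
        for post_id, delta in self._pending.items():
            self._store[post_id] = self._store.get(post_id, 0) + delta
        self._pending.clear()
        self._events = 0

counts: dict[str, int] = {}  # hypothetical persistent counts store
likes = LazyCounter(counts, flush_every=1000)
for _ in range(2500):        # a burst of likes on a viral post
    likes.incr("viral-post")
print(counts)                # {'viral-post': 2000}  (two flushes so far)
likes.flush()
print(counts)                # {'viral-post': 2500}
```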

Production Examples

Company | Approach | Notable Features
Facebook | Hybrid | ML-based ranking (successor to the original EdgeRank algorithm)
Twitter | Hybrid (push-heavy) | Fan-out on write to in-memory home timelines; high-follower accounts merged at read time
Instagram | Hybrid | Interest-based ranking
LinkedIn | Push for feed | Heavy use of Kafka
TikTok | Pull + ML | "For You" is all ML-ranked

Summary: Key Design Decisions

Decision | Options | Recommendation
Fan-out model | Push, Pull, Hybrid | Hybrid (push for normal users, pull for celebrities)
Feed storage | Database, Cache | Redis sorted sets
Ranking | Chronological, ML-ranked | ML-ranked for engagement
Real-time updates | Polling, WebSocket | WebSocket for active users
Pagination | Offset, Cursor | Cursor-based