Design an Instagram Ranking Model

Design a machine learning system to rank posts in Instagram's feed.

Requirements

Functional:

Rank posts for a user's home feed
Consider posts from followed accounts
Include various content types (photos, videos, reels, stories)
Personalized ranking per user

Non-functional:

Latency: < 200ms for ranking
Scale: 2B+ users, 500M+ daily active users
Freshness: Include recent posts

Metrics

Offline Metrics

Metric	Description	Target
AUC-ROC	Engagement prediction quality	> 0.85
NDCG@10	Ranking quality for top 10	> 0.7
Precision@K	Relevant items in top K	> 0.6
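
The two ranking metrics in the table can be computed from per-post relevance labels listed in the order the model ranked them. The sketch below is one common formulation, not necessarily the one used in production: it assumes linear gain (the label itself, rather than 2^label - 1) and treats any non-zero label as relevant. The function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the model's ordering divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def precision_at_k(relevances, k):
    """Fraction of the top k slots filled by items with non-zero relevance."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k

# Labels for five ranked posts: 1 = the user engaged, 0 = the user did not.
ranked_labels = [1, 0, 1, 1, 0]
print(ndcg_at_k(ranked_labels, k=10))
print(precision_at_k(ranked_labels, k=5))
```

A perfectly ordered list scores an NDCG of 1.0, so the > 0.7 target leaves room for some misordering near the top of the feed.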

Online Metrics

Metric	Description
Engagement Rate	(Likes + Comments) / Impressions
Time Spent	Minutes per session
Daily Active Users	Users who return to the app each day
Content Diversity	Variety of accounts shown
Creator Distribution	Fair exposure to creators

Guardrail Metrics

User satisfaction (surveys)
Session frequency
Unfollow rate
Time to first engagement

Architecture

Candidate Generation

Sources

Following posts: Posts from followed accounts
Explore suggestions: Relevant content from non-followed accounts
Ads: Sponsored content (separate ranking)

Filtering Pipeline

Step	Description
1	Collect posts from following (last 48 hours)
2	Filter already seen (last 24 hours)
3	Filter blocked/muted accounts
4	Limit to top 500 candidates
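
The four filtering steps can be sketched as a single function. This is a minimal illustration under several assumptions: the `Post` fields are hypothetical, `seen_post_ids` is assumed to already cover only the last 24 hours, and `prelim_score` stands in for whatever cheap signal decides which 500 candidates count as "top", since the table does not say.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Post:
    post_id: str
    author_id: str
    created_at: datetime
    prelim_score: float  # hypothetical cheap pre-ranking score

def generate_candidates(posts, seen_post_ids, excluded_authors, now,
                        max_age=timedelta(hours=48), limit=500):
    """Apply the four filtering steps in order and return at most `limit` posts."""
    # Step 1: keep posts from followed accounts created within the last 48 hours.
    fresh = [p for p in posts if now - p.created_at <= max_age]
    # Step 2: drop posts the user has already seen.
    unseen = [p for p in fresh if p.post_id not in seen_post_ids]
    # Step 3: drop posts from blocked or muted accounts.
    allowed = [p for p in unseen if p.author_id not in excluded_authors]
    # Step 4: keep the highest-scoring candidates up to the limit.
    allowed.sort(key=lambda p: p.prelim_score, reverse=True)
    return allowed[:limit]
```

Running the cheap set-membership filters before the sort keeps the sort small, which matters given the 20ms budget for candidate generation.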

Feature Engineering

User Features

Feature	Type	Description
user_age	Numerical	Days since account creation
follower_count	Numerical	Number of followers
following_count	Numerical	Number following
avg_session_time	Numerical	Average session duration
preferred_content_type	Categorical	Photo, video, reel preference
active_hours	Embedding	When user is typically active
engagement_history	Embedding	Historical engagement patterns

Post Features

Feature	Type	Description
post_age	Numerical	Hours since posting
media_type	Categorical	Photo, video, carousel, reel
caption_length	Numerical	Length of caption
hashtag_count	Numerical	Number of hashtags
has_location	Binary	Location tagged
historical_engagement	Numerical	Likes, comments received
author_engagement_rate	Numerical	Author's typical engagement
visual_features	Embedding	Image/video embeddings
text_embedding	Embedding	Caption embedding

User-Post Interaction Features

Feature	Type	Description
author_relationship	Categorical	Friend, acquaintance, celebrity
past_interactions	Numerical	Previous likes/comments on author
content_affinity	Numerical	User's affinity for content type
topic_match	Numerical	Topic similarity score
time_since_last_interaction	Numerical	Recency of engagement with author

Contextual Features

Feature	Type	Description
hour_of_day	Categorical	Current hour
day_of_week	Categorical	Current day
device_type	Categorical	Mobile, tablet, desktop
connection_type	Categorical	WiFi, cellular
app_version	Categorical	App version

Model Architecture

Multi-Task Learning Approach

Predict multiple engagement types simultaneously: shared layers learn a common representation, and a separate head outputs the prediction for each engagement type.

Final Score Combination

Engagement Type	Base Weight	Multiplier
Like	w1	1x
Comment	w2	2x
Share	w3	3x
Save	w4	2x
Time Spent	w5	1x

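Final Score = Sum over engagement types of (base weight x multiplier x prediction)

Multipliers reflect the relative value of each engagement type. Time spent is a continuous prediction rather than a probability, so it should be normalized to a comparable scale before it enters the sum.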
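
As a worked example of the score combination, the sketch below uses the multipliers from the table. The base weights w1 to w5 are not specified in the text, so they are set to 1.0 here purely as placeholders, and the time-spent prediction is assumed to be already normalized to the range 0 to 1.

```python
# Multipliers taken from the table above.
MULTIPLIERS = {"like": 1.0, "comment": 2.0, "share": 3.0, "save": 2.0, "time_spent": 1.0}

# Placeholder base weights (w1..w5); real values would be tuned, e.g. via A/B tests.
BASE_WEIGHTS = {"like": 1.0, "comment": 1.0, "share": 1.0, "save": 1.0, "time_spent": 1.0}

def final_score(predictions, base_weights=BASE_WEIGHTS, multipliers=MULTIPLIERS):
    """Sum of base weight x multiplier x prediction over all engagement types."""
    return sum(base_weights[t] * multipliers[t] * predictions[t] for t in multipliers)

predictions = {"like": 0.30, "comment": 0.05, "share": 0.02, "save": 0.04, "time_spent": 0.50}
score = final_score(predictions)
# 0.30*1 + 0.05*2 + 0.02*3 + 0.04*2 + 0.50*1 = 1.04 (up to floating-point rounding)
print(round(score, 2))
```

Keeping the weights outside the model means the trade-off between engagement types can be retuned without retraining.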

Model Details

Architecture: Deep Neural Network with embedding layers

Layer	Configuration
User embedding	128-dim
Post embedding	128-dim
Shared hidden layers	512 -> ReLU -> Dropout -> 256 -> ReLU
Task-specific heads	Separate heads for like, comment, share, save, and time spent
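
To make the layer table concrete, here is a dependency-free sketch of the inference-time forward pass for the four binary engagement heads. It is illustrative only: the weights are random rather than trained, the user and post embeddings are assumed to be concatenated into one 256-dim input (the text does not say how they are combined), and the remaining feature groups are left out. A real system would use a deep learning framework.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(rng, n_in, n_out):
    """Random weights scaled by fan-in, zero biases (stand-in for trained values)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class MultiTaskRanker:
    TASKS = ("like", "comment", "share", "save")

    def __init__(self, seed=0, emb_dim=128):
        rng = random.Random(seed)
        self.hidden1 = init_layer(rng, 2 * emb_dim, 512)   # shared: 256 -> 512
        self.hidden2 = init_layer(rng, 512, 256)           # shared: 512 -> 256
        self.heads = {task: init_layer(rng, 256, 1) for task in self.TASKS}

    def predict(self, user_emb, post_emb):
        x = list(user_emb) + list(post_emb)  # assumed: simple concatenation
        h = relu(dense(x, *self.hidden1))
        # Dropout is applied only during training, so it is skipped at inference.
        h = relu(dense(h, *self.hidden2))
        # Each head maps the shared representation to one probability.
        return {task: sigmoid(dense(h, *head)[0]) for task, head in self.heads.items()}

model = MultiTaskRanker()
print(model.predict([0.1] * 128, [0.2] * 128))
```

Because the hidden layers are shared, sparse signals such as shares benefit from the representation learned on abundant signals such as likes.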

Training Pipeline

Data Collection

Field	Type	Description
user_id	String	User who saw the post
post_id	String	The post shown
features	Dict	All user, post, context features
liked	Binary	Did user like? (0/1)
commented	Binary	Did user comment? (0/1)
shared	Binary	Did user share? (0/1)
saved	Binary	Did user save? (0/1)
time_spent	Float	Seconds viewing post
position	Int	Where shown in feed (for position bias correction)

Handling Position Bias

Posts shown at top receive more engagement regardless of quality. Solutions:

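Inverse propensity weighting: up-weight engagement observed at lower positions, where posts are less likely to be seen
Position as a feature during training: let the model learn the position effect, then fix position to a constant value at serving time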
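
A simplified sketch of inverse propensity weighting for a pointwise log loss follows. The examination curve `(1 / (position + 1)) ** eta` is an assumed model of how likely a user is to see each slot; in practice the propensities would be estimated from data, for example through randomized experiments. Only engaged impressions are re-weighted here, and weights are clipped to limit variance. All names and defaults are illustrative.

```python
import math

def inverse_propensity_weight(position, eta=1.0, max_weight=10.0):
    """Inverse of the assumed probability that slot `position` (0-based) was examined."""
    propensity = (1.0 / (position + 1)) ** eta
    return min(1.0 / propensity, max_weight)  # clip to keep variance bounded

def ipw_log_loss(examples, eta=1.0):
    """Mean log loss over (label, predicted_prob, position) triples.

    Engaged impressions (label == 1) are up-weighted by inverse propensity,
    so engagement earned in a rarely examined slot counts for more.
    """
    total = 0.0
    for label, prob, position in examples:
        prob = min(max(prob, 1e-7), 1.0 - 1e-7)  # avoid log(0)
        if label == 1:
            total += inverse_propensity_weight(position, eta) * -math.log(prob)
        else:
            total += -math.log(1.0 - prob)
    return total / len(examples)

# The same prediction costs four times as much when the like came from slot 3.
print(ipw_log_loss([(1, 0.5, 0)]), ipw_log_loss([(1, 0.5, 3)]))
```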

Training Schedule

Retrain model daily with past 30 days of data
Online learning for real-time personalization
A/B test before full deployment

Serving

Online Inference Pipeline

Step	Action	Latency Target
1	Candidate generation: ~500 candidates	20ms
2	Feature retrieval (parallel): user and post features	50ms
3	Batch scoring: score all candidates	80ms
4	Business rules: diversity, freshness, close friends	20ms
5	Return top 50 posts	-

Total target: < 200ms end-to-end
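
The stage targets can be checked against the end-to-end target with simple arithmetic. The sketch below encodes the table as a dictionary (the key names are illustrative) and reports the remaining headroom and any stage that exceeds its budget.

```python
# Per-stage latency targets from the table, in milliseconds.
STAGE_BUDGET_MS = {
    "candidate_generation": 20,
    "feature_retrieval": 50,
    "batch_scoring": 80,
    "business_rules": 20,
}
END_TO_END_TARGET_MS = 200

def headroom_ms(budget=STAGE_BUDGET_MS, target=END_TO_END_TARGET_MS):
    """Milliseconds left for network hops and serialization after all stages."""
    return target - sum(budget.values())

def over_budget_stages(measured_ms, budget=STAGE_BUDGET_MS):
    """Names of stages whose measured latency exceeds their target."""
    return [stage for stage, limit in budget.items() if measured_ms.get(stage, 0) > limit]

print(headroom_ms())  # 200 - (20 + 50 + 80 + 20) = 30
print(over_budget_stages({"feature_retrieval": 65, "batch_scoring": 70}))
```

The stages sum to 170ms, which leaves 30ms of slack inside the 200ms target.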

Business Rules

Rule	Adjustment	Rationale
Diversity	Max 2 posts per author in top 20	Prevent feed dominance
Freshness	+10% boost if < 1 hour old	Surface breaking content
Close Friends	+20% boost	Prioritize meaningful relationships
Content Type Mix	Ensure variety	Balance photos and videos
Creator Fairness	Minimum exposure floor	Support smaller creators
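
The boost and diversity rules can be applied as a post-processing pass over the scored candidates. The sketch below covers only the freshness boost, the close-friends boost, and the per-author cap; content type mix and creator fairness are left out because the table gives no concrete thresholds for them. The dictionary keys are hypothetical, and stacking the two boosts multiplicatively is an assumption.

```python
def apply_business_rules(posts, top_n=20, max_per_author=2):
    """Boost scores, then re-rank so no author exceeds the cap in the top slots.

    Each post is a dict with keys: post_id, author_id, score, age_hours,
    is_close_friend (hypothetical field names).
    """
    boosted = []
    for post in posts:
        score = post["score"]
        if post["age_hours"] < 1:
            score *= 1.10  # freshness: +10% if under one hour old
        if post["is_close_friend"]:
            score *= 1.20  # close friends: +20%
        boosted.append({**post, "score": score})
    boosted.sort(key=lambda p: p["score"], reverse=True)

    head, rest, per_author = [], [], {}
    for post in boosted:
        count = per_author.get(post["author_id"], 0)
        if len(head) < top_n and count < max_per_author:
            head.append(post)
            per_author[post["author_id"]] = count + 1
        else:
            # Over-cap posts are pushed below the top slots, keeping score order.
            rest.append(post)
    return head + rest
```

If there are too few candidates from other authors to fill the top slots, the deferred posts still follow directly after them, so the cap is best-effort rather than strict.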

Monitoring

Real-time Dashboards

Engagement rates by content type
Ranking model latency
Feature serving latency
Model prediction distribution

Alerts

Alert	Condition	Severity
Engagement Drop	Rate < 95% of baseline	High
Latency Spike	p99 > 300ms	Medium
Prediction Drift	Score distribution shift	Medium
Model Staleness	Model > 48 hours old	Low
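
Three of the four alert conditions have explicit thresholds and can be expressed as a simple check. The sketch below is illustrative: the function and parameter names are assumptions, and prediction drift is omitted because the table does not define how a distribution shift is measured.

```python
def evaluate_alerts(engagement_rate, baseline_engagement_rate,
                    p99_latency_ms, model_age_hours):
    """Return (alert name, severity) pairs for every condition that fires."""
    alerts = []
    if engagement_rate < 0.95 * baseline_engagement_rate:
        alerts.append(("Engagement Drop", "High"))
    if p99_latency_ms > 300:
        alerts.append(("Latency Spike", "Medium"))
    if model_age_hours > 48:
        alerts.append(("Model Staleness", "Low"))
    return alerts

# Engagement fell from 5% to 4%, which is below 95% of baseline.
print(evaluate_alerts(0.04, 0.05, p99_latency_ms=250, model_age_hours=10))
```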

Reference

Topic	Description
Cold-start for new users	Use popularity-based ranking initially. Learn from first interactions.
Fairness for creators	Set minimum exposure floors. Prevent algorithm from only showing established accounts.
Negative feedback handling	Hide and report signals are weighted heavily. Track unfollows after showing certain content.
Organic vs ads	Ads have separate ranking model. Insert at designated slots.
Engagement vs time well spent	Heavy engagement optimization can create addictive patterns. Balance with user satisfaction.
Personalization vs diversity	Over-personalization creates filter bubbles. Inject diversity intentionally.
Freshness vs quality	New posts have less engagement signal. Use content-based features for fresh content.
