
Design an Instagram Ranking Model

Design a machine learning system to rank posts in Instagram's feed.

Requirements

Functional:

  • Rank posts for a user's home feed
  • Consider posts from followed accounts
  • Include various content types (photos, videos, reels, stories)
  • Personalized ranking per user

Non-functional:

  • Latency: < 200ms for ranking
  • Scale: 2B+ users, 500M+ daily active users
  • Freshness: Include recent posts

Metrics

Offline Metrics

Metric | Description | Target
AUC-ROC | Engagement prediction quality | > 0.85
NDCG@10 | Ranking quality for the top 10 | > 0.7
Precision@K | Fraction of relevant items in the top K | > 0.6
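The offline ranking metrics can be computed from a ranked list of graded relevance labels. Below is a minimal pure-Python sketch. The (2^rel - 1) gain is one common NDCG variant, and how likes, comments, and shares map to relevance grades is an assumption left to the caller.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (relevance > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain using the (2^rel - 1) / log2(rank + 1) form."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so the discount is log2(rank + 2)
        for rank, rel in enumerate(relevance[:k])
    )


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG of the model's ordering divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is O(P x N), which is fine for illustration; at scale a sort-based rank-sum computation gives the same value in O(n log n).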

Online Metrics

Metric | Description
Engagement Rate | (Likes + Comments) / Impressions
Time Spent | Minutes per session
Daily Active Users | Unique users returning each day
Content Diversity | Variety of accounts shown
Creator Distribution | Fair exposure across creators
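A few of the online metrics can be sketched as plain functions over logged impressions. The engagement rate follows the table directly. Using Shannon entropy of authors for content diversity and the Gini coefficient of impressions for creator distribution are assumptions, since the design does not prescribe specific measures.

```python
import math
from collections import Counter
from typing import Sequence


def engagement_rate(likes: int, comments: int, impressions: int) -> float:
    """(Likes + Comments) / Impressions, or 0.0 when nothing was shown."""
    return (likes + comments) / impressions if impressions else 0.0


def author_entropy(shown_author_ids: Sequence[str]) -> float:
    """Shannon entropy (bits) of the author distribution in a feed (assumed diversity measure)."""
    counts = Counter(shown_author_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gini(impressions_per_creator: Sequence[int]) -> float:
    """Gini coefficient of impressions across creators: 0 is perfectly even, higher is more concentrated."""
    xs = sorted(impressions_per_creator)  # ascending order is required by the closed form below
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))  # 1-based ranks
    return (2 * weighted) / (n * total) - (n + 1) / n
```

Higher entropy means impressions are spread over more accounts; a rising Gini means exposure is concentrating on fewer creators.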

Guardrail Metrics

  • User satisfaction (surveys)
  • Session frequency
  • Unfollow rate
  • Time to first engagement

Architecture


Candidate Generation

Sources

  1. Following posts: Posts from followed accounts
  2. Explore suggestions: Relevant content from non-followed accounts
  3. Ads: Sponsored content (separate ranking)

Filtering Pipeline

Step | Description
1 | Collect posts from followed accounts (last 48 hours)
2 | Filter out posts already seen (last 24 hours)
3 | Filter out blocked/muted accounts
4 | Limit to top 500 candidates
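The four filtering steps can be sketched as successive list filters. The `Post` record and its `prefilter_score` field are hypothetical; the design does not say how the "top 500" are chosen, so sorting by a cheap precomputed score is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class Post:
    post_id: str
    author_id: str
    created_at: datetime
    prefilter_score: float  # hypothetical cheap score, used only to truncate the pool


def generate_candidates(
    posts: Iterable[Post],
    following: set[str],
    seen_at: dict[str, datetime],  # post_id -> last time this user saw the post
    blocked_or_muted: set[str],
    now: datetime,
    limit: int = 500,
) -> list[Post]:
    # Step 1: posts from followed accounts created in the last 48 hours.
    recent = [
        p for p in posts
        if p.author_id in following and now - p.created_at <= timedelta(hours=48)
    ]
    # Step 2: drop posts the user saw within the last 24 hours.
    unseen = [
        p for p in recent
        if p.post_id not in seen_at or now - seen_at[p.post_id] > timedelta(hours=24)
    ]
    # Step 3: drop blocked or muted authors.
    allowed = [p for p in unseen if p.author_id not in blocked_or_muted]
    # Step 4: keep the top `limit` by the assumed prefilter score.
    allowed.sort(key=lambda p: p.prefilter_score, reverse=True)
    return allowed[:limit]
```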

Feature Engineering

User Features

Feature | Type | Description
user_age | Numerical | Days since account creation
follower_count | Numerical | Number of followers
following_count | Numerical | Number of accounts followed
avg_session_time | Numerical | Average session duration
preferred_content_type | Categorical | Photo, video, or reel preference
active_hours | Embedding | When the user is typically active
engagement_history | Embedding | Historical engagement patterns

Post Features

Feature | Type | Description
post_age | Numerical | Hours since posting
media_type | Categorical | Photo, video, carousel, reel
caption_length | Numerical | Length of caption
hashtag_count | Numerical | Number of hashtags
has_location | Binary | Location tagged
historical_engagement | Numerical | Likes and comments received
author_engagement_rate | Numerical | Author's typical engagement
visual_features | Embedding | Image/video embeddings
text_embedding | Embedding | Caption embedding

User-Post Interaction Features

Feature | Type | Description
author_relationship | Categorical | Friend, acquaintance, celebrity
past_interactions | Numerical | Previous likes/comments on the author
content_affinity | Numerical | User's affinity for the content type
topic_match | Numerical | Topic similarity score
time_since_last_interaction | Numerical | Recency of engagement with the author
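Two of the interaction features, past_interactions and time_since_last_interaction, can be derived from an interaction log. This sketch scans an in-memory list for clarity; the `Interaction` record is hypothetical, and in production these values would more likely be precomputed in a feature store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Interaction:
    user_id: str
    author_id: str
    kind: str  # only "like" and "comment" count toward these features
    at: datetime


def interaction_features(
    log: Iterable[Interaction], user_id: str, author_id: str, now: datetime
) -> dict[str, Optional[float]]:
    """past_interactions (count) and time_since_last_interaction (hours) for one user-author pair."""
    times = [
        e.at for e in log
        if e.user_id == user_id and e.author_id == author_id and e.kind in ("like", "comment")
    ]
    last = max(times) if times else None
    return {
        "past_interactions": float(len(times)),
        # None means "never interacted"; the model needs an explicit default or missing flag for it.
        "time_since_last_interaction": (now - last) / timedelta(hours=1) if last is not None else None,
    }
```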

Contextual Features

Feature | Type | Description
hour_of_day | Categorical | Current hour
day_of_week | Categorical | Current day
device_type | Categorical | Mobile, tablet, desktop
connection_type | Categorical | WiFi, cellular
app_version | Categorical | App version

Model Architecture

Multi-Task Learning Approach

Predict multiple engagement types simultaneously from one model, using shared hidden layers and a separate output head per engagement type.

Final Score Combination

Engagement Type | Base Weight | Multiplier
Like | w1 | 1x
Comment | w2 | 2x
Share | w3 | 3x
Save | w4 | 2x
Time Spent | w5 | 1x

Final Score = Σ over engagement types of (base weight × prediction × multiplier)

Multipliers encode the relative value of each engagement type: a share (3x) counts as a stronger signal than a like (1x).
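The score combination is a weighted sum over the model's heads. In this sketch the multipliers come from the table, while the base weights w1 to w5 are unspecified, so the 1.0 defaults are placeholders. It also assumes the time-spent prediction has been normalized to roughly [0, 1] so it is on a comparable scale to the probabilities.

```python
# Multipliers from the table above.
MULTIPLIERS = {"like": 1.0, "comment": 2.0, "share": 3.0, "save": 2.0, "time_spent": 1.0}
# Hypothetical placeholder base weights (w1..w5); real values would be tuned with online experiments.
BASE_WEIGHTS = {k: 1.0 for k in MULTIPLIERS}


def final_score(
    predictions: dict[str, float],
    weights: dict[str, float] = BASE_WEIGHTS,
    multipliers: dict[str, float] = MULTIPLIERS,
) -> float:
    """Sum of weight x prediction x multiplier over all engagement types."""
    return sum(weights[k] * predictions[k] * multipliers[k] for k in multipliers)
```

For example, predictions of like 0.1, comment 0.05, share 0.02, save 0.04, and normalized time spent 0.5 give 0.1 + 0.1 + 0.06 + 0.08 + 0.5 = 0.84.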

Model Details

Architecture: Deep Neural Network with embedding layers

Layer | Configuration
User embedding | 128-dim
Post embedding | 128-dim
Shared hidden layers | 512 -> ReLU -> Dropout -> 256 -> ReLU
Task-specific heads | Separate heads for like, comment, share, save
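The layer table describes a shared-bottom multi-task network. The pure-Python forward pass below follows that shape (embeddings, 512 then 256 shared ReLU layers, one sigmoid head per task) with random, untrained weights. It is an illustration only: it omits the numerical and contextual features, has no training loop, and a real system would use a deep learning framework. Dropout is skipped because it only applies during training.

```python
import math
import random


def dense(x, w, b):
    """y = W x + b, with W given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Uniform init scaled by 1/sqrt(n_in); returns (weights, biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskRanker:
    TASKS = ("like", "comment", "share", "save")

    def __init__(self, n_users, n_posts, emb_dim=128, h1=512, h2=256, seed=0):
        rng = random.Random(seed)
        self.user_emb = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(n_users)]
        self.post_emb = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(n_posts)]
        self.shared1 = init_layer(2 * emb_dim, h1, rng)  # shared layer 1 (512 by default)
        self.shared2 = init_layer(h1, h2, rng)           # shared layer 2 (256 by default)
        self.heads = {t: init_layer(h2, 1, rng) for t in self.TASKS}

    def predict(self, user_idx, post_idx):
        """Return one engagement probability per task for a (user, post) pair."""
        x = self.user_emb[user_idx] + self.post_emb[post_idx]  # list concatenation of embeddings
        h = relu(dense(x, *self.shared1))
        # Dropout would sit here during training; inference uses the layer as-is.
        h = relu(dense(h, *self.shared2))
        return {t: sigmoid(dense(h, *head)[0]) for t, head in self.heads.items()}
```

Because every head reads the same shared representation, sparse tasks such as shares benefit from the denser like signal during training.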

Training Pipeline

Data Collection

Field | Type | Description
user_id | String | User who saw the post
post_id | String | The post shown
features | Dict | All user, post, and contextual features
liked | Binary | Did the user like? (0/1)
commented | Binary | Did the user comment? (0/1)
shared | Binary | Did the user share? (0/1)
saved | Binary | Did the user save? (0/1)
time_spent | Float | Seconds spent viewing the post
position | Int | Where the post was shown in the feed (for position bias correction)

Handling Position Bias

Posts shown near the top of the feed receive more engagement regardless of quality, so raw engagement labels are biased toward high positions. Solutions:

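  • Inverse propensity weighting: correct each logged label by the probability that its position was actually examined
  • Position as a feature during training, set to a fixed default value at serving time so scores do not depend on position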
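Inverse propensity weighting can be sketched with the IPS-corrected pointwise loss used in unbiased recommender learning: the observed label y is replaced by y / θ, where θ is the examination probability at the post's position. The power-law propensity (1 / (position + 1))^eta and the clipping floor are assumptions; real propensities would be estimated, for example from randomized position experiments.

```python
import math
from typing import Sequence


def propensity(position: int, eta: float = 1.0) -> float:
    """Assumed examination probability for a 0-based feed position."""
    return (1.0 / (position + 1)) ** eta


def ips_log_loss(
    labels: Sequence[int],
    preds: Sequence[float],
    positions: Sequence[int],
    eta: float = 1.0,
    min_propensity: float = 0.1,
    eps: float = 1e-7,
) -> float:
    """Binary cross-entropy with IPS-corrected targets y / theta, averaged over examples."""
    total = 0.0
    for y, p, pos in zip(labels, preds, positions):
        theta = max(propensity(pos, eta), min_propensity)  # clip to bound the variance
        p = min(max(p, eps), 1.0 - eps)                      # avoid log(0)
        y_hat = y / theta  # unbiased relevance estimate when E[y] = theta * relevance
        # For clicks at low-propensity slots (1 - y_hat) is negative, so a single
        # example's loss can go below zero; the estimator is unbiased only in expectation.
        total += -(y_hat * math.log(p) + (1.0 - y_hat) * math.log(1.0 - p))
    return total / len(labels)
```

A click at position 0 is treated as an ordinary label, while a click at position 1 (θ = 0.5) counts as a relevance target of 2, compensating for the users who never examined that slot.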

Training Schedule

  • Retrain model daily with past 30 days of data
  • Online learning for real-time personalization
  • A/B test before full deployment

Serving

Online Inference Pipeline

Step | Action | Latency Target
1 | Candidate generation: ~500 candidates | 20ms
2 | Feature retrieval (parallel): user and post features | 50ms
3 | Batch scoring: score all candidates | 80ms
4 | Business rules: diversity, freshness, close friends | 20ms
5 | Return top 50 posts | -

Total target: < 200ms end-to-end (the stage budgets sum to 170ms, leaving 30ms of headroom)
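The inference steps can be sketched as an async orchestrator that runs the stages in order, fetches user and post features concurrently with `asyncio.gather`, and records per-stage latency against the budgets in the table. The stage callables are hypothetical stand-ins for the candidate service, feature store, model server, and rule engine.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Per-stage budgets from the table, in milliseconds (they sum to 170).
STAGE_BUDGET_MS = {"candidates": 20, "features": 50, "scoring": 80, "rules": 20}


async def rank_feed(
    user_id: str,
    get_candidates: Callable[[str], Awaitable[list[str]]],
    get_user_features: Callable[[str], Awaitable[dict]],
    get_post_features: Callable[[list[str]], Awaitable[dict[str, dict]]],
    score_batch: Callable[[dict, dict[str, dict]], dict[str, float]],
    apply_rules: Callable[[dict[str, float]], list[str]],
    top_n: int = 50,
) -> tuple[list[str], dict[str, float]]:
    """Run the serving stages and return (top post ids, per-stage latency in ms)."""
    timings: dict[str, float] = {}
    start = time.perf_counter()

    def mark(stage: str) -> None:
        nonlocal start
        now = time.perf_counter()
        timings[stage] = (now - start) * 1000
        start = now

    candidates = await get_candidates(user_id)
    mark("candidates")
    # User and post features are fetched concurrently.
    user_feats, post_feats = await asyncio.gather(
        get_user_features(user_id), get_post_features(candidates)
    )
    mark("features")
    scores = score_batch(user_feats, post_feats)
    mark("scoring")
    ranked = apply_rules(scores)
    mark("rules")
    return ranked[:top_n], timings


def over_budget(timings: dict[str, float]) -> list[str]:
    """Stages whose measured latency exceeded their budget."""
    return [stage for stage, ms in timings.items() if ms > STAGE_BUDGET_MS[stage]]
```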

Business Rules

Rule | Adjustment | Rationale
Diversity | Max 2 posts per author in top 20 | Prevent feed dominance
Freshness | +10% boost if < 1 hour old | Surface breaking content
Close Friends | +20% boost | Prioritize meaningful relationships
Content Type Mix | Ensure variety | Balance photos and videos
Creator Fairness | Minimum exposure floor | Support smaller creators
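The first three business rules can be sketched as a re-ranking pass over scored posts. The sketch assumes the percentage boosts are multiplicative and stack when both apply, and that posts exceeding the per-author cap are pushed below the top-20 window rather than dropped. The content-type mix and creator-fairness rules are left out because the design gives no concrete thresholds for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    post_id: str
    author_id: str
    score: float
    age_hours: float
    from_close_friend: bool


def apply_business_rules(posts: list[Scored], top_k: int = 20, max_per_author: int = 2) -> list[Scored]:
    def adjusted(p: Scored) -> float:
        s = p.score
        if p.age_hours < 1:
            s *= 1.10  # freshness: +10% if under 1 hour old (assumed multiplicative)
        if p.from_close_friend:
            s *= 1.20  # close friends: +20% (assumed multiplicative, stacks with freshness)
        return s

    ranked = sorted(posts, key=adjusted, reverse=True)
    head: list[Scored] = []
    tail: list[Scored] = []
    per_author: dict[str, int] = {}
    for p in ranked:
        count = per_author.get(p.author_id, 0)
        if len(head) < top_k and count < max_per_author:
            head.append(p)
            per_author[p.author_id] = count + 1
        else:
            tail.append(p)  # demoted below the top_k window, adjusted-score order preserved
    return head + tail
```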

Monitoring

Real-time Dashboards

  • Engagement rates by content type
  • Ranking model latency
  • Feature serving latency
  • Model prediction distribution

Alerts

Alert | Condition | Severity
Engagement Drop | Engagement rate < 95% of baseline | High
Latency Spike | p99 > 300ms | Medium
Prediction Drift | Score distribution shift | Medium
Model Staleness | Model > 48 hours old | Low
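The alert conditions can be sketched as a single check function. The table does not define how a score distribution shift is measured, so this sketch assumes the population stability index (PSI) over binned score fractions with a 0.2 threshold, a common rule of thumb. The p99 uses the nearest-rank method.

```python
import math
from datetime import datetime, timedelta
from typing import Sequence


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile."""
    xs = sorted(latencies_ms)
    if not xs:
        raise ValueError("no latency samples")
    rank = -(-99 * len(xs) // 100)  # ceil(0.99 * n) in integer arithmetic
    return xs[rank - 1]


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two binned distributions given as bin fractions."""
    return sum((a - e) * math.log((a + eps) / (e + eps)) for e, a in zip(expected, actual))


def check_alerts(engagement_rate: float, baseline_rate: float, latencies_ms: Sequence[float],
                 baseline_bins: Sequence[float], current_bins: Sequence[float],
                 model_trained_at: datetime, now: datetime,
                 psi_threshold: float = 0.2) -> list[tuple[str, str]]:
    """Return (alert name, severity) pairs for every condition that fires."""
    alerts = []
    if engagement_rate < 0.95 * baseline_rate:
        alerts.append(("Engagement Drop", "High"))
    if p99(latencies_ms) > 300:
        alerts.append(("Latency Spike", "Medium"))
    if psi(baseline_bins, current_bins) > psi_threshold:  # assumed drift measure
        alerts.append(("Prediction Drift", "Medium"))
    if now - model_trained_at > timedelta(hours=48):
        alerts.append(("Model Staleness", "Low"))
    return alerts
```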

Reference

Topic | Description
Cold-start for new users | Use popularity-based ranking initially. Learn from first interactions.
Fairness for creators | Set minimum exposure floors. Prevent the algorithm from only showing established accounts.
Negative feedback handling | Hide and report signals are weighted heavily. Track unfollows after showing certain content.
Organic vs ads | Ads have a separate ranking model and are inserted at designated slots.
Engagement vs time well spent | Heavy engagement optimization can create addictive patterns. Balance with user satisfaction.
Personalization vs diversity | Over-personalization creates filter bubbles. Inject diversity intentionally.
Freshness vs quality | New posts have less engagement signal. Use content-based features for fresh content.