Design a YouTube Video Prediction System

Design a system to predict whether a user will watch a video and for how long.

Requirements

Functional:

  • Predict watch time for candidate videos
  • Power video recommendations
  • Rank videos on homepage and search results

Non-functional:

  • Score millions of candidates per request
  • Latency < 200ms
  • Scale to 2B+ users

Metrics

Type | Metrics
Offline | MAE for watch time, AUC for click prediction
Online | Watch time per session, session duration, daily active users
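
The two offline metrics can be computed in a few lines. The following is a minimal sketch in plain Python, with made-up labels and predictions; the function names `mean_absolute_error` and `auc` are chosen here for illustration, and the pairwise AUC is only practical for small evaluation sets.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between actual and predicted watch time (seconds)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random clicked item outscores a random unclicked one.

    Ties count as half a win. This is the O(P * N) pairwise definition.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up evaluation data (illustration only).
watch_true = [120.0, 0.0, 45.0, 300.0]
watch_pred = [100.0, 10.0, 60.0, 250.0]
clicks = [1, 0, 1, 0]
click_scores = [0.9, 0.5, 0.4, 0.2]

print(mean_absolute_error(watch_true, watch_pred))  # 23.75
print(auc(clicks, click_scores))                    # 0.75
```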

Architecture

[Diagram: system architecture]

Features

Category | Features
User | Watch history, search history, demographics, subscription list
Video | Title/description embeddings, visual features, engagement stats, freshness
Context | Time of day, device, previous video watched

Model

Two-tower architecture for retrieval + deep ranking model for final scoring:

Component | Input | Output
User Tower | Watch history, search history, demographics | 128-dim user embedding
Video Tower | Title/description, engagement stats, freshness | 128-dim video embedding
Interaction | Concatenated embeddings (256-dim) | Predicted watch time (seconds)
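
To make the data flow concrete, here is a rough sketch of the two stages in plain Python. Everything beyond the 128-dim embeddings and the 256-dim concatenation is an assumption made for illustration: the names `Tower`, `InteractionHead`, and `retrieve_then_rank`, the feature counts, the hidden size, dot-product similarity for retrieval, the softplus output, and the random weights that stand in for trained parameters.

```python
import math
import random

EMBED_DIM = 128  # embedding width from the component table


def make_layer(rng, n_in, n_out):
    """Random weights standing in for trained parameters (illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, x, relu=True):
    """Fully connected layer without bias: y = W x, optionally followed by ReLU."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softplus(z):
    """Numerically stable map from any real number to a positive one."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class Tower:
    """Maps a raw feature vector to a 128-dim embedding."""

    def __init__(self, rng, n_features, hidden=64):
        self.l1 = make_layer(rng, n_features, hidden)
        self.l2 = make_layer(rng, hidden, EMBED_DIM)

    def embed(self, features):
        return dense(self.l2, dense(self.l1, features), relu=False)


class InteractionHead:
    """Predicts watch time from the concatenated 256-dim (user, video) vector."""

    def __init__(self, rng, hidden=64):
        self.l1 = make_layer(rng, 2 * EMBED_DIM, hidden)
        self.l2 = make_layer(rng, hidden, 1)

    def predict_watch_seconds(self, user_emb, video_emb):
        logit = dense(self.l2, dense(self.l1, user_emb + video_emb), relu=False)[0]
        return softplus(logit)  # assumed output activation; keeps seconds positive


def retrieve_then_rank(user_emb, video_embs, head, k):
    """Stage 1: cheap dot-product shortlist. Stage 2: rank shortlist with the head."""
    shortlist = sorted(
        range(len(video_embs)),
        key=lambda i: dot(user_emb, video_embs[i]),
        reverse=True,
    )[:k]
    scored = [(i, head.predict_watch_seconds(user_emb, video_embs[i])) for i in shortlist]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


rng = random.Random(0)
user_tower = Tower(rng, n_features=10)   # 10 and 8 are arbitrary feature counts
video_tower = Tower(rng, n_features=8)
head = InteractionHead(rng)

user_emb = user_tower.embed([rng.random() for _ in range(10)])
video_embs = [video_tower.embed([rng.random() for _ in range(8)]) for _ in range(50)]
ranked = retrieve_then_rank(user_emb, video_embs, head, k=5)
print(ranked)  # list of (video index, predicted watch seconds), best first
```

The split matters for latency: video embeddings do not depend on the user, so they can be computed ahead of time, and only the shortlist pays for the more expensive interaction model. A production system would likely replace the exhaustive sort in stage 1 with an approximate nearest-neighbor index.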

Training

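Train with weighted logistic regression: positive examples (clicked impressions) are weighted by their watch time, while negative examples (unclicked impressions) keep unit weight:

  • Longer watches contribute a stronger positive signal
  • A single objective covers both click (the label) and engagement (the weight)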
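
As a rough illustration of this objective, the sketch below fits a bias-only logistic model with plain gradient descent. The data are made up, `weighted_log_loss` and `fit_bias` are hypothetical helper names, and unit weight for negatives is an assumption of the sketch. With these weights, the fitted odds equal total watch time divided by the number of negatives, which is close to expected watch time per impression when clicks are rare.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_log_loss(z, clicked, watch_seconds):
    """Mean log loss: positives weighted by watch time, negatives by 1 (assumed)."""
    p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)
    total = 0.0
    for y, t in zip(clicked, watch_seconds):
        total += -t * math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(clicked)


def fit_bias(clicked, watch_seconds, steps=200, lr=1.0):
    """Gradient descent on a single bias term z (no features, for clarity)."""
    z = 0.0
    n = len(clicked)
    for _ in range(steps):
        p = sigmoid(z)
        # d(loss)/dz is -t * (1 - p) for a positive and p for a negative.
        grad = sum(-t * (1.0 - p) if y == 1 else p
                   for y, t in zip(clicked, watch_seconds)) / n
        z -= lr * grad
    return z


# Made-up data: 2 clicks (30 s and 90 s watched) among 100 impressions.
clicked = [1, 1] + [0] * 98
watch_seconds = [30.0, 90.0] + [0.0] * 98

z = fit_bias(clicked, watch_seconds)
print(math.exp(z))                           # fitted odds, converging to 120 / 98
print(sum(watch_seconds) / len(clicked))     # mean watch time per impression: 1.2
```

At serving time, the same idea suggests ranking by the exponentiated logit rather than by the click probability, since the odds are what track watch time under this weighting.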

Reference

Topic | Description
Position bias | Videos shown first receive more clicks regardless of quality. Correct for this in training.
Exploration | Allocate exploration budget to prevent showing only similar content to past watches.
Freshness | New videos have no engagement data. Use content-based signals and creator reputation.
Filter bubbles | Over-personalization creates echo chambers. Inject diversity intentionally.
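
One common way to correct for position bias in training is inverse propensity weighting: a click at a rarely examined position counts for more than a click at the top slot. The sketch below assumes per-position examination probabilities are already known (for example, estimated from a randomized experiment); the values in `examine_prob`, the function name, and the clipping threshold are all hypothetical.

```python
from typing import List, Mapping, Sequence


def inverse_propensity_weights(
    positions: Sequence[int],
    examine_prob: Mapping[int, float],
    max_weight: float = 10.0,
) -> List[float]:
    """Weight each clicked example by 1 / P(position was examined).

    Weights are clipped at max_weight so that a few clicks at very
    low-propensity positions do not dominate the training loss.
    """
    weights = []
    for pos in positions:
        p = examine_prob[pos]
        if not 0.0 < p <= 1.0:
            raise ValueError(f"examination probability for position {pos} must be in (0, 1]")
        weights.append(min(1.0 / p, max_weight))
    return weights


# Hypothetical examination probabilities by display position.
examine_prob = {1: 1.0, 2: 0.5, 3: 0.25}
click_positions = [1, 3, 2]

print(inverse_propensity_weights(click_positions, examine_prob))                  # [1.0, 4.0, 2.0]
print(inverse_propensity_weights(click_positions, examine_prob, max_weight=3.0))  # [1.0, 3.0, 2.0]
```

These weights would multiply the per-example loss weight of clicked impressions. An alternative is to feed position to the model as a training feature and fix it to a constant value at serving time.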