Design a YouTube Video Prediction System
Design a system to predict whether a user will watch a video and for how long.
Requirements
Functional:
- Predict watch time for candidate videos
- Power video recommendations
- Rank videos on homepage and search results
Non-functional:
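- Select from a corpus of millions of videos per request (retrieval narrows the corpus to a small candidate set that the ranking model scores)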
- Latency < 200ms
- Scale to 2B+ users
Metrics
| Type | Metrics |
|---|---|
| Offline | MAE for watch time, AUC for click prediction |
| Online | Watch time per session, session duration, daily active users |
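As a rough illustration of the two offline metrics, the sketch below computes MAE over watch-time predictions and AUC over click scores using only the standard library. The function names and the evaluation data are hypothetical; a real evaluation would run over held-out logs, most likely with a library implementation.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between actual and predicted watch time (seconds)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random clicked item outscores a random unclicked one.

    Ties count as half a win. This pairwise form is O(P * N), which is fine
    for a sketch but too slow for large evaluation sets.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical evaluation data (not from any real system).
watch_true = [120.0, 0.0, 45.0, 300.0]   # observed watch time in seconds
watch_pred = [100.0, 10.0, 60.0, 280.0]  # model predictions in seconds
clicks = [1, 0, 1, 0]                    # 1 = clicked, 0 = not clicked
click_scores = [0.9, 0.2, 0.6, 0.7]      # predicted click scores

mae = mean_absolute_error(watch_true, watch_pred)
auc = roc_auc(clicks, click_scores)
print(f"MAE = {mae:.2f} s, AUC = {auc:.2f}")
```

MAE is in seconds, so it is directly interpretable. AUC only measures ordering, so it says nothing about whether the scores are calibrated.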
Architecture
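The design pairs a retrieval stage with a ranking stage. The sketch below shows one way such a funnel could be wired together: a cheap similarity search trims the corpus, and a more expensive scorer orders the survivors. The `retrieve` and `rank` functions, the tiny embeddings, and the lookup table standing in for the ranking model are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: Vector, corpus: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: cheap similarity search over the whole corpus.

    A production system would likely use an approximate nearest-neighbor
    index instead of this brute-force scan.
    """
    return heapq.nlargest(k, corpus, key=lambda vid: dot(user_vec, corpus[vid]))


def rank(candidates: List[str], score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Stage 2: expensive scorer applied only to the retrieved candidates."""
    scored = [(vid, score_fn(vid)) for vid in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-dim embeddings; real embeddings would be much wider.
corpus = {
    "v1": [0.9, 0.1],
    "v2": [0.1, 0.9],
    "v3": [0.8, 0.3],
    "v4": [-0.5, 0.2],
}
user_vec = [1.0, 0.0]

# Hypothetical predicted watch times (seconds) standing in for the ranking model.
predicted_watch_time = {"v1": 40.0, "v2": 200.0, "v3": 95.0, "v4": 10.0}

candidates = retrieve(user_vec, corpus, k=2)
ranked = rank(candidates, lambda vid: predicted_watch_time[vid])
print(candidates, ranked)
```

The ranker never sees `v2`, even though it has the highest predicted watch time, because retrieval filtered it out. This is the usual trade-off of a funnel: retrieval recall caps what the ranker can achieve.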
Features
| Category | Features |
|---|---|
| User | Watch history, search history, demographics, subscription list |
| Video | Title/description embeddings, visual features, engagement stats, freshness |
| Context | Time of day, device, previous video watched |
Model
A two-tower model handles retrieval, and a deep ranking model produces the final scores:
| Component | Input | Output |
|---|---|---|
| User Tower | Watch history, search history, demographics | 128-dim user embedding |
| Video Tower | Title/description, engagement stats, freshness | 128-dim video embedding |
| Interaction | Concatenated user and video embeddings (256-dim) | Predicted watch time (seconds) |
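To make the shapes in the table concrete, here is a minimal forward-pass sketch with untrained random weights. It assumes each tower is a single dense layer with `tanh`, and that the interaction head is one linear layer followed by `softplus` to keep the output non-negative. None of these choices come from the design above; they are placeholders for whatever network depth and activations a real model would use. The feature vectors are also hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

EMBED_DIM = 128  # per-tower embedding width, matching the table


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def dense(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def tower(weights: Matrix, features: Vector) -> Vector:
    """One dense layer plus tanh, producing an EMBED_DIM embedding."""
    return [math.tanh(z) for z in dense(weights, features)]


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z); always positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def predict_watch_time(user_emb: Vector, video_emb: Vector, head: Matrix) -> float:
    """Concatenate both embeddings (256-dim) and map them to seconds."""
    joint = user_emb + video_emb  # list concatenation
    (logit,) = dense(head, joint)
    return softplus(logit)


rng = random.Random(0)

# Hypothetical, already-numeric feature vectors.
user_features = [0.3, 0.7, 0.0, 1.0, 0.5, 0.2]
video_features = [0.9, 0.1, 0.4, 0.6, 0.05]

user_weights = init_matrix(EMBED_DIM, len(user_features), rng)
video_weights = init_matrix(EMBED_DIM, len(video_features), rng)
head_weights = init_matrix(1, 2 * EMBED_DIM, rng)

user_emb = tower(user_weights, user_features)
video_emb = tower(video_weights, video_features)
watch_seconds = predict_watch_time(user_emb, video_emb, head_weights)
print(len(user_emb), len(video_emb), watch_seconds)
```

Because the towers do not see each other's inputs, video embeddings can be precomputed offline and only the user tower needs to run at request time.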
Training
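Train the ranking model with weighted logistic regression: clicked impressions are positives weighted by their watch time, and unclicked impressions are negatives with unit weight.
- Longer watches contribute a stronger positive signal
- One objective covers both click and engagement: when click-through rates are low, the learned odds approximate expected watch time per impression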
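The effect of watch-time weighting can be checked on a toy problem. The sketch below assumes unclicked impressions get weight 1 and fits an intercept-only model by gradient descent, so the learned odds can be compared against a closed form. The trainer, the data, and the hyperparameters are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_weighted_logreg(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    sample_weight: Sequence[float],
    lr: float = 1.0,
    steps: int = 2000,
) -> List[float]:
    """Batch gradient descent on the weighted log loss."""
    theta = [0.0] * len(X[0])
    total = sum(sample_weight)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, label, w in zip(X, y, sample_weight):
            p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
            for j, xj in enumerate(x):
                grad[j] += w * (p - label) * xj
        theta = [t - lr * g / total for t, g in zip(theta, grad)]
    return theta


# Hypothetical log: 100 impressions, 5 of them clicked.
watch_times = [30.0, 60.0, 90.0, 120.0, 200.0]  # seconds, clicked impressions
n_unclicked = 95

X = [[1.0]] * (len(watch_times) + n_unclicked)   # intercept-only features
y = [1] * len(watch_times) + [0] * n_unclicked
weights = watch_times + [1.0] * n_unclicked      # positives weighted by watch time

theta = train_weighted_logreg(X, y, weights)
odds = math.exp(theta[0])                        # e^(theta . x) at serving time
mean_watch = sum(watch_times) / len(X)           # true mean per impression
print(f"learned odds = {odds:.3f}, mean watch time = {mean_watch:.3f} s")
```

For this intercept-only case the optimum has odds equal to total watch time divided by the number of unclicked impressions (500 / 95 ≈ 5.26), while the true mean watch time per impression is 500 / 100 = 5.0. The gap shrinks as the click-through rate falls.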
Reference
| Topic | Description |
|---|---|
| Position bias | Videos shown first receive more clicks regardless of quality. Correct for this in training. |
| Exploration | Reserve an exploration budget so recommendations are not limited to content similar to past watches. |
| Freshness | New videos have no engagement data. Use content-based signals and creator reputation. |
| Filter bubbles | Over-personalization creates echo chambers. Inject diversity intentionally. |
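One common way to correct for position bias in training is inverse propensity weighting. This is named here as an example technique, not as the method this design prescribes. The sketch assumes the probability that a user examines each position is already known; the propensity values, the `ipw_weight` helper, and the logged impressions are hypothetical. Leaving unclicked impressions at weight 1 is a simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical probability that a user examines each display position (1 = top slot).
position_propensity: Dict[int, float] = {1: 1.0, 2: 0.5, 3: 0.25}


def ipw_weight(
    position: int,
    clicked: bool,
    propensity: Dict[int, float],
    max_weight: float = 10.0,
) -> float:
    """Training weight for one logged impression.

    Clicks at rarely examined positions are up-weighted by 1 / propensity,
    clipped at max_weight to limit variance.
    """
    if not clicked:
        return 1.0
    return min(1.0 / propensity[position], max_weight)


# Hypothetical logged impressions: (video_id, position, clicked).
logged: List[Tuple[str, int, bool]] = [
    ("v1", 1, True),
    ("v2", 2, True),
    ("v3", 3, True),
    ("v4", 1, False),
]

weights = [ipw_weight(pos, clicked, position_propensity) for _, pos, clicked in logged]
print(weights)
```

These weights could be multiplied into the per-example training weight, so a click earned from a low slot counts for more than one earned from the top slot.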