Design a Netflix Watch Prediction System
Design a system to predict what users are likely to watch next on Netflix.
Requirements
Functional:
- Predict the probability that a user will watch each title
- Power personalized homepage rows
- Enable relevant push notifications
- Support "Continue Watching" and "Because you watched X"
Non-functional:
- Score 10K+ titles per user
- Homepage load < 500ms
- 200M+ subscribers
- Update recommendations as user watches
Metrics
Offline Metrics
| Metric | Description |
|---|---|
| AUC-ROC | Binary classification quality |
| Recall@20 | Fraction of the titles a user watched that appear in the top 20 recommendations |
| NDCG | Ranking quality |
| Calibration | Predicted vs actual watch rates |
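To make the ranking metrics concrete, here is a minimal per-user sketch of Recall@20 and NDCG. It assumes binary relevance (a title is relevant if the user watched it); the function names and inputs are illustrative, not part of any particular library.

```python
import math

def recall_at_k(ranked_titles, watched, k=20):
    """Fraction of the user's watched titles that appear in the top k."""
    watched = set(watched)
    if not watched:
        return 0.0
    return len(set(ranked_titles[:k]) & watched) / len(watched)

def ndcg_at_k(ranked_titles, watched, k=20):
    """NDCG with binary relevance: gain 1 if the title was watched, else 0."""
    watched = set(watched)
    # Discounted gain: hits near the top of the list count more.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, title in enumerate(ranked_titles[:k])
        if title in watched
    )
    # Ideal DCG: all watched titles placed at the very top.
    ideal_hits = min(len(watched), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

In practice these would be averaged over many users on a held-out time window.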
Online Metrics
| Metric | Description |
|---|---|
| Take Rate | Views / Impressions |
| Browse Time | Time before selecting content |
| Session Starts | Sessions that lead to watching |
| Engagement | Hours watched per session |
Architecture
Feature Engineering
User Features
| Feature | Type | Description |
|---|---|---|
| watch_history | Sequence | Last N titles watched |
| genre_preferences | Vector | Affinity for each genre |
| viewing_patterns | Embedding | When/how user watches |
| completion_rate | Numerical | Avg % of content finished |
| account_age | Numerical | Days since signup |
| profile_type | Categorical | Kids, adult, shared |
| language_preference | Categorical | Preferred audio/subtitle |
Title Features
| Feature | Type | Description |
|---|---|---|
| genres | Multi-hot | Associated genres |
| maturity_rating | Categorical | Age rating |
| release_date | Numerical | Days since release |
| runtime | Numerical | Length in minutes |
| content_type | Categorical | Movie, series, documentary |
| popularity_score | Numerical | Global/regional popularity |
| avg_completion | Numerical | Completion rate across users |
| visual_embedding | Vector | Thumbnail/trailer embedding |
| text_embedding | Vector | Synopsis embedding |
| cast_embedding | Vector | Actor/director embeddings |
Interaction Features
| Feature | Type | Description |
|---|---|---|
| similarity_to_watched | Numerical | Content similarity |
| genre_match | Numerical | Genre overlap with preferences |
| actor_affinity | Numerical | How much the user has watched this title's actors before |
| sequel/series | Binary | Continuation of watched content |
| time_since_similar | Numerical | Recency of watching similar |
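Two of the interaction features above could be derived roughly as follows. This is a sketch under assumptions: `genre_preferences` is taken to be a mapping from genre name to an affinity in [0, 1], and `series_id` is a hypothetical identifier linking episodes or sequels of the same franchise.

```python
def genre_match(genre_preferences, title_genres):
    """Average user affinity over the title's genres (0.0 for unseen genres)."""
    if not title_genres:
        return 0.0
    total = sum(genre_preferences.get(genre, 0.0) for genre in title_genres)
    return total / len(title_genres)

def is_continuation(title_series_id, watched_series_ids):
    """1 if the title belongs to a series the user has already watched, else 0."""
    if title_series_id is None:
        return 0
    return int(title_series_id in watched_series_ids)
```

Averaging is only one possible choice; a max or a weighted sum over genres would be equally defensible.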
Context Features
| Feature | Type | Description |
|---|---|---|
| time_of_day | Categorical | Morning, afternoon, evening |
| day_of_week | Categorical | Weekday vs weekend |
| device | Categorical | TV, mobile, tablet |
| profile_active | Binary | Main profile or secondary |
Model Architecture
Two-Stage Approach
Stage 1: Candidate Generation
Multiple candidate generators, each producing ~100-500 titles:
| Source | Description | Typical Count |
|---|---|---|
| New releases | Recently released titles in user's preferred genres | ~100 |
| Similar to watched | Titles similar to user's last 10 watches | ~500 (50 per title) |
| Collaborative filtering | Recommendations from similar users | ~200 |
| Trending | Popular titles in user's region | ~100 |
| Continue watching | In-progress content for the user | Variable |
Combine all sources into a single candidate pool for ranking.
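Combining the sources mostly amounts to a de-duplicating union. The sketch below also records which generators proposed each title, which is an assumption on my part but tends to be useful as a ranking feature and for debugging; the source names are illustrative.

```python
def merge_candidates(sources):
    """Union candidates from several generators.

    `sources` maps a generator name to its list of title IDs. The result maps
    each title ID to the list of generators that proposed it.
    """
    pool = {}
    for source_name, title_ids in sources.items():
        for title_id in title_ids:
            proposers = pool.setdefault(title_id, [])
            if source_name not in proposers:  # ignore repeats within one source
                proposers.append(source_name)
    return pool

pool = merge_candidates({
    "trending": ["t1", "t2"],
    "collaborative_filtering": ["t2", "t3"],
})
```

Here `t2` appears once in the pool, tagged with both sources.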
Stage 2: Ranking Model
Deep neural network to score all candidates:
| Component | Architecture | Output |
|---|---|---|
| User encoder | Transformer for watch history (128-dim) + MLP for profile | 256-dim user embedding |
| Title encoder | BERT for text + ResNet for visuals + MLP for metadata | 256-dim title embedding |
| Interaction layer | Concatenate user + title + context, then Dense(256) -> ReLU -> Dropout -> Dense(128) -> ReLU -> Dense(1) -> Sigmoid | Watch probability (0-1) |
The model encodes user and title separately, concatenates with context features, and predicts the probability of watching.
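To show the data flow through the interaction layer, here is a framework-free sketch of the forward pass at inference time, when dropout is inactive. The sizes are shrunk (8-dim embeddings instead of 256-dim) and the weights are random placeholders; a real system would use trained parameters in a deep learning framework.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one output per (weight row, bias) pair."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """Random weights stand in for trained parameters (illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def watch_probability(user_emb, title_emb, context, layers):
    """Concatenate inputs, then Dense -> ReLU -> Dense -> ReLU -> Dense -> Sigmoid."""
    hidden = user_emb + title_emb + context  # list concatenation
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(hidden, weights, biases)[0])

rng = random.Random(0)
# Toy sizes: 8 + 8 + 4 = 20 inputs, hidden layers of 16 and 8 units.
layers = [init_layer(20, 16, rng), init_layer(16, 8, rng), init_layer(8, 1, rng)]
user = [rng.random() for _ in range(8)]
title = [rng.random() for _ in range(8)]
context = [1.0, 0.0, 0.0, 1.0]  # e.g. one-hot device and time-of-day flags
p = watch_probability(user, title, context, layers)
```

Because the user and title encoders run separately, their outputs can be precomputed and only this small head needs to run per candidate at request time.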
Training
Training Data
Label definition:
- Positive (label=1): User watched the title
- Negative (label=0): User was shown the title but did not watch
Sample construction:
- For each user session, if the user watched a title, create a positive sample
- For all impressions (titles shown) that were not watched, create negative samples
- Include user ID, title ID, label, and session context for each sample
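The sample construction above can be sketched as follows. The session layout (`impressions`, `watched`, `context` keys) is a hypothetical log format chosen for illustration.

```python
def build_samples(user_id, session):
    """Turn one session into labeled rows: 1 for watched titles, 0 for other impressions."""
    watched = set(session["watched"])
    # Keep watched titles even if no impression was logged for them
    # (for example, titles found through search); preserve order, drop repeats.
    title_ids = list(dict.fromkeys(session["impressions"] + session["watched"]))
    return [
        {
            "user_id": user_id,
            "title_id": title_id,
            "label": 1 if title_id in watched else 0,
            "context": session["context"],
        }
        for title_id in title_ids
    ]

session = {
    "impressions": ["a", "b", "c"],
    "watched": ["b"],
    "context": {"device": "tv", "time_of_day": "evening"},
}
samples = build_samples("user_1", session)
```

This yields one positive (`b`) and two negatives (`a`, `c`), each carrying the same session context.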
Loss Function
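Binary cross-entropy with class weighting. Apply a positive weight (e.g., 5.0) to up-weight positive samples, since watches are much rarer than non-watches. Without the weight, negatives dominate the loss and the model can do well by predicting a low probability for everything. Up-weighting positives also shifts predicted probabilities upward, so check calibration after training if the raw probabilities are used downstream.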
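Written out for a small batch, the weighted loss looks like the sketch below. The 5.0 default mirrors the example weight above; the clamping constant `eps` is an assumption added for numerical safety.

```python
import math

def weighted_bce(labels, probs, pos_weight=5.0, eps=1e-7):
    """Mean binary cross-entropy with the positive term scaled by pos_weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

With one positive and one negative both predicted at 0.5, the positive contributes five times as much loss as the negative.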
Serving
Real-time Personalization
Homepage generation process:
- Fetch user state: Retrieve user features and recent activity
- Build rows by type:
  - Continue Watching: In-progress content for the user
  - Because You Watched X: For each of the last 3 watched titles, get and rank similar titles (top 20 per row)
  - Trending Now: Trending titles ranked for this user (top 20)
  - Top Picks for You: Personalized recommendations (top 20)
- Return assembled homepage: List of row titles with their ranked content
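The homepage generation process can be sketched as a single function. Everything here is illustrative: `user_state` is assumed to hold `in_progress` and `recently_watched` (most recent first), and `score` is assumed to be a callable returning the ranking model's watch probability for a title.

```python
def build_homepage(user_state, similar_titles, trending, top_picks, score, row_size=20):
    """Assemble homepage rows as (row title, ranked title IDs) pairs."""
    def rank(title_ids):
        return sorted(title_ids, key=score, reverse=True)[:row_size]

    rows = []
    if user_state["in_progress"]:
        rows.append(("Continue Watching", list(user_state["in_progress"])))
    # One "Because You Watched" row for each of the last 3 watched titles.
    for title_id in user_state["recently_watched"][:3]:
        similar = similar_titles.get(title_id, [])
        rows.append((f"Because You Watched {title_id}", rank(similar)))
    rows.append(("Trending Now", rank(trending)))
    rows.append(("Top Picks for You", rank(top_picks)))
    return rows

scores = {"a": 0.9, "b": 0.1, "c": 0.5}
state = {"in_progress": ["x"], "recently_watched": ["w1"]}
rows = build_homepage(state, {"w1": ["b", "a"]}, ["c", "b"], ["a", "b", "c"], scores.get)
```

Continue Watching is left unranked here on the assumption that in-progress order is already meaningful; a production system might rank it as well.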
Caching Strategy
| Data | Refresh policy |
|---|---|
| User embeddings | Recomputed hourly |
| Title embeddings | Recomputed daily |
| Candidate pools | Cached per user, 15-minute TTL |
| Final rankings | Not cached; computed per request |
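For the per-user candidate pools, a time-to-live cache is one straightforward option. The class below is a minimal in-memory sketch with hypothetical names; a real deployment would more likely use a shared cache service with built-in expiry.

```python
import time

class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def get_or_compute(self, key, compute):
        """Return the cached value for key, recomputing it if missing or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        value = compute()
        self._entries[key] = (now, value)
        return value

# Candidate pools: 15 minutes per user, as in the table above.
candidate_cache = TTLCache(ttl_seconds=15 * 60)
```

A call such as `candidate_cache.get_or_compute(user_id, lambda: generate_candidates(user_id))` would then regenerate a pool at most once per 15 minutes, where `generate_candidates` is a placeholder for the Stage 1 pipeline.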
Monitoring
Key Metrics
Dashboard metrics:
- Take rate by row type
- Browse time percentiles
- Prediction calibration
- Feature drift
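Prediction calibration can be tracked by bucketing predictions and comparing each bucket's mean predicted probability with its observed watch rate. The sketch below uses equal-width bins, which is an assumed choice; the function name is illustrative.

```python
def calibration_table(probs, labels, n_bins=10):
    """Per non-empty bin: (mean predicted probability, observed watch rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    table = []
    for bucket in bins:
        if bucket:
            mean_pred = sum(p for p, _ in bucket) / len(bucket)
            watch_rate = sum(y for _, y in bucket) / len(bucket)
            table.append((mean_pred, watch_rate, len(bucket)))
    return table
```

A well-calibrated model shows the first two columns close to each other in every bin; a persistent gap in one direction suggests the scores need rescaling.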
Alerts:
| Alert | Condition | Severity |
|---|---|---|
| Take rate drop | Take rate falls below 90% of baseline | High |
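The alert condition is simple enough to express directly. In this sketch the 0.90 threshold comes from the table above, while the function names and the shape of the returned alert are assumptions.

```python
def take_rate(views, impressions):
    """Views divided by impressions; 0.0 when nothing was shown."""
    return views / impressions if impressions else 0.0

def take_rate_alert(current, baseline, threshold=0.90):
    """Return an alert dict if the current take rate is below threshold * baseline."""
    if current < threshold * baseline:
        return {
            "alert": "Take rate drop",
            "severity": "High",
            "current": current,
            "baseline": baseline,
        }
    return None
```

For example, with a baseline take rate of 0.10, a current value of 0.08 triggers the alert while 0.095 does not.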
Reference
| Topic | Description |
|---|---|
| Cold-start for new users | Start with popular content, use onboarding survey, learn from first interactions. |
| Promoting new content | Reserve exploration slots for new content. Track performance and adjust. |
| Shared accounts | Use profiles. Detect different viewing patterns within profiles. |
| Explicit ratings vs implicit signals | Implicit signals (actual watches) are stronger predictors than stated preferences. |
| Exploitation vs exploration trade-off | Balance safe recommendations with discovery of new content. |
| Personalization vs popularity | Combine individual preferences with social proof signals. |
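As one possible reading of "reserve exploration slots", the sketch below keeps the last few positions of a row for new titles and fills the rest from the normal ranking. The slot count, row size, and function name are all assumptions chosen for illustration.

```python
def with_exploration_slots(ranked, new_titles, n_slots=2, row_size=20):
    """Fill a row from the ranking, reserving up to n_slots at the end for new titles."""
    keep = max(row_size - n_slots, 0)
    already_shown = set(ranked[:keep])
    # New titles that the ranking would not already surface in this row.
    explore = [t for t in new_titles if t not in already_shown][:n_slots]
    exploit = [t for t in ranked if t not in explore][:row_size - len(explore)]
    return exploit + explore
```

If there are no eligible new titles, the row falls back to the plain top `row_size` ranking.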