Design a Netflix Watch Prediction System

Design a system to predict what users are likely to watch next on Netflix.

Requirements

Functional:

Predict probability user will watch each title
Power personalized homepage rows
Enable relevant push notifications
Support "Continue Watching" and "Because you watched X"

Non-functional:

Score 10K+ titles per user
Homepage load < 500ms
200M+ subscribers
Update recommendations as user watches

Metrics

Offline Metrics

Metric	Description
AUC-ROC	Binary classification quality
Recall@20	Fraction of the titles a user actually watched that appear in the top 20 recommendations
NDCG	Ranking quality
Calibration	Predicted vs actual watch rates
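
As a rough illustration of the ranking metrics in the table above, the sketch below computes Recall@K and a binary-relevance NDCG@K for one user. The function names and the choice of gain 1 for a watched title and 0 otherwise are assumptions made for this example, not a prescribed evaluation setup.

```python
import math

def recall_at_k(ranked_titles, watched, k=20):
    """Fraction of watched titles that appear in the top-k of the ranking."""
    watched = set(watched)
    if not watched:
        return 0.0
    return len(set(ranked_titles[:k]) & watched) / len(watched)

def ndcg_at_k(ranked_titles, watched, k=20):
    """Binary-relevance NDCG: a watched title has gain 1, anything else 0."""
    watched = set(watched)
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, title in enumerate(ranked_titles[:k])
        if title in watched
    )
    # Ideal ranking puts all watched titles (up to k of them) first.
    ideal_hits = min(len(watched), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

ranking = ["t1", "t2", "t3", "t4"]   # hypothetical model output, best first
watched = ["t1", "t4", "t9"]         # hypothetical ground truth
print(recall_at_k(ranking, watched, k=3))
print(ndcg_at_k(ranking, watched, k=3))
```

In practice both metrics would be averaged over many users, typically on a held-out time window.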

Online Metrics

Metric	Description
Take Rate	Views / Impressions
Browse Time	Time before selecting content
Session Starts	Sessions that lead to watching
Engagement	Hours watched per session

Architecture


Feature Engineering

User Features

Feature	Type	Description
watch_history	Sequence	Last N titles watched
genre_preferences	Vector	Affinity for each genre
viewing_patterns	Embedding	When/how user watches
completion_rate	Numerical	Avg % of content finished
account_age	Numerical	Days since signup
profile_type	Categorical	Kids, adult, shared
language_preference	Categorical	Preferred audio/subtitle

Title Features

Feature	Type	Description
genres	Multi-hot	Associated genres
maturity_rating	Categorical	Age rating
release_date	Numerical	Days since release
runtime	Numerical	Length in minutes
content_type	Categorical	Movie, series, documentary
popularity_score	Numerical	Global/regional popularity
avg_completion	Numerical	Completion rate across users
visual_embedding	Vector	Thumbnail/trailer embedding
text_embedding	Vector	Synopsis embedding
cast_embedding	Vector	Actor/director embeddings

Interaction Features

Feature	Type	Description
similarity_to_watched	Numerical	Content similarity
genre_match	Numerical	Genre overlap with preferences
actor_affinity	Numerical	How often the user has watched this title's actors before
sequel/series	Binary	Continuation of watched content
time_since_similar	Numerical	Recency of watching similar
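
A minimal sketch of how two of the interaction features above might be derived. It assumes genre preferences are stored as a genre-to-affinity mapping and that each title carries an optional series identifier; both representations and the function names are hypothetical.

```python
def genre_match(user_genre_affinity, title_genres):
    """Average user affinity over the title's genres (0.0 if it has none)."""
    if not title_genres:
        return 0.0
    total = sum(user_genre_affinity.get(g, 0.0) for g in title_genres)
    return total / len(title_genres)

def is_continuation(title_series_id, watched_series_ids):
    """Binary sequel/series flag: 1 if the user has watched this series before."""
    return int(title_series_id is not None and title_series_id in watched_series_ids)

affinity = {"drama": 0.8, "comedy": 0.2}   # hypothetical genre_preferences
print(genre_match(affinity, ["drama", "thriller"]))
print(is_continuation("series-42", {"series-42", "series-7"}))
```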

Context Features

Feature	Type	Description
time_of_day	Categorical	Morning, afternoon, evening
day_of_week	Categorical	Weekday vs weekend
device	Categorical	TV, mobile, tablet
profile_active	Binary	Whether the active profile is the main profile or a secondary one
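
The context features could be derived from the request timestamp roughly as follows. The hour boundaries for each bucket are an assumption chosen for illustration (late-night hours are folded into "evening"), and `context_features` is a hypothetical helper.

```python
from datetime import datetime

def context_features(ts, device):
    """Bucket a local timestamp into the categorical context features."""
    hour = ts.hour
    if 5 <= hour < 12:          # assumed boundary
        time_of_day = "morning"
    elif 12 <= hour < 17:       # assumed boundary
        time_of_day = "afternoon"
    else:
        time_of_day = "evening"
    # datetime.weekday(): Monday is 0, Sunday is 6
    day_of_week = "weekend" if ts.weekday() >= 5 else "weekday"
    return {"time_of_day": time_of_day, "day_of_week": day_of_week, "device": device}

print(context_features(datetime(2024, 1, 6, 20, 0), "TV"))
```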

Model Architecture

Two-Stage Approach

Stage 1: Candidate Generation

Multiple candidate generators, each producing ~100-500 titles:

Source	Description	Typical Count
New releases	Recently released titles in user's preferred genres	~100
Similar to watched	Titles similar to user's last 10 watches	~500 (50 per title)
Collaborative filtering	Recommendations from similar users	~200
Trending	Popular titles in user's region	~100
Continue watching	In-progress content for the user	Variable

Combine all sources into a single candidate pool for ranking.
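
One way to combine the sources is a simple deduplicating merge that also records which generators proposed each title, which can later feed row labels or debugging. This is a sketch under that assumption; the source names and `merge_candidates` are illustrative.

```python
def merge_candidates(sources):
    """Merge candidate lists into one pool.

    sources: dict mapping source name -> ordered list of title IDs.
    Returns a dict mapping title ID -> list of sources that proposed it,
    in first-seen order (dicts preserve insertion order).
    """
    pool = {}
    for name, titles in sources.items():
        for title in titles:
            proposers = pool.setdefault(title, [])
            if name not in proposers:
                proposers.append(name)
    return pool

pool = merge_candidates({
    "trending": ["a", "b"],
    "collaborative_filtering": ["b", "c"],
    "continue_watching": ["d"],
})
print(list(pool))   # deduplicated candidate IDs passed on to ranking
print(pool["b"])    # sources that agreed on title "b"
```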

Stage 2: Ranking Model

Deep neural network to score all candidates:

Component	Architecture	Output
User encoder	Transformer for watch history (128-dim) + MLP for profile	256-dim user embedding
Title encoder	BERT for text + ResNet for visuals + MLP for metadata	256-dim title embedding
Interaction layer	Concatenate user + title + context, then Dense(256) -> ReLU -> Dropout -> Dense(128) -> ReLU -> Dense(1) -> Sigmoid	Watch probability (0-1)

The model encodes user and title separately, concatenates with context features, and predicts the probability of watching.
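
To make the interaction layer concrete, here is a toy forward pass in plain Python. It assumes the user and title embeddings have already been produced by their encoders, uses tiny dimensions and random weights in place of the 256/128-unit trained layers, and omits dropout because dropout is inactive at inference time. All names are hypothetical.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(rng, n_in, n_out):
    """Random weights for illustration only; a real model would be trained."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def watch_probability(user_emb, title_emb, context, layers):
    """Concatenate -> Dense -> ReLU -> Dense -> ReLU -> Dense(1) -> Sigmoid."""
    x = user_emb + title_emb + context          # list concatenation
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(dense(x, w1, b1))
    h = relu(dense(h, w2, b2))
    return sigmoid(dense(h, w3, b3)[0])

rng = random.Random(0)
# Toy sizes: 4-dim user + 4-dim title + 2 context features -> 8 -> 4 -> 1
layers = [init_layer(rng, 10, 8), init_layer(rng, 8, 4), init_layer(rng, 4, 1)]
user = [0.1, -0.3, 0.5, 0.2]
title = [0.4, 0.0, -0.2, 0.7]
context = [1.0, 0.0]
print(watch_probability(user, title, context, layers))
```

Because the user and title encoders are separate, their embeddings can be precomputed and cached, leaving only this small network to run per candidate at request time.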

Training

Training Data

Label definition:

Positive (label=1): User watched the title
Negative (label=0): User was shown the title but did not watch

Sample construction:

For each user session, if the user watched a title, create a positive sample
For all impressions (titles shown) that were not watched, create negative samples
Include user ID, title ID, label, and session context for each sample
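
The sample construction above can be sketched as follows, assuming each session is available as a record with the user ID, the context, the list of impressions, and the titles watched. The field names are hypothetical.

```python
def build_samples(session):
    """Turn one session into labeled (user, title) training samples."""
    watched = set(session["watched"])
    base = {"user_id": session["user_id"], "context": session["context"]}
    samples = []
    # Positives: every title the user watched in this session.
    for title in session["watched"]:
        samples.append({**base, "title_id": title, "label": 1})
    # Negatives: impressions that did not lead to a watch.
    for title in session["impressions"]:
        if title not in watched:
            samples.append({**base, "title_id": title, "label": 0})
    return samples

session = {
    "user_id": "u1",
    "context": {"device": "TV", "time_of_day": "evening"},
    "impressions": ["a", "b", "c"],
    "watched": ["b"],
}
for sample in build_samples(session):
    print(sample["title_id"], sample["label"])
```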

Loss Function

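Binary cross-entropy with class weighting. Because watches are much rarer than non-watches, apply a positive weight (e.g., 5.0) so that each positive sample contributes more to the loss; this keeps the model from drifting toward predicting "no watch" for everything. Up-weighting positives also inflates the predicted probabilities, so calibration should be checked after training and the scores recalibrated if needed.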
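
A minimal sketch of the weighted loss, written in plain Python to show the arithmetic rather than any particular framework's API. The `pos_weight` default mirrors the 5.0 example above; the clipping constant `eps` is an assumption added to avoid log(0).

```python
import math

def weighted_bce(y_true, y_pred, pos_weight=5.0, eps=1e-7):
    """Mean binary cross-entropy with the positive term scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 0, 0]
preds = [0.3, 0.2, 0.1, 0.4]
print(weighted_bce(labels, preds))                  # positives count 5x
print(weighted_bce(labels, preds, pos_weight=1.0))  # plain BCE for comparison
```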

Serving

Real-time Personalization

Homepage generation process:

Fetch user state: Retrieve user features and recent activity
Build rows by type:
- Continue Watching: In-progress content for the user
- Because You Watched X: For each of the last 3 watched titles, get and rank similar titles (top 20 per row)
- Trending Now: Trending titles ranked for this user (top 20)
- Top Picks for You: Personalized recommendations (top 20)
Return assembled homepage: List of row titles with their ranked content
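
The homepage generation steps could be sketched as below. The shape of `user_state`, the `similar_to` lookup, and the `score` callable standing in for the ranking model are all assumptions for this example; `recently_watched` is assumed to be ordered most recent first.

```python
def top_n(candidates, score, n=20):
    """Rank candidates by model score, highest first, and keep the top n."""
    return sorted(candidates, key=score, reverse=True)[:n]

def build_homepage(user_state, similar_to, trending, picks, score):
    """Assemble (row title, ranked title IDs) pairs for one user."""
    rows = []
    if user_state["in_progress"]:
        rows.append(("Continue Watching", list(user_state["in_progress"])))
    for title in user_state["recently_watched"][:3]:   # last 3 watched titles
        rows.append((f"Because You Watched {title}", top_n(similar_to(title), score)))
    rows.append(("Trending Now", top_n(trending, score)))
    rows.append(("Top Picks for You", top_n(picks, score)))
    return rows

scores = {"a": 0.9, "b": 0.1, "c": 0.5}               # hypothetical model scores
state = {"in_progress": ["p1"], "recently_watched": ["w1", "w2"]}
page = build_homepage(state, lambda t: ["a", "b", "c"], ["b", "c"], ["c", "a"], scores.get)
for name, titles in page:
    print(name, titles)
```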

Caching Strategy

Data	Refresh Policy
User embeddings	Recomputed hourly
Title embeddings	Recomputed daily
Candidate pools	Cached per user with a 15-minute TTL
Final rankings	Computed per request (not cached)
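
For the per-user candidate pools, a time-to-live cache along these lines would be enough to illustrate the idea. This is an in-process sketch with a hypothetical `TTLCache` class; a production system would more likely use a shared cache service, which is outside the scope of this example.

```python
import time

class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._store = {}             # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh
        value = compute()            # missing or expired: rebuild
        self._store[key] = (now, value)
        return value

candidate_cache = TTLCache(ttl_seconds=15 * 60)   # 15 minutes per user
pool = candidate_cache.get_or_compute("user-123", lambda: ["a", "b", "c"])
print(pool)
```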

Monitoring

Key Metrics

Dashboard metrics:

Take rate by row type
Browse time percentiles
Prediction calibration
Feature drift
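
The prediction calibration item could be tracked with a simple binned comparison of mean predicted probability against the observed watch rate. The equal-width binning and the `calibration_table` name are assumptions for this sketch.

```python
def calibration_table(predictions, labels, n_bins=5):
    """Per-bin (bin index, count, mean prediction, observed watch rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predictions, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for idx, items in enumerate(bins):
        if not items:
            continue                              # skip empty bins
        mean_pred = sum(p for p, _ in items) / len(items)
        watch_rate = sum(y for _, y in items) / len(items)
        rows.append((idx, len(items), mean_pred, watch_rate))
    return rows

preds = [0.05, 0.15, 0.35, 0.85, 0.95]   # hypothetical logged predictions
labels = [0, 0, 1, 1, 1]                 # whether each impression was watched
for row in calibration_table(preds, labels):
    print(row)
```

A well-calibrated model shows mean prediction and watch rate close together in every bin; a consistent gap in one direction suggests the scores need recalibrating.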

Alerts:

Alert	Condition	Severity
Take rate drop	Take rate falls below 90% of baseline	High
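
The alert condition in the table can be expressed as a small check. How the baseline is obtained (for example a trailing average) is left open here, and the function names are hypothetical.

```python
def take_rate(views, impressions):
    """Take rate = views / impressions (0.0 when nothing was shown)."""
    return views / impressions if impressions else 0.0

def take_rate_alert(current, baseline, threshold=0.90):
    """Return an alert record if current falls below threshold * baseline."""
    if current < threshold * baseline:
        return {
            "alert": "Take rate drop",
            "severity": "High",
            "current": current,
            "baseline": baseline,
        }
    return None

current = take_rate(views=800, impressions=10_000)   # hypothetical counts
print(take_rate_alert(current, baseline=0.10))
```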

Reference

Topic	Description
Cold-start for new users	Start with popular content, use onboarding survey, learn from first interactions.
Promoting new content	Reserve exploration slots for new content. Track performance and adjust.
Shared accounts	Use profiles. Detect different viewing patterns within profiles.
Explicit ratings vs implicit signals	Implicit signals (actual watches) are stronger predictors than stated preferences.
Exploitation vs exploration trade-off	Balance safe recommendations with discovery of new content.
Personalization vs popularity	Combine individual preferences with social proof signals.
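
As one possible reading of "reserve exploration slots", the sketch below keeps most of a row for the top-ranked titles and fills the last few positions with randomly chosen new titles. The slot count, the uniform random choice, and the function name are assumptions; a real system might instead use a bandit-style policy.

```python
import random

def with_exploration_slots(ranked, new_titles, n=20, n_explore=2, rng=None):
    """Return up to n titles: top-ranked ones plus n_explore new-content slots."""
    rng = rng or random.Random()
    already_shown = set(ranked[: n - n_explore])
    # Only explore titles that would not have made the row anyway.
    fresh = [t for t in new_titles if t not in already_shown]
    explore = rng.sample(fresh, min(n_explore, len(fresh)))
    exploit = [t for t in ranked if t not in explore][: n - len(explore)]
    return exploit + explore

ranked = [f"r{i}" for i in range(30)]     # hypothetical ranked candidates
new_titles = ["n1", "n2", "n3"]           # hypothetical new releases
row = with_exploration_slots(ranked, new_titles, rng=random.Random(0))
print(row)
```

Logging which positions were exploration slots makes it possible to track how new titles perform and adjust the number of slots over time.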
