
Design a Netflix Watch Prediction System

Design a system to predict what users are likely to watch next on Netflix.

Requirements

Functional:

  • Predict probability user will watch each title
  • Power personalized homepage rows
  • Enable relevant push notifications
  • Support "Continue Watching" and "Because you watched X"

Non-functional:

  • Score 10K+ titles per user
  • Homepage load < 500ms
  • 200M+ subscribers
  • Update recommendations as user watches

Metrics

Offline Metrics

Metric | Description
AUC-ROC | Binary classification quality
Recall@20 | Titles user watches in top 20
NDCG | Ranking quality
Calibration | Predicted vs actual watch rates
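
The two ranking metrics can be computed per user from a ranked list and the set of titles the user went on to watch. The following is a minimal sketch, assuming binary relevance (watched or not); the function names and the example title IDs are illustrative, not part of any specific evaluation library.

```python
import math

def recall_at_k(ranked_titles, watched, k=20):
    """Fraction of the watched titles that appear in the top-k of the ranking."""
    watched = set(watched)
    if not watched:
        return 0.0
    return len(set(ranked_titles[:k]) & watched) / len(watched)

def ndcg_at_k(ranked_titles, watched, k=20):
    """NDCG with binary relevance: gain 1 for a watched title, 0 otherwise."""
    watched = set(watched)
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, title in enumerate(ranked_titles[:k])
        if title in watched
    )
    ideal_hits = min(len(watched), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical example: four ranked titles, three titles actually watched.
ranked = ["t1", "t2", "t3", "t4"]
watched = ["t1", "t4", "t9"]
recall = recall_at_k(ranked, watched, k=3)
ndcg = ndcg_at_k(ranked, watched, k=3)
```

In practice these per-user values would be averaged over a held-out set of users or sessions.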

Online Metrics

Metric | Description
Take Rate | Views / Impressions
Browse Time | Time before selecting content
Session Starts | Sessions that lead to watching
Engagement | Hours watched per session

Architecture


Feature Engineering

User Features

Feature | Type | Description
watch_history | Sequence | Last N titles watched
genre_preferences | Vector | Affinity for each genre
viewing_patterns | Embedding | When/how user watches
completion_rate | Numerical | Avg % of content finished
account_age | Numerical | Days since signup
profile_type | Categorical | Kids, adult, shared
language_preference | Categorical | Preferred audio/subtitle

Title Features

Feature | Type | Description
genres | Multi-hot | Associated genres
maturity_rating | Categorical | Age rating
release_date | Numerical | Days since release
runtime | Numerical | Length in minutes
content_type | Categorical | Movie, series, documentary
popularity_score | Numerical | Global/regional popularity
avg_completion | Numerical | Completion rate across users
visual_embedding | Vector | Thumbnail/trailer embedding
text_embedding | Vector | Synopsis embedding
cast_embedding | Vector | Actor/director embeddings

Interaction Features

Feature | Type | Description
similarity_to_watched | Numerical | Content similarity
genre_match | Numerical | Genre overlap with preferences
actor_affinity | Numerical | Has watched this actor before
sequel/series | Binary | Continuation of watched content
time_since_similar | Numerical | Recency of watching similar
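
Two of these interaction features can be derived directly from the user and title features above. The sketch below assumes that similarity_to_watched is the maximum cosine similarity between the candidate's embedding and the embeddings of watched titles, and that genre_match is the mean user affinity over the candidate's genres. Both definitions are assumptions chosen for illustration; the source does not specify them.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_to_watched(candidate_emb, watched_embs):
    """Assumed definition: best match between the candidate and any watched title."""
    return max((cosine_similarity(candidate_emb, w) for w in watched_embs), default=0.0)

def genre_match(title_genres, genre_preferences):
    """Assumed definition: mean user affinity over the candidate's genres."""
    if not title_genres:
        return 0.0
    return sum(genre_preferences.get(g, 0.0) for g in title_genres) / len(title_genres)

# Hypothetical 2-dimensional embeddings and genre affinities.
sim = similarity_to_watched([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
match = genre_match(["drama", "comedy"], {"drama": 0.8})
```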

Context Features

Feature | Type | Description
time_of_day | Categorical | Morning, afternoon, evening
day_of_week | Categorical | Weekday vs weekend
device | Categorical | TV, mobile, tablet
profile_active | Binary | Main profile or secondary

Model Architecture

Two-Stage Approach

Stage 1: Candidate Generation

Multiple candidate generators, each producing ~100-500 titles:

Source | Description | Typical Count
New releases | Recently released titles in user's preferred genres | ~100
Similar to watched | Titles similar to user's last 10 watches | ~500 (50 per title)
Collaborative filtering | Recommendations from similar users | ~200
Trending | Popular titles in user's region | ~100
Continue watching | In-progress content for the user | Variable

Combine all sources into a single candidate pool for ranking.
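
Combining the sources mostly amounts to a de-duplicating union, since the same title is often proposed by several generators. A minimal sketch follows, assuming each generator returns a list of title IDs; the source names and IDs are placeholders. Keeping the list of sources per title is optional, but it can be fed to the ranker as a feature or used for debugging.

```python
def merge_candidates(sources):
    """Union the candidate lists, remembering which sources proposed each title.

    sources: dict mapping a source name to a list of title IDs.
    Returns a dict mapping title ID to the list of sources that proposed it,
    in first-seen order.
    """
    pool = {}
    for source_name, title_ids in sources.items():
        for title_id in title_ids:
            pool.setdefault(title_id, []).append(source_name)
    return pool

# Hypothetical generator outputs.
pool = merge_candidates({
    "trending": ["a", "b"],
    "collaborative_filtering": ["b", "c"],
})
```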

Stage 2: Ranking Model

Deep neural network to score all candidates:

Component | Architecture | Output
User encoder | Transformer for watch history (128-dim) + MLP for profile | 256-dim user embedding
Title encoder | BERT for text + ResNet for visuals + MLP for metadata | 256-dim title embedding
Interaction layer | Concatenate user + title + context, then Dense(256) -> ReLU -> Dropout -> Dense(128) -> ReLU -> Dense(1) -> Sigmoid | Watch probability (0-1)

The model encodes user and title separately, concatenates with context features, and predicts the probability of watching.
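
To make the interaction layer concrete, here is a dependency-free sketch of its forward pass at inference time, where dropout is a no-op. The weights are random placeholders, the 4-dimensional context vector is an assumed size, and the encoders are replaced by random 256-dim vectors. A real system would use a deep learning framework and trained parameters.

```python
import math
import random

def dense(x, weights, bias):
    """y = W x + b, with weights stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_layer(n_in, n_out, rng):
    """Random placeholder weights; stands in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def watch_probability(user_emb, title_emb, context, layers):
    """Concatenate -> Dense(256) -> ReLU -> Dense(128) -> ReLU -> Dense(1) -> Sigmoid."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    x = user_emb + title_emb + context  # list concatenation
    h = relu(dense(x, w1, b1))
    h = relu(dense(h, w2, b2))
    return sigmoid(dense(h, w3, b3)[0])

rng = random.Random(0)
USER_DIM, TITLE_DIM, CONTEXT_DIM = 256, 256, 4  # CONTEXT_DIM is an assumption
layers = [
    init_layer(USER_DIM + TITLE_DIM + CONTEXT_DIM, 256, rng),
    init_layer(256, 128, rng),
    init_layer(128, 1, rng),
]
user_emb = [rng.gauss(0, 1) for _ in range(USER_DIM)]
title_emb = [rng.gauss(0, 1) for _ in range(TITLE_DIM)]
context = [1.0, 0.0, 0.0, 1.0]
p = watch_probability(user_emb, title_emb, context, layers)
```

Because the user embedding does not depend on the candidate, it can be computed once per request and reused across all candidates being scored.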

Training

Training Data

Label definition:

  • Positive (label=1): User watched the title
  • Negative (label=0): User was shown the title but did not watch

Sample construction:

  1. For each user session, if the user watched a title, create a positive sample
  2. For all impressions (titles shown) that were not watched, create negative samples
  3. Include user ID, title ID, label, and session context for each sample
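
The sample construction steps above can be sketched as follows. The session record layout (the keys user_id, context, impressions, and watched) is a hypothetical schema chosen for illustration.

```python
def build_samples(sessions):
    """Turn session logs into labeled (user, title) training samples.

    Each session is assumed to be a dict with keys:
    user_id, context, impressions (titles shown), watched (titles played).
    """
    samples = []
    for session in sessions:
        watched = set(session["watched"])
        base = {"user_id": session["user_id"], "context": session["context"]}
        # Step 1: every watched title is a positive sample.
        for title_id in session["watched"]:
            samples.append({**base, "title_id": title_id, "label": 1})
        # Step 2: every impression that was not watched is a negative sample.
        for title_id in session["impressions"]:
            if title_id not in watched:
                samples.append({**base, "title_id": title_id, "label": 0})
    return samples

# Hypothetical session: three titles shown, one watched.
sessions = [{
    "user_id": "u1",
    "context": {"device": "TV", "time_of_day": "evening"},
    "impressions": ["a", "b", "c"],
    "watched": ["b"],
}]
samples = build_samples(sessions)
```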

Loss Function

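Binary cross-entropy with class weighting. Apply a positive weight (e.g., 5.0) to up-weight positive samples, since watches are much rarer than non-watches. This keeps the loss from being dominated by the negative class. Up-weighting positives also inflates the predicted probabilities, so recalibrate the outputs when calibrated watch probabilities are required.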
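
As a concrete reference, the weighted loss for a batch can be written out in a few lines. This is a plain-Python sketch for clarity, using the example weight of 5.0; the function name and the clipping constant eps are illustrative choices, and a real training loop would use the framework's built-in loss.

```python
import math

def weighted_bce(labels, probs, pos_weight=5.0, eps=1e-7):
    """Mean binary cross-entropy with positives scaled by pos_weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# A missed positive costs five times as much as an equally wrong negative.
loss_pos = weighted_bce([1], [0.5])
loss_neg = weighted_bce([0], [0.5])
```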

Serving

Real-time Personalization

Homepage generation process:

  1. Fetch user state: Retrieve user features and recent activity
  2. Build rows by type:
    • Continue Watching: In-progress content for the user
    • Because You Watched X: For each of the last 3 watched titles, get and rank similar titles (top 20 per row)
    • Trending Now: Trending titles ranked for this user (top 20)
    • Top Picks for You: Personalized recommendations (top 20)
  3. Return assembled homepage: List of row titles with their ranked content
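
The homepage generation process above can be sketched as a single assembly function. All inputs here are hypothetical: score stands in for the ranking model, similar_titles for a similarity lookup, and recent_watches is assumed to be ordered most recent first.

```python
def top_k(title_ids, score, k=20):
    """Rank title IDs by descending score and keep the first k."""
    return sorted(title_ids, key=score, reverse=True)[:k]

def build_homepage(in_progress, recent_watches, similar_titles,
                   trending, candidates, score, row_size=20):
    """Return a list of (row title, ranked title IDs) pairs."""
    rows = []
    if in_progress:
        rows.append(("Continue Watching", list(in_progress)))
    for watched in recent_watches[:3]:  # last 3 watched titles
        similar = similar_titles.get(watched, [])
        rows.append((f"Because You Watched {watched}", top_k(similar, score, row_size)))
    rows.append(("Trending Now", top_k(trending, score, row_size)))
    rows.append(("Top Picks for You", top_k(candidates, score, row_size)))
    return rows

# Hypothetical scores and inputs; row_size=2 keeps the example small.
scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
rows = build_homepage(
    in_progress=["x"],
    recent_watches=["w1"],
    similar_titles={"w1": ["b", "a"]},
    trending=["c", "d"],
    candidates=["a", "b", "c", "d"],
    score=scores.get,
    row_size=2,
)
```

This sketch lets the same title appear in more than one row; de-duplicating across rows is a product decision the source does not address.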

Caching Strategy

Data | Refresh / Cache TTL
User embeddings | Recomputed hourly
Title embeddings | Recomputed daily
Candidate pools | Cached for 15 minutes per user
Final rankings | Not cached; computed per request
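
The per-user candidate pool is the one entry with a short time-to-live, which can be illustrated with a small in-process cache. This is a sketch only: the class name is hypothetical, and a production system would more likely use a shared cache service than a Python dict.

```python
import time

class TTLCache:
    """Minimal time-based cache: recompute a value once it is older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store = {}    # key -> (stored_at, value)

    def get(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = compute()
        self._store[key] = (now, value)
        return value

# Candidate pools: 15 minutes per user.
candidate_cache = TTLCache(ttl_seconds=15 * 60)
pool_for_u1 = candidate_cache.get("u1", lambda: ["t1", "t2"])
```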

Monitoring

Key Metrics

Dashboard metrics:

  • Take rate by row type
  • Browse time percentiles
  • Prediction calibration
  • Feature drift
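
For the prediction calibration dashboard, one common approach is to bucket predictions and compare the mean predicted probability with the observed watch rate in each bucket. The sketch below assumes equal-width buckets and summarizes the gap as an expected calibration error; the function names and the bucket count are illustrative.

```python
def calibration_table(probs, labels, n_bins=10):
    """Return (count, mean predicted probability, observed watch rate) per non-empty bin."""
    counts = [0] * n_bins
    pred_sums = [0.0] * n_bins
    label_sums = [0] * n_bins
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        counts[idx] += 1
        pred_sums[idx] += p
        label_sums[idx] += y
    return [
        (counts[i], pred_sums[i] / counts[i], label_sums[i] / counts[i])
        for i in range(n_bins)
        if counts[i]
    ]

def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted average gap between predicted and observed rates."""
    total = len(probs)
    return sum(
        count / total * abs(mean_pred - rate)
        for count, mean_pred, rate in calibration_table(probs, labels, n_bins)
    )

# Hypothetical predictions and outcomes.
table = calibration_table([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
ece = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
```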

Alerts:

Alert | Condition | Severity
Take rate drop | Take rate falls below 90% of baseline | High
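
The take rate alert condition reduces to a single comparison. In this sketch the baseline is assumed to be a trailing average supplied by the caller; how the baseline is computed is not specified in the source, and the function names are illustrative.

```python
def take_rate(views, impressions):
    """Take rate = views / impressions; 0.0 when nothing was shown."""
    return views / impressions if impressions else 0.0

def take_rate_alert(current, baseline, threshold=0.9):
    """Fire when the current take rate falls below threshold * baseline."""
    return current < threshold * baseline

# Hypothetical numbers: baseline take rate of 10%, current window at 8%.
current = take_rate(views=800, impressions=10_000)
fires = take_rate_alert(current, baseline=0.10)
```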

Reference

Topic | Description
Cold-start for new users | Start with popular content, use onboarding survey, learn from first interactions.
Promoting new content | Reserve exploration slots for new content. Track performance and adjust.
Shared accounts | Use profiles. Detect different viewing patterns within profiles.
Explicit ratings vs implicit signals | Implicit signals (actual watches) are stronger predictors than stated preferences.
Exploitation vs exploration trade-off | Balance safe recommendations with discovery of new content.
Personalization vs popularity | Combine individual preferences with social proof signals.
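
One simple way to reserve exploration slots is to replace the tail of a ranked row with randomly chosen new titles. The sketch below is one possible policy under that assumption, not a prescribed one; the function name, the slot count, and the uniform random choice are all illustrative.

```python
import random

def with_exploration_slots(ranked, new_titles, n_slots=2, rng=None):
    """Replace the last n_slots positions of a ranked row with new titles.

    New titles already present in the row are skipped, and the row length is preserved.
    """
    rng = rng or random.Random()
    fresh = [t for t in new_titles if t not in ranked]
    picks = rng.sample(fresh, min(n_slots, len(fresh), len(ranked)))
    keep = ranked[:max(0, len(ranked) - len(picks))]
    return keep + picks

# Hypothetical row of 20 ranked titles and three newly released titles.
ranked_row = [f"r{i}" for i in range(20)]
new_releases = ["n1", "n2", "n3"]
row = with_exploration_slots(ranked_row, new_releases, n_slots=2, rng=random.Random(0))
```

Logging which positions were exploration slots makes it possible to track how new content performs and adjust the slot count over time.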