Design a Recommendation System

Concepts tested: Collaborative filtering, content-based filtering, two-tower architecture, cold start problem, candidate generation, ranking models, offline/online evaluation

Problem Statement

Design a recommendation system for a content or e-commerce platform.

Clarification Questions

| Question | Design Impact |
|---|---|
| What items are recommended? | Feature types, update frequency |
| Available signals | Explicit (ratings) vs implicit (views, clicks) |
| Scale (users, items, traffic) | Storage, indexing, and serving requirements |
| Latency requirement | Real-time vs batch computation |
| Business objective | Engagement, revenue, satisfaction |

Core Problem

Recommendation systems solve one of two formulations:

| Problem | Description |
|---|---|
| Rating prediction | Predict how much a user will like an item (score) |
| Top-K recommendation | Select the best K items to show the user (ranking) |

These formulations differ: rating prediction requires accurate absolute scores, while Top-K only requires the correct ranking order.

User-Item Matrix

The recommendation problem can be represented as a sparse matrix where rows are users, columns are items, and cells contain ratings or interaction signals. Most cells are empty (no interaction). The goal is to predict these missing values to recommend items users would likely enjoy.
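
As a concrete (toy) illustration, a sparse user-item matrix can be built with SciPy; the interaction triples below are made up:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical (user, item, rating) interaction triples; real values
# would come from interaction logs.
users   = np.array([0, 0, 1, 2, 2])
items   = np.array([1, 3, 0, 1, 2])
ratings = np.array([5.0, 3.0, 4.0, 2.0, 5.0])

# Sparse user-item matrix: rows = users, columns = items.
# Cells with no interaction are simply not stored.
R = csr_matrix((ratings, (users, items)), shape=(3, 4))

print(R.toarray())
print("density:", R.nnz / (R.shape[0] * R.shape[1]))  # fraction of observed cells
```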

Recommendation Approaches

1. Collaborative Filtering

Principle: Users who agreed in the past will agree in the future.

| Type | Method | Advantages | Limitations |
|---|---|---|---|
| User-based | Find similar users, recommend their items | Intuitive | Poor scalability |
| Item-based | Find items similar to the user's history | Better scalability | Sparse matrix issues |
| Matrix Factorization | Learn latent factors for users and items | Handles sparsity, scalable | Cold start problem |

Matrix Factorization:

The predicted rating for user u and item i is approximated by the dot product of the user's latent vector and the item's latent vector. This decomposes the large user-item matrix into two smaller matrices: a User Matrix (M users x K latent factors) and an Item Matrix (K factors x N items).
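
A minimal NumPy sketch of this idea, fitting two small factor matrices with SGD on made-up ratings (the dimensions and hyperparameters are arbitrary):

```python
import numpy as np

# Observed (user, item, rating) triples; values are made up.
observations = [(0, 1, 5.0), (0, 3, 3.0), (1, 0, 4.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 4, 8          # k = number of latent factors

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))   # user latent vectors (M x K)
Q = rng.normal(scale=0.1, size=(n_items, k))   # item latent vectors (N x K)

lr, reg = 0.05, 0.01
for _ in range(200):                           # SGD over observed cells only
    for u, i, r in observations:
        err = r - P[u] @ Q[i]                  # predicted rating = dot product
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

# Predict a missing cell, e.g. user 1's rating for item 2.
print(P[1] @ Q[2])
```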

2. Content-Based Filtering

Principle: Recommend items similar to what the user has liked, based on item attributes.

A user profile is constructed from the features of items the user has liked (e.g., average genre distribution, preferred attributes). Recommendations are items whose features are most similar to this user profile.
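
A minimal sketch of profile construction and scoring, assuming hand-made item feature vectors (e.g., one-hot genre flags plus a normalized attribute):

```python
import numpy as np

# Hypothetical item feature vectors (columns: genre flags + one numeric attribute).
item_features = np.array([
    [1, 0, 1, 0.2],   # item 0
    [1, 1, 0, 0.9],   # item 1
    [0, 1, 0, 0.5],   # item 2
    [0, 0, 1, 0.1],   # item 3
], dtype=float)

liked_items = [0, 3]                                   # items this user liked

# User profile = average of the liked items' feature vectors.
profile = item_features[liked_items].mean(axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Score every item by similarity to the profile, excluding already-seen items.
scores = np.array([cosine(profile, f) for f in item_features])
scores[liked_items] = -np.inf
print(np.argsort(-scores))                             # items ranked by similarity
```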

Domain-Specific Features:

| Domain | Item Features |
|---|---|
| Movies | Genre, director, actors, plot keywords, runtime |
| Music | Genre, tempo, artist, mood, instrumentation |
| E-commerce | Category, brand, price range, attributes |
| News | Topic, entities, publication, recency |

Similarity Metrics:

| Method | Use Case |
|---|---|
| Cosine Similarity | Dense embeddings, normalized vectors |
| TF-IDF + Cosine | Text-based items |
| Euclidean Distance | Non-normalized numeric features |
| Jaccard Similarity | Binary features, tags |
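
For text-heavy items, the TF-IDF + cosine row can be made concrete with scikit-learn; the documents below are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up item descriptions for a text-based domain such as news.
docs = [
    "central bank raises interest rates",
    "new smartphone camera review",
    "interest rates and mortgage costs explained",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)        # each item becomes a sparse TF-IDF vector

sim = cosine_similarity(X)           # pairwise item-item similarity matrix
print(sim.round(2))                  # items 0 and 2 should be the most similar pair
```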

3. Hybrid Approaches

Combine multiple signals for improved performance.

The hybrid approach feeds signals from collaborative filtering, content-based filtering, and contextual signals into an ensemble or ranking model that learns optimal weights for each signal source. The model then produces final recommendations by combining these weighted inputs.

Ensemble Methods:

| Method | Description |
|---|---|
| Weighted average | score = w1 * CF_score + w2 * CB_score |
| Stacking | Train a model to combine scores |
| Switching | Use CF when possible, fall back to CB |
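
A tiny sketch of the weighted-average and switching rows; the weights are placeholders that would normally be tuned offline or learned by a ranking model:

```python
def hybrid_score(cf_score, cb_score, w_cf=0.7, w_cb=0.3):
    # Switching fallback: if there is no collaborative signal
    # (e.g. a cold-start user), rely on the content-based score alone.
    if cf_score is None:
        return cb_score
    # Weighted average of the two signals.
    return w_cf * cf_score + w_cb * cb_score

print(hybrid_score(0.8, 0.4))    # 0.68
print(hybrid_score(None, 0.4))   # 0.4
```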

Deep Learning Approaches

Neural Collaborative Filtering

Neural collaborative filtering passes user ID and item ID through separate embedding layers, concatenates the resulting vectors, then feeds them through hidden layers to produce a prediction. This learns non-linear interactions between user and item representations.
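
A minimal PyTorch sketch of this architecture (embedding sizes and layer widths are illustrative, not prescriptive):

```python
import torch
import torch.nn as nn

class NeuralCF(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 16), nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, user_ids, item_ids):
        # Look up embeddings, concatenate, and pass through hidden layers.
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)       # predicted interaction score

model = NeuralCF(n_users=1000, n_items=5000)
scores = model(torch.tensor([1, 2]), torch.tensor([10, 20]))
print(scores.shape)                          # torch.Size([2])
```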

Two-Tower Architecture

Separate encoders for users and items enable efficient retrieval.

User Tower: Takes user features (demographics, history) through an encoder network to produce a user embedding (e.g., 128 dimensions).

Item Tower: Takes item features (attributes, content) through a separate encoder to produce an item embedding of the same dimensionality.

Similarity: The dot product between user and item embeddings measures compatibility.

Advantage: Item embeddings can be pre-computed and indexed for fast approximate nearest neighbor search. Only the user embedding needs to be computed at request time.
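
A minimal PyTorch sketch of the two towers; the feature dimensions and layer sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, emb_dim))

    def forward(self, x):
        # L2-normalize so the dot product behaves like cosine similarity.
        return F.normalize(self.net(x), dim=-1)

user_tower = Tower(in_dim=40)    # user features: demographics, history stats, ...
item_tower = Tower(in_dim=60)    # item features: attributes, content embeddings, ...

# Offline: pre-compute embeddings for the whole catalog and index them for ANN search.
item_embs = item_tower(torch.randn(10_000, 60)).detach()

# Online: encode only the requesting user, then score items by dot product.
user_emb = user_tower(torch.randn(1, 40))
scores = user_emb @ item_embs.T            # compatibility scores, shape (1, 10000)
print(torch.topk(scores, k=5).indices)
```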

Feature Engineering

User Features

| Category | Examples |
|---|---|
| Demographics | Age, gender, location, language |
| Behavioral | Watch history, purchase history, browse patterns |
| Aggregated | Average rating given, genre distribution |
| Temporal | Recency of activity, time-of-day patterns |
| Session | Current session items, search queries |

Item Features

| Category | Examples |
|---|---|
| Metadata | Title, description, category, tags |
| Content | Text embeddings, image embeddings |
| Popularity | View count, purchase count, trending score |
| Quality | Average rating, review sentiment |
| Temporal | Release date, freshness |

Interaction Features

| Feature | Description |
|---|---|
| User-item affinity | Past interactions between the user and similar items |
| Cross features | User_genre_preference * Item_genre |
| Position bias | Position at which the item was shown when the user interacted |
| Time since interaction | Recency signal |

Cold Start Problem

No data available for new users or new items.

New User Solutions

| Strategy | Method |
|---|---|
| Popular items | Show globally popular items |
| Demographic-based | Use similar users based on demographics |
| Onboarding quiz | Ask preferences during signup |
| Content-based | Recommend based on first few interactions |
| Explore-exploit | Balance exploration of preferences (see sketch below) |
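
The explore-exploit strategy can be sketched as a simple epsilon-greedy rule; the epsilon value and candidate lists below are placeholders:

```python
import random

def pick_recommendation(personalized, popular, epsilon=0.2):
    # Explore: occasionally (or when nothing is known yet) show a broadly
    # popular item to learn the new user's preferences.
    if not personalized or random.random() < epsilon:
        return random.choice(popular)
    # Exploit: otherwise show the current best personalized guess.
    return personalized[0]

print(pick_recommendation(personalized=[], popular=["item_a", "item_b"]))
```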

New Item Solutions

| Strategy | Method |
|---|---|
| Content-based | Use item attributes to find similar items |
| Boost exploration | Inject new items into recommendations |
| Metadata matching | Match to user preferences based on attributes |
| Creator signal | Use the creator's past item performance |

Evaluation Metrics

Offline Metrics

| Metric | Measurement | Use Case |
|---|---|---|
| RMSE/MAE | Rating prediction error | Explicit feedback |
| Precision@K | Relevant items in top K | Ranking quality |
| Recall@K | Coverage of relevant items | Discovery |
| NDCG@K | Position-weighted ranking quality | When order matters |
| MAP | Average precision across users | Overall ranking |
| Hit Rate@K | Users with at least one relevant item in top K | Basic check |

Example: Precision@K

If a user liked items A, B, C, D, E and the system recommended top 5 items A, X, B, Y, Z, then 2 of the recommended items (A and B) were relevant. Precision@5 = 2/5 = 0.4 (40%).
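
The same calculation as a small helper, along with Recall@K for comparison:

```python
def precision_at_k(recommended, relevant, k):
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

def recall_at_k(recommended, relevant, k):
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / len(relevant)

relevant    = ["A", "B", "C", "D", "E"]
recommended = ["A", "X", "B", "Y", "Z"]
print(precision_at_k(recommended, relevant, 5))   # 0.4
print(recall_at_k(recommended, relevant, 5))      # 0.4 (2 of the 5 relevant items found)
```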

Online Metrics (A/B Testing)

| Metric | Measurement |
|---|---|
| CTR | Engagement with recommendations |
| Conversion rate | Purchases from recommendations |
| Watch/Listen time | Engagement depth |
| Diversity | Variety in recommendations |
| Novelty | Non-obvious recommendations |
| User satisfaction | Surveys, NPS |

Metric Trade-offs

| Trade-off | Description |
|---|---|
| Accuracy vs Diversity | Confident recommendations vs variety |
| Exploitation vs Exploration | Known preferences vs discovering new ones |
| Short-term vs Long-term | Immediate clicks vs user retention |

System Architecture

Two-Stage Architecture

Stage 1: Candidate Generation (Retrieval)

From millions of items, retrieve hundreds of candidates using fast, approximate methods. Multiple retrieval sources include collaborative filtering (ANN search on embeddings), content-based retrieval, and popularity/trending signals. Each source contributes around 500 candidates.
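
A sketch of embedding-based retrieval, assuming the faiss library and randomly generated placeholder embeddings (at real scale an approximate index such as HNSW or IVF would replace the flat index):

```python
import numpy as np
import faiss

dim = 128
item_embeddings = np.random.rand(100_000, dim).astype("float32")  # placeholder item embeddings

# Build an inner-product index over pre-computed item embeddings.
index = faiss.IndexFlatIP(dim)      # exact search; swap in HNSW/IVF variants at scale
index.add(item_embeddings)

# At request time: embed the user and pull a few hundred candidates from this source.
user_embedding = np.random.rand(1, dim).astype("float32")
scores, candidate_ids = index.search(user_embedding, 500)
print(candidate_ids.shape)          # (1, 500)
```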

Stage 2: Ranking

From hundreds of candidates, produce a final ordered list using a more sophisticated ranking model. The model uses user features, item features, and cross-features to predict engagement metrics (click probability, conversion probability, expected watch time). Output is typically the top 20-50 items.
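
A sketch of the ranking stage using a gradient-boosted click model from scikit-learn; the feature matrix and labels are random placeholders standing in for logged impressions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_train = rng.random((5_000, 20))      # user + item + cross features per impression
y_train = rng.integers(0, 2, 5_000)    # label: clicked or not

ranker = GradientBoostingClassifier()
ranker.fit(X_train, y_train)

# Score the few hundred retrieved candidates and keep the top of the list.
X_candidates = rng.random((500, 20))
p_click = ranker.predict_proba(X_candidates)[:, 1]
final_ranking = np.argsort(-p_click)[:50]
print(final_ranking[:10])
```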

Stage 3: Re-ranking / Business Rules

Apply business logic: ensure diversity to avoid repetitive content, boost new releases for freshness, apply promotional rules, remove duplicates, and adjust for context.
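
One way to sketch such rules is a pass over the ranked list that drops duplicates and caps items per category; the fields and limit below are illustrative:

```python
def rerank(ranked_items, max_per_category=2):
    seen_ids, per_category, final = set(), {}, []
    for item in ranked_items:
        if item["id"] in seen_ids:
            continue                                    # remove duplicates
        if per_category.get(item["category"], 0) >= max_per_category:
            continue                                    # enforce diversity
        seen_ids.add(item["id"])
        per_category[item["category"]] = per_category.get(item["category"], 0) + 1
        final.append(item)
    return final

candidates = [
    {"id": 1, "category": "action"}, {"id": 2, "category": "action"},
    {"id": 3, "category": "action"}, {"id": 4, "category": "drama"},
]
print([i["id"] for i in rerank(candidates)])   # [1, 2, 4]
```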

Common Issues

| Issue | Problem | Solution |
|---|---|---|
| Popularity bias | Over-recommends popular items | Diversification, explore-exploit |
| Filter bubbles | Users stuck in narrow preferences | Serendipity injection |
| Position bias | Top items get more clicks regardless of relevance | Position-aware training |
| Feedback loops | Recommendations create their own training data | Exploration, randomization |
| Stale models | Preferences change over time | Regular retraining, online learning |

Summary

| Decision | Options | Recommendation |
|---|---|---|
| Algorithm | CF, Content, Hybrid | Hybrid (combine signals) |
| Retrieval | Full scan, ANN | ANN for scale (HNSW, Faiss) |
| Ranking | Heuristics, ML | ML model (gradient boosting or neural) |
| Cold start | Popular, Content-based | Multi-strategy (context-dependent) |
| Evaluation | Offline only, A/B | Both (offline for development, A/B for decisions) |
| Architecture | Single model, Two-stage | Two-stage (retrieve then rank) |