Design a Recommendation System
Concepts tested: Collaborative filtering, content-based filtering, two-tower architecture, cold start problem, candidate generation, ranking models, offline/online evaluation
Problem Statement
Design a recommendation system for a content or e-commerce platform.
Clarification Questions
| Question | Design Impact |
|---|---|
| What items are being recommended? | Feature types, update frequency |
| Available signals | Explicit (ratings) vs implicit (views, clicks) |
| Scale | Scalability requirements |
| Latency requirement | Real-time vs batch computation |
| Business objective | Engagement, revenue, satisfaction |
Core Problem
Recommendation systems are typically framed as one of two problem formulations:
| Problem | Description |
|---|---|
| Rating prediction | Predict how much user will like item (score) |
| Top-K recommendation | Select best K items to show user (ranking) |
These formulations differ in their requirements: rating prediction needs accurate, calibrated scores, while Top-K only needs the relative ordering of items to be correct.
User-Item Matrix
The recommendation problem can be represented as a sparse matrix where rows are users, columns are items, and cells contain ratings or interaction signals. Most cells are empty (no interaction). The goal is to predict these missing values to recommend items users would likely enjoy.
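As a minimal sketch, assuming SciPy is available: the matrix can be held in compressed sparse row form so that only observed interactions occupy memory. The interaction triples and catalog sizes below are toy values.

```python
from scipy.sparse import csr_matrix

# Hypothetical interaction log: (user_id, item_id, rating) triples.
interactions = [(0, 2, 5.0), (0, 4, 3.0), (1, 2, 4.0), (2, 0, 1.0)]
users, items, ratings = zip(*interactions)

n_users, n_items = 3, 5  # assumed catalog sizes for this toy example

# Only observed cells are stored; every other cell is "missing", not zero-rated.
R = csr_matrix((ratings, (users, items)), shape=(n_users, n_items))

print(R.toarray())                                    # dense view for inspection
print(f"density: {R.nnz / (n_users * n_items):.0%}")  # fraction of observed cells
```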
Recommendation Approaches
1. Collaborative Filtering
Principle: Users who agreed in the past will agree in the future.
| Type | Method | Advantages | Limitations |
|---|---|---|---|
| User-based | Find similar users, recommend their items | Intuitive | Poor scalability |
| Item-based | Find similar items to user's history | Better scalability | Sparse matrix issues |
| Matrix Factorization | Learn latent factors for users and items | Handles sparsity, scalable | Cold start problem |
Matrix Factorization:
The predicted rating for user u and item i is approximated by the dot product of the user's latent vector and the item's latent vector. This decomposes the large user-item matrix into two much smaller matrices: a User Matrix (M users × K latent factors) and an Item Matrix (K latent factors × N items), where K is far smaller than M or N.
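A minimal SGD sketch of this factorization, assuming toy synthetic ratings and illustrative hyperparameters (K = 8 latent factors, learning rate, L2 regularization):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 50, 8              # K latent factors (assumed)
P = rng.normal(scale=0.1, size=(n_users, k))  # user matrix, M x K
Q = rng.normal(scale=0.1, size=(n_items, k))  # item matrix, N x K (stored row-wise)

# Toy observed (user, item, rating) triples standing in for real feedback.
observed = list(zip(rng.integers(0, n_users, 500),
                    rng.integers(0, n_items, 500),
                    rng.integers(1, 6, 500).astype(float)))

lr, reg = 0.01, 0.05
for _ in range(20):                            # SGD epochs
    for u, i, r in observed:
        p_u = P[u].copy()
        err = r - p_u @ Q[i]                   # predicted rating is the dot product
        P[u] += lr * (err * Q[i] - reg * p_u)  # gradient steps with L2 penalty
        Q[i] += lr * (err * p_u - reg * Q[i])

print(P[0] @ Q[3])  # predicted rating for user 0 on item 3
```

Cells that were never observed are simply skipped during training, which is how factorization sidesteps the sparsity of the user-item matrix.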
2. Content-Based Filtering
Principle: Recommend items similar to what the user has liked, based on item attributes.
A user profile is constructed from the features of items the user has liked (e.g., average genre distribution, preferred attributes). Recommendations are items whose features are most similar to this user profile.
Domain-Specific Features:
| Domain | Item Features |
|---|---|
| Movies | Genre, director, actors, plot keywords, runtime |
| Music | Genre, tempo, artist, mood, instrumentation |
| E-commerce | Category, brand, price range, attributes |
| News | Topic, entities, publication, recency |
Similarity Metrics:
| Method | Use Case |
|---|---|
| Cosine Similarity | Dense embeddings, normalized vectors |
| TF-IDF + Cosine | Text-based items |
| Euclidean Distance | Non-normalized numeric features |
| Jaccard Similarity | Binary features, tags |
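A small sketch of the profile-plus-cosine approach described above; the item names and feature vectors are hypothetical stand-ins for real attributes such as genre one-hots:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity with a small epsilon to avoid division by zero.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Hypothetical item feature vectors (e.g., genre one-hots plus a numeric attribute).
item_features = {
    "item_a": np.array([1.0, 0.0, 1.0, 0.2]),
    "item_b": np.array([1.0, 0.0, 0.0, 0.4]),
    "item_c": np.array([0.0, 1.0, 1.0, 0.9]),
}
liked = ["item_a", "item_b"]

# User profile: average of the features of items the user liked.
profile = np.mean([item_features[i] for i in liked], axis=0)

# Recommend unseen items most similar to the profile.
candidates = [i for i in item_features if i not in liked]
ranked = sorted(candidates, key=lambda i: cosine(profile, item_features[i]),
                reverse=True)
print(ranked)
```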
3. Hybrid Approaches
Combine multiple signals for improved performance.
The hybrid approach feeds signals from collaborative filtering, content-based filtering, and contextual signals into an ensemble or ranking model that learns optimal weights for each signal source. The model then produces final recommendations by combining these weighted inputs.
Ensemble Methods:
| Method | Description |
|---|---|
| Weighted average | score = w1 * CF_score + w2 * CB_score |
| Stacking | Train model to combine scores |
| Switching | Use CF when possible, fallback to CB |
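A sketch combining the weighted-average and switching rows above; the weights and history threshold are assumptions that would be tuned offline:

```python
def hybrid_score(cf_score, cb_score, w_cf=0.7, w_cb=0.3):
    # Weighted average of the two signal sources (weights are assumed).
    return w_cf * cf_score + w_cb * cb_score

def score(user_history_len, cf_score, cb_score, min_history=5):
    # Switching: trust collaborative filtering only with enough history,
    # otherwise fall back to the content-based score.
    if user_history_len >= min_history:
        return hybrid_score(cf_score, cb_score)
    return cb_score
```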
Deep Learning Approaches
Neural Collaborative Filtering
Neural collaborative filtering passes user ID and item ID through separate embedding layers, concatenates the resulting vectors, then feeds them through hidden layers to produce a prediction. This learns non-linear interactions between user and item representations.
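A minimal PyTorch sketch of this embed-concatenate-MLP pattern; embedding and layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class NCF(nn.Module):
    """Minimal neural collaborative filtering sketch (sizes are assumptions)."""
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, user_ids, item_ids):
        # Look up both embeddings, concatenate, and score with an MLP.
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)  # raw logit; apply sigmoid for a probability

model = NCF(n_users=1000, n_items=5000)
logits = model(torch.tensor([1, 2]), torch.tensor([10, 20]))
```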
Two-Tower Architecture
Separate encoders for users and items enable efficient retrieval.
User Tower: Takes user features (demographics, history) through an encoder network to produce a user embedding (e.g., 128 dimensions).
Item Tower: Takes item features (attributes, content) through a separate encoder to produce an item embedding of the same dimensionality.
Similarity: The dot product between user and item embeddings measures compatibility.
Advantage: Item embeddings can be pre-computed and indexed for fast approximate nearest neighbor search. Only the user embedding needs to be computed at request time.
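A compact PyTorch sketch of the two towers; feature dimensions, layer widths, and the L2 normalization are assumptions for illustration:

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """One encoder tower; input dimension is an assumption for this sketch."""
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, x):
        # L2-normalize so the dot product behaves like cosine similarity.
        return nn.functional.normalize(self.net(x), dim=-1)

user_tower = Tower(in_dim=64)   # user features: demographics, history aggregates
item_tower = Tower(in_dim=40)   # item features: attributes, content embeddings

item_embs = item_tower(torch.randn(5000, 40))  # pre-computed and indexed offline
user_emb = user_tower(torch.randn(1, 64))      # computed at request time

scores = user_emb @ item_embs.T                # dot-product compatibility
topk = scores.topk(10).indices                 # top-10 candidate items
```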
Feature Engineering
User Features
| Category | Examples |
|---|---|
| Demographics | Age, gender, location, language |
| Behavioral | Watch history, purchase history, browse patterns |
| Aggregated | Average rating given, genre distribution |
| Temporal | Recency of activity, time-of-day patterns |
| Session | Current session items, search queries |
Item Features
| Category | Examples |
|---|---|
| Metadata | Title, description, category, tags |
| Content | Text embeddings, image embeddings |
| Popularity | View count, purchase count, trending score |
| Quality | Average rating, review sentiment |
| Temporal | Release date, freshness |
Interaction Features
| Feature | Description |
|---|---|
| User-item affinity | Past interactions between user and similar items |
| Cross features | User_genre_preference * Item_genre |
| Position bias | Display position of the item when the interaction occurred |
| Time since interaction | Recency signal |
Cold Start Problem
No interaction data exists for new users or new items, so collaborative signals cannot score them.
New User Solutions
| Strategy | Method |
|---|---|
| Popular items | Show globally popular items |
| Demographic-based | Use similar users based on demographics |
| Onboarding quiz | Ask preferences during signup |
| Content-based | Recommend based on first few interactions |
| Explore-exploit | Balance exploiting known preferences with exploring unknown ones (see sketch below) |
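One simple way to implement the explore-exploit strategy above is epsilon-greedy slate mixing; the epsilon value and function names are hypothetical:

```python
import random

def mix_slate(ranked_slate, exploration_pool, epsilon=0.1):
    """Epsilon-greedy mixing: mostly exploit the ranked slate, but with
    probability epsilon swap in a random unexplored item at each position."""
    slate = list(ranked_slate)
    for pos in range(len(slate)):
        if exploration_pool and random.random() < epsilon:
            slate[pos] = exploration_pool.pop(random.randrange(len(exploration_pool)))
    return slate
```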
New Item Solutions
| Strategy | Method |
|---|---|
| Content-based | Use item attributes to find similar items |
| Boost exploration | Inject new items into recommendations |
| Metadata matching | Match to user preferences based on attributes |
| Creator signal | Use creator's past item performance |
Evaluation Metrics
Offline Metrics
| Metric | Measurement | Use Case |
|---|---|---|
| RMSE/MAE | Rating prediction error | Explicit feedback |
| Precision@K | Relevant items in top K | Ranking quality |
| Recall@K | Coverage of relevant items | Discovery |
| NDCG@K | Position-weighted ranking quality | When order matters |
| MAP | Average precision across users | Overall ranking |
| Hit Rate@K | Fraction of users with at least one relevant item in the top K | Basic sanity check |
Example: Precision@K
If a user liked items A, B, C, D, E and the system recommended top 5 items A, X, B, Y, Z, then 2 of the recommended items (A and B) were relevant. Precision@5 = 2/5 = 0.4 (40%).
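The same computation in code, reproducing the worked example (the helper names are ours, not a standard library API):

```python
def precision_at_k(recommended, relevant, k):
    # Fraction of the top-K recommendations that are relevant.
    return len(set(recommended[:k]) & set(relevant)) / k

def recall_at_k(recommended, relevant, k):
    # Fraction of all relevant items that appear in the top K.
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)

relevant = ["A", "B", "C", "D", "E"]
recommended = ["A", "X", "B", "Y", "Z"]
print(precision_at_k(recommended, relevant, 5))  # 0.4, matching the example above
print(recall_at_k(recommended, relevant, 5))     # 0.4 (found 2 of 5 relevant items)
```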
Online Metrics (A/B Testing)
| Metric | Measurement |
|---|---|
| CTR | Engagement with recommendations |
| Conversion rate | Purchases from recommendations |
| Watch/Listen time | Engagement depth |
| Diversity | Variety in recommendations |
| Novelty | Non-obvious recommendations |
| User satisfaction | Survey, NPS |
Metric Trade-offs
| Trade-off | Description |
|---|---|
| Accuracy vs Diversity | Confident recommendations vs variety |
| Exploitation vs Exploration | Known preferences vs discovering new |
| Short-term vs Long-term | Immediate clicks vs user retention |
System Architecture
Multi-Stage Architecture
Stage 1: Candidate Generation (Retrieval)
From millions of items, retrieve several hundred to a few thousand candidates using fast, approximate methods. Multiple retrieval sources run in parallel: collaborative filtering (ANN search on embeddings), content-based retrieval, and popularity/trending signals. Each source contributes on the order of 500 candidates.
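A retrieval sketch assuming the Faiss library (Annoy or ScaNN are common alternatives); the embeddings here are random placeholders for outputs of the two-tower model:

```python
import numpy as np
import faiss  # assumed dependency

dim, n_items = 128, 100_000
item_embs = np.random.rand(n_items, dim).astype("float32")
faiss.normalize_L2(item_embs)  # unit vectors: L2-nearest == cosine-nearest

index = faiss.IndexHNSWFlat(dim, 32)  # HNSW graph index for approximate search
index.add(item_embs)                  # built once, offline

user_emb = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(user_emb)
distances, candidate_ids = index.search(user_emb, 500)  # ~500 candidates per source
```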
Stage 2: Ranking
From hundreds of candidates, produce a final ordered list using a more sophisticated ranking model. The model uses user features, item features, and cross-features to predict engagement metrics (click probability, conversion probability, expected watch time). Output is typically the top 20-50 items.
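A ranking-stage sketch using scikit-learn's gradient boosting on synthetic features; in production the feature rows would combine the real user, item, and cross features described earlier:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_train = rng.random((1000, 4))  # toy rows: user, item, and cross features
y_train = (X_train[:, 0] * X_train[:, 3] > 0.3).astype(int)  # synthetic clicks

ranker = GradientBoostingClassifier(n_estimators=100)
ranker.fit(X_train, y_train)

# At serving time: score the ~500 retrieved candidates, keep the top 50.
candidates = rng.random((500, 4))
click_probs = ranker.predict_proba(candidates)[:, 1]
top_50 = np.argsort(click_probs)[::-1][:50]
```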
Stage 3: Re-ranking / Business Rules
Apply business logic: ensure diversity to avoid repetitive content, boost new releases for freshness, apply promotional rules, remove duplicates, and adjust for context.
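One possible business-rule pass is a greedy per-category cap to enforce diversity; the cap value and helper signature are assumptions:

```python
def rerank_with_diversity(ranked_items, get_category, max_per_category=2):
    """Keep the ranking order but cap how many items from any one
    category make it into the final slate (cap is an assumed rule)."""
    counts, final = {}, []
    for item in ranked_items:
        cat = get_category(item)
        if counts.get(cat, 0) < max_per_category:
            final.append(item)
            counts[cat] = counts.get(cat, 0) + 1
    return final
```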
Common Issues
| Issue | Problem | Solution |
|---|---|---|
| Popularity bias | Over-recommend popular items | Diversification, explore-exploit |
| Filter bubbles | Users stuck in narrow preferences | Serendipity injection |
| Position bias | Top items get more clicks regardless | Position-aware training |
| Feedback loops | Recommendations shape their own future training data | Exploration, randomization |
| Stale models | Preferences change over time | Regular retraining, online learning |
Summary
| Decision | Options | Recommendation |
|---|---|---|
| Algorithm | CF, Content, Hybrid | Hybrid (combine signals) |
| Retrieval | Full scan, ANN | ANN for scale (HNSW, Faiss) |
| Ranking | Heuristics, ML | ML model (gradient boosting or neural) |
| Cold start | Popular, Content-based | Multi-strategy (context-dependent) |
| Evaluation | Offline only, A/B | Both (offline for development, A/B for decisions) |
| Architecture | Single model, Multi-stage | Multi-stage (retrieve, rank, re-rank) |