Design a Recommendation System

Concepts tested: Collaborative filtering, content-based filtering, two-tower architecture, cold start problem, candidate generation, ranking models, offline/online evaluation

Problem Statement

Design a recommendation system for a content or e-commerce platform.

Clarification Questions

| Question | Design Impact |
|---|---|
| What items are recommended? | Feature types, update frequency |
| Available signals | Explicit (ratings) vs implicit (views, clicks) |
| Scale (users, items, traffic) | Storage, indexing, and serving requirements |
| Latency requirement | Real-time vs batch computation |
| Business objective | Engagement, revenue, satisfaction |

Core Problem

Recommendation systems solve one of two formulations:

| Problem | Description |
|---|---|
| Rating prediction | Predict how much a user will like an item (score) |
| Top-K recommendation | Select the best K items to show the user (ranking) |

These formulations differ: rating prediction requires accurate absolute scores, while Top-K only requires the correct ranking order.

User-Item Matrix

The recommendation problem can be represented as a sparse matrix where rows are users, columns are items, and cells contain ratings or interaction signals. Most cells are empty (no interaction). The goal is to predict these missing values to recommend items users would likely enjoy.
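
As a concrete (toy) illustration, a sparse user-item matrix can be built with SciPy; the interaction triples below are made up:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical (user, item, rating) interaction triples; real values
# would come from interaction logs.
users   = np.array([0, 0, 1, 2, 2])
items   = np.array([1, 3, 0, 1, 2])
ratings = np.array([5.0, 3.0, 4.0, 2.0, 5.0])

# Sparse user-item matrix: rows = users, columns = items.
# Cells with no interaction are simply not stored.
R = csr_matrix((ratings, (users, items)), shape=(3, 4))

print(R.toarray())
print("density:", R.nnz / (R.shape[0] * R.shape[1]))  # fraction of observed cells
```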

Recommendation Approaches

1. Collaborative Filtering

Principle: Users who agreed in the past will agree in the future.

| Type | Method | Advantages | Limitations |
|---|---|---|---|
| User-based | Find similar users, recommend their items | Intuitive | Poor scalability |
| Item-based | Find items similar to the user's history | Better scalability | Sparse matrix issues |
| Matrix Factorization | Learn latent factors for users and items | Handles sparsity, scalable | Cold start problem |

Matrix Factorization:

The predicted rating for user u and item i is approximated by the dot product of the user's latent vector and the item's latent vector. This decomposes the large user-item matrix into two smaller matrices: a User Matrix (M users x K latent factors) and an Item Matrix (K factors x N items).
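
A minimal NumPy sketch of this idea, fitting two small factor matrices with SGD on made-up ratings (the dimensions and hyperparameters are arbitrary):

```python
import numpy as np

# Observed (user, item, rating) triples; values are made up.
observations = [(0, 1, 5.0), (0, 3, 3.0), (1, 0, 4.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 4, 8          # k = number of latent factors

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))   # user latent vectors (M x K)
Q = rng.normal(scale=0.1, size=(n_items, k))   # item latent vectors (N x K)

lr, reg = 0.05, 0.01
for _ in range(200):                           # SGD over observed cells only
    for u, i, r in observations:
        err = r - P[u] @ Q[i]                  # predicted rating = dot product
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

# Predict a missing cell, e.g. user 1's rating for item 2.
print(P[1] @ Q[2])
```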

2. Content-Based Filtering

Principle: Recommend items similar to what the user has liked, based on item attributes.

A user profile is constructed from the features of items the user has liked (e.g., average genre distribution, preferred attributes). Recommendations are items whose features are most similar to this user profile.
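
A minimal sketch of profile construction and scoring, assuming hand-made item feature vectors (e.g., one-hot genre flags plus a normalized attribute):

```python
import numpy as np

# Hypothetical item feature vectors (columns: genre flags + one numeric attribute).
item_features = np.array([
    [1, 0, 1, 0.2],   # item 0
    [1, 1, 0, 0.9],   # item 1
    [0, 1, 0, 0.5],   # item 2
    [0, 0, 1, 0.1],   # item 3
], dtype=float)

liked_items = [0, 3]                                   # items this user liked

# User profile = average of the liked items' feature vectors.
profile = item_features[liked_items].mean(axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Score every item by similarity to the profile, excluding already-seen items.
scores = np.array([cosine(profile, f) for f in item_features])
scores[liked_items] = -np.inf
print(np.argsort(-scores))                             # items ranked by similarity
```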

Domain-Specific Features:

| Domain | Item Features |
|---|---|
| Movies | Genre, director, actors, plot keywords, runtime |
| Music | Genre, tempo, artist, mood, instrumentation |
| E-commerce | Category, brand, price range, attributes |
| News | Topic, entities, publication, recency |

Similarity Metrics:

| Method | Use Case |
|---|---|
| Cosine Similarity | Dense embeddings, normalized vectors |
| TF-IDF + Cosine | Text-based items |
| Euclidean Distance | Non-normalized numeric features |
| Jaccard Similarity | Binary features, tags |
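
For text-heavy items, the TF-IDF + cosine row can be made concrete with scikit-learn; the documents below are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up item descriptions for a text-based domain such as news.
docs = [
    "central bank raises interest rates",
    "new smartphone camera review",
    "interest rates and mortgage costs explained",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)        # each item becomes a sparse TF-IDF vector

sim = cosine_similarity(X)           # pairwise item-item similarity matrix
print(sim.round(2))                  # items 0 and 2 should be the most similar pair
```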

3. Hybrid Approaches

Combine multiple signals for improved performance.

The hybrid approach feeds signals from collaborative filtering, content-based filtering, and contextual signals into an ensemble or ranking model that learns optimal weights for each signal source. The model then produces final recommendations by combining these weighted inputs.

Ensemble Methods:

| Method | Description |
|---|---|
| Weighted average | score = w1 * CF_score + w2 * CB_score |
| Stacking | Train a model to combine scores |
| Switching | Use CF when possible, fall back to CB |
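
A tiny sketch of the weighted-average and switching rows; the weights are placeholders that would normally be tuned offline or learned by a ranking model:

```python
def hybrid_score(cf_score, cb_score, w_cf=0.7, w_cb=0.3):
    # Switching fallback: if there is no collaborative signal
    # (e.g. a cold-start user), rely on the content-based score alone.
    if cf_score is None:
        return cb_score
    # Weighted average of the two signals.
    return w_cf * cf_score + w_cb * cb_score

print(hybrid_score(0.8, 0.4))    # 0.68
print(hybrid_score(None, 0.4))   # 0.4
```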

Deep Learning Approaches

Neural Collaborative Filtering

Neural collaborative filtering passes user ID and item ID through separate embedding layers, concatenates the resulting vectors, then feeds them through hidden layers to produce a prediction. This learns non-linear interactions between user and item representations.
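
A minimal PyTorch sketch of this architecture (embedding sizes and layer widths are illustrative, not prescriptive):

```python
import torch
import torch.nn as nn

class NeuralCF(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 16), nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, user_ids, item_ids):
        # Look up embeddings, concatenate, and pass through hidden layers.
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)       # predicted interaction score

model = NeuralCF(n_users=1000, n_items=5000)
scores = model(torch.tensor([1, 2]), torch.tensor([10, 20]))
print(scores.shape)                          # torch.Size([2])
```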

Two-Tower Architecture

Separate encoders for users and items enable efficient retrieval.

User Tower: Takes user features (demographics, history) through an encoder network to produce a user embedding (e.g., 128 dimensions).

Item Tower: Takes item features (attributes, content) through a separate encoder to produce an item embedding of the same dimensionality.

Similarity: The dot product between user and item embeddings measures compatibility.

Advantage: Item embeddings can be pre-computed and indexed for fast approximate nearest neighbor search. Only the user embedding needs to be computed at request time.
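
A minimal PyTorch sketch of the two towers; the feature dimensions and layer sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, emb_dim))

    def forward(self, x):
        # L2-normalize so the dot product behaves like cosine similarity.
        return F.normalize(self.net(x), dim=-1)

user_tower = Tower(in_dim=40)    # user features: demographics, history stats, ...
item_tower = Tower(in_dim=60)    # item features: attributes, content embeddings, ...

# Offline: pre-compute embeddings for the whole catalog and index them for ANN search.
item_embs = item_tower(torch.randn(10_000, 60)).detach()

# Online: encode only the requesting user, then score items by dot product.
user_emb = user_tower(torch.randn(1, 40))
scores = user_emb @ item_embs.T            # compatibility scores, shape (1, 10000)
print(torch.topk(scores, k=5).indices)
```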

Feature Engineering

User Features

| Category | Examples |
|---|---|
| Demographics | Age, gender, location, language |
| Behavioral | Watch history, purchase history, browse patterns |
| Aggregated | Average rating given, genre distribution |
| Temporal | Recency of activity, time-of-day patterns |
| Session | Current session items, search queries |

Item Features

| Category | Examples |
|---|---|
| Metadata | Title, description, category, tags |
| Content | Text embeddings, image embeddings |
| Popularity | View count, purchase count, trending score |
| Quality | Average rating, review sentiment |
| Temporal | Release date, freshness |

Interaction Features

| Feature | Description |
|---|---|
| User-item affinity | Past interactions between the user and similar items |
| Cross features | User_genre_preference * Item_genre |
| Position bias | Position at which the item was shown when the user interacted |
| Time since interaction | Recency signal |

Cold Start Problem

No data available for new users or new items.

New User Solutions

| Strategy | Method |
|---|---|
| Popular items | Show globally popular items |
| Demographic-based | Use similar users based on demographics |
| Onboarding quiz | Ask preferences during signup |
| Content-based | Recommend based on first few interactions |
| Explore-exploit | Balance exploration of preferences (see sketch below) |
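
The explore-exploit strategy can be sketched as a simple epsilon-greedy rule; the epsilon value and candidate lists below are placeholders:

```python
import random

def pick_recommendation(personalized, popular, epsilon=0.2):
    # Explore: occasionally (or when nothing is known yet) show a broadly
    # popular item to learn the new user's preferences.
    if not personalized or random.random() < epsilon:
        return random.choice(popular)
    # Exploit: otherwise show the current best personalized guess.
    return personalized[0]

print(pick_recommendation(personalized=[], popular=["item_a", "item_b"]))
```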

New Item Solutions

| Strategy | Method |
|---|---|
| Content-based | Use item attributes to find similar items |
| Boost exploration | Inject new items into recommendations |
| Metadata matching | Match to user preferences based on attributes |
| Creator signal | Use the creator's past item performance |

Evaluation Metrics

Offline Metrics

| Metric | Measurement | Use Case |
|---|---|---|
| RMSE/MAE | Rating prediction error | Explicit feedback |
| Precision@K | Relevant items in top K | Ranking quality |
| Recall@K | Coverage of relevant items | Discovery |
| NDCG@K | Position-weighted ranking quality | When order matters |
| MAP | Average precision across users | Overall ranking |
| Hit Rate@K | Users with at least one relevant item in top K | Basic check |

Example: Precision@K

If a user liked items A, B, C, D, E and the system recommended top 5 items A, X, B, Y, Z, then 2 of the recommended items (A and B) were relevant. Precision@5 = 2/5 = 0.4 (40%).
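
The same calculation as a small helper, along with Recall@K for comparison:

```python
def precision_at_k(recommended, relevant, k):
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

def recall_at_k(recommended, relevant, k):
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / len(relevant)

relevant    = ["A", "B", "C", "D", "E"]
recommended = ["A", "X", "B", "Y", "Z"]
print(precision_at_k(recommended, relevant, 5))   # 0.4
print(recall_at_k(recommended, relevant, 5))      # 0.4 (2 of the 5 relevant items found)
```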

Online Metrics (A/B Testing)

| Metric | Measurement |
|---|---|
| CTR | Engagement with recommendations |
| Conversion rate | Purchases from recommendations |
| Watch/Listen time | Engagement depth |
| Diversity | Variety in recommendations |
| Novelty | Non-obvious recommendations |
| User satisfaction | Surveys, NPS |

Metric Trade-offs

| Trade-off | Description |
|---|---|
| Accuracy vs Diversity | Confident recommendations vs variety |
| Exploitation vs Exploration | Known preferences vs discovering new ones |
| Short-term vs Long-term | Immediate clicks vs user retention |

System Architecture

Two-Stage Architecture

Stage 1: Candidate Generation (Retrieval)

From millions of items, retrieve hundreds of candidates using fast, approximate methods. Multiple retrieval sources include collaborative filtering (ANN search on embeddings), content-based retrieval, and popularity/trending signals. Each source contributes around 500 candidates.
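
A sketch of embedding-based retrieval, assuming the faiss library and randomly generated placeholder embeddings (at real scale an approximate index such as HNSW or IVF would replace the flat index):

```python
import numpy as np
import faiss

dim = 128
item_embeddings = np.random.rand(100_000, dim).astype("float32")  # placeholder item embeddings

# Build an inner-product index over pre-computed item embeddings.
index = faiss.IndexFlatIP(dim)      # exact search; swap in HNSW/IVF variants at scale
index.add(item_embeddings)

# At request time: embed the user and pull a few hundred candidates from this source.
user_embedding = np.random.rand(1, dim).astype("float32")
scores, candidate_ids = index.search(user_embedding, 500)
print(candidate_ids.shape)          # (1, 500)
```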

Stage 2: Ranking

From hundreds of candidates, produce a final ordered list using a more sophisticated ranking model. The model uses user features, item features, and cross-features to predict engagement metrics (click probability, conversion probability, expected watch time). Output is typically the top 20-50 items.
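
A sketch of the ranking stage using a gradient-boosted click model from scikit-learn; the feature matrix and labels are random placeholders standing in for logged impressions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_train = rng.random((5_000, 20))      # user + item + cross features per impression
y_train = rng.integers(0, 2, 5_000)    # label: clicked or not

ranker = GradientBoostingClassifier()
ranker.fit(X_train, y_train)

# Score the few hundred retrieved candidates and keep the top of the list.
X_candidates = rng.random((500, 20))
p_click = ranker.predict_proba(X_candidates)[:, 1]
final_ranking = np.argsort(-p_click)[:50]
print(final_ranking[:10])
```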

Stage 3: Re-ranking / Business Rules

Apply business logic: ensure diversity to avoid repetitive content, boost new releases for freshness, apply promotional rules, remove duplicates, and adjust for context.
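
One way to sketch such rules is a pass over the ranked list that drops duplicates and caps items per category; the fields and limit below are illustrative:

```python
def rerank(ranked_items, max_per_category=2):
    seen_ids, per_category, final = set(), {}, []
    for item in ranked_items:
        if item["id"] in seen_ids:
            continue                                    # remove duplicates
        if per_category.get(item["category"], 0) >= max_per_category:
            continue                                    # enforce diversity
        seen_ids.add(item["id"])
        per_category[item["category"]] = per_category.get(item["category"], 0) + 1
        final.append(item)
    return final

candidates = [
    {"id": 1, "category": "action"}, {"id": 2, "category": "action"},
    {"id": 3, "category": "action"}, {"id": 4, "category": "drama"},
]
print([i["id"] for i in rerank(candidates)])   # [1, 2, 4]
```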

Common Issues

| Issue | Problem | Solution |
|---|---|---|
| Popularity bias | Over-recommends popular items | Diversification, explore-exploit |
| Filter bubbles | Users stuck in narrow preferences | Serendipity injection |
| Position bias | Top items get more clicks regardless of relevance | Position-aware training |
| Feedback loops | Recommendations create their own training data | Exploration, randomization |
| Stale models | Preferences change over time | Regular retraining, online learning |

Summary

| Decision | Options | Recommendation |
|---|---|---|
| Algorithm | CF, Content, Hybrid | Hybrid (combine signals) |
| Retrieval | Full scan, ANN | ANN for scale (HNSW, Faiss) |
| Ranking | Heuristics, ML | ML model (gradient boosting or neural) |
| Cold start | Popular, Content-based | Multi-strategy (context-dependent) |
| Evaluation | Offline only, A/B | Both (offline for development, A/B for decisions) |
| Architecture | Single model, Two-stage | Two-stage (retrieve then rank) |