Design a Spotify Recommendation System
Design a machine learning system to recommend songs and playlists on Spotify.
Requirements
Functional:
- Recommend songs for "Discover Weekly" playlist
- Recommend songs for radio stations
- Real-time "next song" recommendations
- Personalized home page recommendations
Non-functional:
- Generate Discover Weekly for 400M+ users weekly
- Real-time recommendations < 100ms
- Handle catalog of 80M+ songs
Metrics
Offline Metrics
| Metric | Description |
|---|---|
| Recall@K | Fraction of user's future listens in top K |
| NDCG | Ranking quality of recommendations |
| Coverage | Percentage of catalog recommended |
| Diversity | Variety in recommendations |
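The first two offline metrics can be computed directly from a ranked recommendation list and a held-out set of the user's future listens. A minimal pure-Python sketch (function names are illustrative, and NDCG here uses binary relevance):

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the user's future listens that appear in the top-K recommendations."""
    top_k = set(ranked[:k])
    return len(top_k & set(relevant)) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: rewards placing relevant songs near the top of the list."""
    relevant = set(relevant)
    dcg = sum(1.0 / math.log2(i + 2) for i, s in enumerate(ranked[:k]) if s in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

# Example: 2 of the user's 3 future listens appear in the top 5.
ranked = ["a", "b", "c", "d", "e"]
future = ["b", "e", "z"]
print(recall_at_k(ranked, future, 5))  # 2/3
```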
Online Metrics
| Metric | Description |
|---|---|
| Stream Rate | Songs played / Songs recommended |
| Skip Rate | Songs skipped within 30 seconds |
| Save Rate | Songs added to library |
| Listen Time | Total listening duration |
Business Metrics
- Monthly Active Users (MAU)
- Premium conversion rate
- Session duration
- Artist discovery rate
Architecture
Candidate Generation
Approach 1: Collaborative Filtering
Users with similar listening histories tend to enjoy the same new songs.
Matrix Factorization: decompose the user-song interaction matrix R into user factors (U) and song factors (V), so that R ≈ U·Vᵀ.
To recommend for user i:
- Get user embedding U[i] (128-dim vector)
- Compute dot product with all song embeddings V
- Return top-N highest scoring songs
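The retrieval steps above reduce to a dot product and a sort. A toy sketch with hand-written 3-dim factors standing in for learned 128-dim ones:

```python
# Toy user/song factors (3-dim here; the text assumes 128-dim learned factors).
U = {"user_1": [0.9, 0.1, 0.0]}
V = {
    "song_a": [0.8, 0.2, 0.0],
    "song_b": [0.0, 0.1, 0.9],
    "song_c": [0.7, 0.0, 0.1],
}

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def recommend(user_id, n=2):
    """Score every song against the user factor and return the top-N."""
    u = U[user_id]
    scored = sorted(V, key=lambda s: dot(u, V[s]), reverse=True)
    return scored[:n]

print(recommend("user_1"))  # ['song_a', 'song_c']
```

At catalog scale the exhaustive scoring over all of V is replaced by the ANN index described under Serving Infrastructure.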
Approach 2: Content-Based Filtering
Recommend songs similar to what user already likes.
Audio Features:
- Tempo, key, loudness
- Danceability, energy, valence
- Audio embeddings from neural networks
Metadata Features:
- Genre, artist, album
- Release year
- Lyrics embeddings
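Content-based retrieval reduces to similarity over these feature vectors. A sketch using cosine similarity on hypothetical normalized audio features (the feature values are made up for illustration):

```python
import math

# Hypothetical normalized audio features: [tempo, danceability, energy, valence]
features = {
    "liked_song": [0.6, 0.8, 0.7, 0.5],
    "candidate_1": [0.6, 0.7, 0.7, 0.6],
    "candidate_2": [0.1, 0.2, 0.9, 0.1],
}

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

# Rank candidates by similarity to a song the user already likes.
liked = features["liked_song"]
ranked = sorted(
    (s for s in features if s != "liked_song"),
    key=lambda s: cosine(liked, features[s]),
    reverse=True,
)
print(ranked)  # ['candidate_1', 'candidate_2']
```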
Approach 3: Two-Tower Model
A neural-network approach to candidate retrieval: a user tower and a song tower map their respective features into a shared embedding space, so that retrieval becomes a nearest-neighbor search over precomputed song embeddings.
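A minimal sketch of the idea, assuming a single linear layer per tower (real towers are deeper networks trained jointly on interaction data so that the dot product predicts engagement):

```python
def matvec(W, x):
    """Apply one linear layer (the 'tower'); real towers stack nonlinear layers."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Hand-picked toy weights; in practice both towers are learned.
W_user = [[1.0, 0.0], [0.0, 1.0]]   # user tower: user features -> 2-dim embedding
W_song = [[0.5, 0.5], [0.5, -0.5]]  # song tower: song features -> same space

user_emb = matvec(W_user, [1.0, 0.2])  # e.g. [genre affinity, tempo preference]
song_embs = {
    "song_a": matvec(W_song, [1.0, 0.0]),
    "song_b": matvec(W_song, [0.0, 1.0]),
}
scores = {s: dot(user_emb, e) for s, e in song_embs.items()}
best = max(scores, key=scores.get)
print(best)  # song_a
```

Because song embeddings depend only on the song tower, they can be precomputed in batch and served from the ANN index, while the user tower runs per-request.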
Feature Engineering
User Features
| Feature | Description |
|---|---|
| listening_history | Sequence of recent tracks |
| top_artists | Most listened artists |
| top_genres | Most listened genres |
| listening_time_distribution | When they listen |
| skip_rate | How often they skip |
| playlist_creation_behavior | Playlists they've made |
| premium_status | Subscription tier |
Song Features
| Feature | Description |
|---|---|
| audio_features | Spotify's audio analysis (tempo, energy, etc.) |
| audio_embedding | Neural network embedding of audio |
| artist_embedding | Artist representation |
| genre_embedding | Genre representation |
| popularity | Global and regional popularity |
| release_date | When released |
| lyrics_embedding | NLP embedding of lyrics |
Context Features
| Feature | Description |
|---|---|
| time_of_day | Morning, afternoon, evening, night |
| day_of_week | Weekday vs weekend |
| device_type | Phone, desktop, smart speaker |
| activity_context | Workout, focus, party (if available) |
Model Architecture
Discover Weekly Pipeline
Weekly batch job to generate personalized playlists.
| Step | Description |
|---|---|
| 1 | Candidate Generation: 1000 songs from CF neighbors, content-based similar songs, trending in user's genres |
| 2 | Ranking Model: Predict P(stream), P(save), P(skip). Combine into final score. |
| 3 | Diversity Optimization: Ensure artist diversity, mix familiar and new, genre balance |
| 4 | Quality Filters: Remove explicit if preference set, remove recently played, remove disliked artists |
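The four steps can be strung together as one batch function per user. Everything concrete here (the combined score, filter sets, the per-artist cap) is illustrative:

```python
def discover_weekly(candidates, scores, recently_played, disliked_artists,
                    max_per_artist=2, playlist_len=30):
    """Rank candidates, apply quality filters, and enforce artist diversity."""
    # Step 2: sort by combined ranking score, e.g. P(stream) + 2*P(save) - P(skip)
    ranked = sorted(candidates, key=lambda c: scores[c], reverse=True)
    playlist, per_artist = [], {}
    for song, artist in ranked:
        # Step 4: quality filters
        if song in recently_played or artist in disliked_artists:
            continue
        # Step 3: diversity cap per artist
        if per_artist.get(artist, 0) >= max_per_artist:
            continue
        playlist.append(song)
        per_artist[artist] = per_artist.get(artist, 0) + 1
        if len(playlist) == playlist_len:
            break
    return playlist

cands = [("s1", "a1"), ("s2", "a1"), ("s3", "a1"), ("s4", "a2"), ("s5", "a3")]
scores = {c: w for c, w in zip(cands, [0.9, 0.8, 0.7, 0.6, 0.5])}
print(discover_weekly(cands, scores, recently_played={"s4"},
                      disliked_artists=set(), playlist_len=3))  # ['s1', 's2', 's5']
```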
Real-time Radio Recommendations
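One plausible flow (an assumed design, not specified above): seed the station from the current track's embedding, pull nearest neighbors from the ANN index, and filter out already-played tracks when refilling the queue. A simplified refill step:

```python
def next_songs(seed_emb, song_embs, played, k=3):
    """Refill a radio queue: nearest neighbors to the seed, skipping played songs."""
    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))
    ranked = sorted(
        (s for s in song_embs if s not in played),
        key=lambda s: dot(seed_emb, song_embs[s]),
        reverse=True,
    )
    return ranked[:k]

embs = {"s1": [1.0, 0.0], "s2": [0.9, 0.1], "s3": [0.0, 1.0]}
print(next_songs([1.0, 0.0], embs, played={"s1"}, k=2))  # ['s2', 's3']
```

Skips within the session can feed back into the seed (e.g. averaging embeddings of recently enjoyed tracks) to keep the station responsive within the 100ms budget.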
Training
Data Collection
| Signal Type | Event | Interpretation |
|---|---|---|
| Positive | Stream complete (over 30s) | User engaged with song |
| Positive | Save to library | Strong preference |
| Positive | Add to playlist | Curated preference |
| Positive | Repeat listen | Very strong signal |
| Negative | Skip early (under 30s) | User didn't like |
| Negative | Remove from playlist | Changed preference |
| Negative | Hide song | Explicit dislike |
Loss Function
Multi-task learning with weighted losses:
| Task | Weight | Rationale |
|---|---|---|
| Stream prediction | w1 | Primary engagement signal |
| Save prediction | w2 | Strong preference indicator |
| Skip prediction | w3 | Negative signal |
| Playlist add prediction | w4 | Curated preference |
Total Loss = w1 x stream_loss + w2 x save_loss + w3 x skip_loss + w4 x playlist_loss
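With per-task binary cross-entropy losses, the combination is a straightforward weighted sum. The weights below are placeholders to be tuned, not values from the text:

```python
import math

def bce(p, y):
    """Binary cross-entropy for one prediction, clipped for numerical safety."""
    p = min(max(p, 1e-7), 1 - 1e-7)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def total_loss(preds, labels, weights=(1.0, 2.0, 0.5, 1.5)):
    """Weighted multi-task loss over (stream, save, skip, playlist-add)."""
    return sum(w * bce(p, y) for w, p, y in zip(weights, preds, labels))

# One training example: streamed and saved, not skipped, not playlisted.
loss = total_loss(preds=(0.8, 0.6, 0.1, 0.3), labels=(1, 1, 0, 0))
print(round(loss, 3))  # 1.832
```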
Handling Implicit Feedback
Listening data is implicit: a song the user never played is not necessarily a song they dislike, so unobserved entries cannot be treated as plain negatives.
Solutions:
- Weighted matrix factorization
- Bayesian personalized ranking (BPR)
- Negative sampling strategies
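BPR, for example, sidesteps the missing-negatives problem by optimizing pairwise order: for each user, a consumed song should score higher than a randomly sampled unconsumed one. A sketch of the per-triple loss (the formulation is standard BPR; the scores are toy values):

```python
import math

def bpr_loss(score_pos, score_neg):
    """BPR maximizes sigmoid(s_pos - s_neg); the loss is -log sigmoid(s_pos - s_neg)."""
    x = score_pos - score_neg
    return -math.log(1.0 / (1.0 + math.exp(-x)))

# The loss shrinks as the listened song out-scores the sampled negative.
print(bpr_loss(2.0, 0.5) < bpr_loss(0.5, 2.0))  # True
```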
Cold Start Problem
New Users
| Approach | Description |
|---|---|
| Onboarding survey | Ask for favorite artists/genres |
| Popular items | Recommend globally popular songs |
| Demographic similarity | Use age- and location-based recommendations |
| Quick learning | Rapidly update from first few interactions |
New Songs
| Approach | Description |
|---|---|
| Content-based | Use audio features and metadata |
| Artist fans | Recommend to fans of the artist |
| Editorial playlists | Human curation for initial exposure |
| Exploration budget | Allocate slots for new content |
Serving Infrastructure
Embedding Index
Use Approximate Nearest Neighbor (ANN) search for fast similarity lookup:
| Component | Configuration | Purpose |
|---|---|---|
| Index Type | Annoy / HNSW / Faiss | Trade-off: build time vs. query speed |
| Embedding Dim | 128 | Balance expressiveness vs. storage |
| Distance Metric | Angular (cosine) | Normalized similarity |
| Index Trees | 100 | More trees = better recall, but a larger, slower-to-build index |
| Query K | 100-200 | Candidates for ranking stage |
Query Flow:
- User embedding -> ANN index query
- Return top-K most similar song embeddings
- Latency target: under 10ms for 100 results from 50M songs
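Production systems delegate this lookup to an ANN library (Annoy, HNSW, Faiss). The query flow itself is just a top-K similarity search, shown here as a brute-force stand-in using cosine similarity (the "angular" metric above):

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def ann_query(user_emb, song_index, k=2):
    """Brute-force stand-in for the ANN lookup: top-K songs by cosine similarity."""
    return sorted(song_index,
                  key=lambda s: cosine(user_emb, song_index[s]),
                  reverse=True)[:k]

index = {"s1": [1.0, 0.0], "s2": [0.7, 0.7], "s3": [0.0, 1.0]}
print(ann_query([1.0, 0.1], index))  # ['s1', 's2']
```

Brute force is O(catalog size) per query; the approximate index trades a small recall loss for the sub-10ms latency target.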
Caching Strategy
| Data | Cache TTL | Update Frequency |
|---|---|---|
| User Embedding | 1 hour | Re-compute on significant activity |
| Song Embeddings | Permanent | Updated daily batch job |
| Candidate Pool | 15 minutes | Per-user, invalidated on context change |
| Ranking Scores | 5 minutes | Short-lived, context-dependent |
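The TTLs above can be enforced with a small wrapper around a dict; the clock is injected so the sketch is testable (class and method names are illustrative):

```python
import time

class TTLCache:
    """Minimal TTL cache for user embeddings / candidate pools."""
    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def set(self, key, value):
        self._store[key] = (value, self.clock())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired, e.g. a stale user embedding
            return None
        return value

# Usage: user embeddings cached for 1 hour.
cache = TTLCache(ttl_seconds=3600)
cache.set("user_1", [0.1, 0.9])
print(cache.get("user_1"))  # [0.1, 0.9]
```

Explicit invalidation (on significant user activity or context change) would be layered on top by deleting keys directly.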
Monitoring
Quality Metrics
| Metric | Formula | Target |
|---|---|---|
| Stream Rate | Streams / Impressions | Over 60% |
| Skip Rate | Early Skips / Streams | Under 25% |
| Save Rate | Saves / Streams | Over 5% |
| Discovery Rate | New Artist Streams / Total | 15-30% |
| Diversity Score | Unique Artists / Total Streams | Over 40% |
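These ratios can be computed directly from raw event counts; a sketch where the count field names are illustrative:

```python
def quality_metrics(counts):
    """Compute the monitoring ratios from raw event counts."""
    return {
        "stream_rate": counts["streams"] / counts["impressions"],
        "skip_rate": counts["early_skips"] / counts["streams"],
        "save_rate": counts["saves"] / counts["streams"],
        "discovery_rate": counts["new_artist_streams"] / counts["streams"],
        "diversity_score": counts["unique_artists"] / counts["streams"],
    }

m = quality_metrics({
    "impressions": 1000, "streams": 650, "early_skips": 140,
    "saves": 40, "new_artist_streams": 130, "unique_artists": 300,
})
print(m["stream_rate"])  # 0.65
```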
A/B Testing
- Test new models against production
- Segment by user type (new, casual, power users)
- Monitor for novelty effects
Reference
| Topic | Description |
|---|---|
| Exploitation vs exploration | Playing safe favorites keeps satisfaction stable. Discovery keeps the experience fresh. |
| Personalization depth | Too personalized feels stale. Too random feels irrelevant. |
| Computation trade-off | Real-time is fresher but more expensive. Batch is efficient but stale. |
| Popularity bias | Popular songs are safe bets. Long-tail discovery differentiates the platform. |