Introduction to ML System Design
ML system design interviews evaluate a candidate's ability to translate business problems into production ML systems. They assess end-to-end thinking: from problem definition through data collection, model training, deployment, and monitoring.
ML systems differ from traditional software systems in several fundamental ways. Outputs are probabilistic rather than deterministic. System performance can degrade without explicit errors. The quality of results depends heavily on data quality and freshness.
System Overview
An ML system consists of four interconnected pipelines: data, training, serving, and monitoring. Each is described under Core Components below.
ML vs Traditional System Design
| Aspect | Traditional System Design | ML System Design |
|---|---|---|
| Output | Deterministic | Probabilistic |
| Data role | Input to process | Core system component |
| Testing | Unit tests, integration tests | Metrics, A/B tests |
| Failure modes | Explicit errors | Silent degradation |
| Updates | Code deployment | Model retraining |
Core Components
Data Pipeline
The data pipeline handles data collection, cleaning, transformation, and storage. Key considerations include:
- Data sources and ingestion
- Data validation and quality checks (see the sketch after this list)
- Feature computation and storage
- Access patterns for training and serving
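Data validation is often the first line of defense against silent degradation. Below is a minimal rule-based validation sketch; the `ColumnRule` type and the field names are hypothetical, and a production pipeline would more likely rely on a schema registry or a dedicated tool such as Great Expectations.

```python
from dataclasses import dataclass

@dataclass
class ColumnRule:
    """Hypothetical per-column rule; real pipelines often use a schema registry."""
    name: str
    dtype: type
    nullable: bool = False
    min_value: float | None = None
    max_value: float | None = None

def validate_rows(rows: list[dict], rules: list[ColumnRule]) -> list[str]:
    """Return human-readable violations; an empty list means the batch is clean."""
    errors = []
    for i, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule.name)
            if value is None:
                if not rule.nullable:
                    errors.append(f"row {i}: {rule.name} is null")
            elif not isinstance(value, rule.dtype):
                errors.append(f"row {i}: {rule.name} has type {type(value).__name__}")
            elif rule.min_value is not None and value < rule.min_value:
                errors.append(f"row {i}: {rule.name}={value} below {rule.min_value}")
            elif rule.max_value is not None and value > rule.max_value:
                errors.append(f"row {i}: {rule.name}={value} above {rule.max_value}")
    return errors

rules = [ColumnRule("user_id", str), ColumnRule("age", int, min_value=0, max_value=120)]
print(validate_rows([{"user_id": "u1", "age": 34}, {"user_id": "u2", "age": -3}], rules))
```

Running checks like these at ingestion time lets the pipeline quarantine bad batches before they reach feature computation or training.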
Training Pipeline
The training pipeline manages model development and experimentation:
- Model selection and architecture
- Hyperparameter tuning
- Distributed training for large datasets
- Experiment tracking and reproducibility
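Experiment tracking can be as simple as persisting parameters, seeds, and metrics per run. A minimal sketch, assuming a training function that returns a metrics dict; a real setup would typically use a tracker such as MLflow and would also seed numpy/torch.

```python
import hashlib
import json
import random
import time
from pathlib import Path

def run_experiment(params: dict, train_fn, log_dir: str = "experiments") -> dict:
    """Train with a fixed seed and persist params + metrics for reproducibility."""
    random.seed(params.get("seed", 0))  # a real pipeline would also seed numpy/torch
    metrics = train_fn(params)
    run_id = hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()[:8]
    record = {"run_id": run_id, "timestamp": time.time(),
              "params": params, "metrics": metrics}
    Path(log_dir).mkdir(exist_ok=True)
    Path(log_dir, f"{run_id}.json").write_text(json.dumps(record, indent=2))
    return record

# Toy training function standing in for a real model fit.
record = run_experiment({"lr": 0.01, "seed": 42}, lambda p: {"val_auc": 0.87})
print(record["run_id"])  # identical params -> identical run_id, aiding deduplication
```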
Serving Pipeline
The serving pipeline delivers predictions to users:
- Batch vs real-time inference
- Feature retrieval latency
- Model versioning and rollout
- Fallback handling
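Fallback handling keeps the product usable when the model path fails. A minimal sketch, where `model_fn` stands in for a call to a model server and the fallback is a cheap default such as a cached score or a popularity prior; timeouts would typically be enforced at the RPC layer in a real system.

```python
import logging

logger = logging.getLogger("serving")

def predict_with_fallback(model_fn, features: dict, fallback: float = 0.0) -> float:
    """Serve a prediction, degrading gracefully if the model call fails."""
    try:
        return model_fn(features)
    except Exception:
        logger.exception("model call failed; serving fallback")
        return fallback

# Usage: a failing model call degrades to the default score instead of erroring.
print(predict_with_fallback(lambda f: 1 / 0, {"user_id": "u1"}))  # prints 0.0
```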
Monitoring Pipeline
The monitoring pipeline tracks system health:
- Model performance metrics
- Data drift detection (see the PSI sketch after this list)
- System latency and throughput
- Alerting and incident response
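One common drift signal is the Population Stability Index (PSI) between training and serving feature distributions. A sketch using numpy; the thresholds often quoted (below 0.1 stable, 0.1 to 0.25 worth investigating, above 0.25 significant drift) are an industry heuristic, not a guarantee.

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a reference (training) and a live (serving) sample."""
    # Quantile edges come from the reference distribution; only interior edges
    # are kept, so out-of-range serving values land in the first/last bucket.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    e_frac = np.bincount(np.digitize(expected, edges), minlength=bins) / len(expected)
    a_frac = np.bincount(np.digitize(actual, edges), minlength=bins) / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)  # guard against log(0)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# A shifted serving distribution should yield a noticeably higher PSI.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)
print(population_stability_index(reference, rng.normal(0.0, 1.0, 10_000)))  # near 0
print(population_stability_index(reference, rng.normal(0.5, 1.0, 10_000)))  # elevated
```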
Common Problem Types
Ranking and Recommendation
Content ranking (news feeds, search results), product recommendations, ad ranking. These systems optimize engagement, relevance, or revenue metrics across large item catalogs.
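Large catalogs usually force a two-stage design: cheap candidate retrieval over all items, then an expensive reranker over a shortlist. A minimal sketch with synthetic embeddings; `rerank_fn` is a stand-in for a heavier scoring model such as a gradient-boosted tree or neural ranker.

```python
import numpy as np

def two_stage_rank(user_vec, item_vecs, rerank_fn, n_candidates=100, k=10):
    """Cheap retrieval over the full catalog, costly reranking on a shortlist."""
    # Stage 1: approximate relevance via a dot product over every item.
    retrieval_scores = item_vecs @ user_vec
    candidates = np.argpartition(-retrieval_scores, n_candidates)[:n_candidates]
    # Stage 2: the expensive model scores the shortlist only.
    precise = np.array([rerank_fn(user_vec, item_vecs[i]) for i in candidates])
    return candidates[np.argsort(-precise)[:k]]

rng = np.random.default_rng(0)
items = rng.normal(size=(50_000, 32))  # stand-in for learned item embeddings
user = rng.normal(size=32)
print(two_stage_rank(user, items, lambda u, v: float(u @ v), k=5))
```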
Classification
Spam detection, fraud detection, content moderation. Binary or multi-class decisions at scale. Class imbalance is a common challenge in these systems.
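One standard mitigation for class imbalance is reweighting the loss inversely to class frequency. A sketch on synthetic data using scikit-learn's `class_weight="balanced"` option; the data generation is purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic imbalanced data: roughly 1% positives, as in fraud detection.
rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=20_000) > 5.0).astype(int)

# class_weight="balanced" reweights the loss inversely to class frequency,
# typically trading precision for recall on the rare class.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
pred = clf.predict(X)
print(f"positive rate: {y.mean():.3%}")
print(f"precision: {precision_score(y, pred):.2f}  recall: {recall_score(y, pred):.2f}")
```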
Prediction
ETA estimation, demand forecasting, churn prediction, click-through rate estimation. Regression problems requiring accuracy across different data segments.
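Aggregate error can hide poor performance on individual segments (new users, a rare region), so per-segment evaluation is worth demonstrating. A minimal sketch with a hypothetical ETA example where one city is systematically harder to predict.

```python
import numpy as np

def segmented_mae(y_true, y_pred, segments) -> dict:
    """Mean absolute error overall and per segment.

    Aggregate accuracy can mask a weak slice, so report both.
    """
    out = {"overall": float(np.mean(np.abs(y_true - y_pred)))}
    for seg in np.unique(segments):
        mask = segments == seg
        out[str(seg)] = float(np.mean(np.abs(y_true[mask] - y_pred[mask])))
    return out

# Toy ETA example: the model is systematically noisier in one city.
rng = np.random.default_rng(0)
y = rng.uniform(5, 60, 1_000)            # true ETAs in minutes
city = rng.choice(["sf", "nyc"], 1_000)
pred = y + rng.normal(0, np.where(city == "sf", 2.0, 8.0))
print(segmented_mae(y, pred, city))      # nyc MAE should be markedly higher
```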
Computer Vision
Image classification, object detection, visual search. These problems typically involve CNNs and embeddings, and often carry real-time inference requirements.
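Visual search typically reduces to nearest-neighbor lookup over image embeddings. A brute-force cosine-similarity sketch with random vectors standing in for CNN outputs; a production system would use an approximate-nearest-neighbor index such as FAISS rather than a full scan.

```python
import numpy as np

def nearest_images(query_emb: np.ndarray, catalog_embs: np.ndarray, k: int = 5):
    """Cosine-similarity search over precomputed image embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    c = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    sims = c @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]

rng = np.random.default_rng(0)
catalog = rng.normal(size=(10_000, 128))   # stand-in for CNN embeddings
query = catalog[42] + 0.1 * rng.normal(size=128)  # noisy copy of item 42
idx, scores = nearest_images(query, catalog)
print(idx)  # item 42 should rank first
```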
Natural Language Processing
Text classification, entity recognition, translation, question answering. Transformer architectures are standard, though simpler approaches may be appropriate depending on constraints.
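A TF-IDF plus linear-model baseline is often a reasonable starting point before reaching for a transformer, especially under tight latency or data constraints. A toy spam-classification sketch with scikit-learn; the four example texts are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny toy dataset; 1 = spam, 0 = not spam.
texts = ["free money click now", "meeting at 3pm tomorrow",
         "win a prize today", "quarterly report attached"]
labels = [1, 0, 1, 0]

# Word and bigram TF-IDF features feeding a linear classifier.
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression())
baseline.fit(texts, labels)
print(baseline.predict(["claim your free prize"]))  # expect [1]
```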
Evaluation Criteria
Interviewers evaluate candidates on these dimensions:
- Problem framing: Ability to translate vague business requirements into concrete ML objectives
- Data understanding: Knowledge of data requirements, handling missing data, and addressing data quality issues
- Algorithm selection: Understanding of available approaches and appropriate selection based on constraints
- System design: Ability to architect systems for scale and reliability
- Measurement: Definition of offline and online metrics aligned with business goals
- Production thinking: Consideration of monitoring, maintenance, and failure scenarios
Interview Structure
A typical interview lasts 45-60 minutes:
| Phase | Time | Focus |
|---|---|---|
| Requirements | 5-10 min | Scope, scale, constraints |
| Metrics | 5 min | Success criteria, offline and online |
| Architecture | 10-15 min | System components and data flow |
| Data and features | 10-15 min | Data sources, feature engineering |
| Model and training | 10-15 min | Algorithm selection, training approach |
| Serving and monitoring | 5-10 min | Deployment strategy, observability |
Strong Signals
- Asking clarifying questions before proposing solutions
- Selecting metrics aligned with business objectives
- Starting with simple approaches, adding complexity when justified
- Discussing trade-offs between alternatives
- Considering post-launch operations
Areas to Avoid
- Proposing complex models without understanding requirements
- Assuming clean, labeled data is available
- Omitting evaluation strategy
- Focusing only on the model, ignoring serving and monitoring
- Not acknowledging uncertainty when multiple approaches are valid