ML System Design Interview Rubric
This document describes the evaluation criteria and scoring methodology for ML system design interviews.
Evaluation Dimensions
1. Problem Understanding (15%)
Positive indicators:
- Asks clarifying questions before designing
- Identifies scale, latency, and accuracy requirements
- States assumptions explicitly
- Scopes the problem appropriately
Negative indicators:
- Begins designing without gathering requirements
- Misses obvious requirements
- Makes unstated assumptions
- Unable to make progress when requirements are ambiguous
2. ML Problem Formulation (20%)
Positive indicators:
- Correctly identifies problem type (classification, ranking, regression)
- Metrics align with business objectives
- Distinguishes between offline and online metrics
- Understands that metrics are proxies for true goals
Negative indicators:
- Misclassifies problem type (e.g., ranking as classification)
- Selects accuracy as the metric despite imbalanced classes (see the sketch after this list)
- Ignores business objectives in metric selection
- Does not address offline-online performance gap
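To make the imbalance pitfall concrete, here is a minimal sketch, assuming scikit-learn and an illustrative ~1% positive rate, showing why accuracy looks deceptively high while precision, recall, and PR-AUC reveal what is actually happening:

```python
# Sketch: why accuracy is a misleading metric under class imbalance.
# Assumes scikit-learn; the ~1% positive rate is an illustrative assumption.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)
scores = model.predict_proba(X_te)[:, 1]

# A trivial "predict all negative" baseline already reaches ~99% accuracy here,
# so accuracy alone says almost nothing about whether positives are caught.
print("accuracy:  ", accuracy_score(y_te, pred))
print("precision: ", precision_score(y_te, pred, zero_division=0))
print("recall:    ", recall_score(y_te, pred))
print("PR-AUC:    ", average_precision_score(y_te, scores))
```

Choosing class-appropriate metrics such as precision/recall at an agreed threshold or PR-AUC, and tying them back to the business objective, is the behavior this dimension rewards.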
3. Data and Feature Engineering (20%)
Positive indicators:
- Identifies data sources and availability
- Proposes predictive and computable features
- Considers real-time vs batch feature computation
- Acknowledges data quality issues (missing values, label noise, bias)
Negative indicators:
- Assumes perfect data availability
- Proposes features not computable in production
- Does not ensure features used in training are also available at serving time (training-serving skew; see the sketch after this list)
- Ignores privacy constraints or data bias
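One common safeguard against training-serving skew is to route offline and online feature computation through a single shared function with point-in-time correctness. A minimal sketch; the feature names and the 30-day window are illustrative assumptions:

```python
# Sketch: one shared feature function for both the training pipeline and the
# serving path, so offline features match what is computable at request time.
# Feature names and the 30-day window are illustrative assumptions.
from datetime import datetime, timedelta

def user_features(events: list[dict], as_of: datetime) -> dict:
    """Compute features from events strictly before `as_of`
    (point-in-time correctness prevents label leakage)."""
    window_start = as_of - timedelta(days=30)
    recent = [e for e in events if window_start <= e["ts"] < as_of]
    return {
        "purchases_30d": sum(1 for e in recent if e["type"] == "purchase"),
        "days_since_last_event": (
            (as_of - max(e["ts"] for e in recent)).days if recent else -1
        ),
    }

# Offline: call user_features(history, as_of=label_timestamp) for each training row.
# Online:  call user_features(history, as_of=request_time) in the serving handler.
```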
4. Model Selection and Training (20%)
Positive indicators:
- Starts with a simple baseline and adds complexity only with justification (see the baseline sketch below)
- Justifies model choice with trade-off analysis
- Discusses training details (data splits, class imbalance, hyperparameter tuning)
- Considers accuracy vs interpretability trade-offs
Negative indicators:
- Selects complex models without justification
- Cannot explain model selection rationale
- Does not address training methodology
- No baseline comparison
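As a concrete instance of "baseline first", here is a minimal sketch assuming scikit-learn, synthetic tabular data, and illustrative column names: a regularized logistic regression with class weights and a stratified split, against which any more complex model must justify its added cost.

```python
# Sketch: a simple, defensible baseline before reaching for complex models.
# Data, column names, and the ~5% positive rate are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 10_000
X = pd.DataFrame({
    "age_days": rng.integers(1, 1000, n),
    "events_7d": rng.poisson(3, n),
    "country": rng.choice(["US", "DE", "IN"], n),
    "device_type": rng.choice(["ios", "android", "web"], n),
})
y = ((X["events_7d"] > 5) & (rng.random(n) < 0.6)).astype(int)  # ~5% positives

baseline = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), ["age_days", "events_7d"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["country", "device_type"]),
    ])),
    # class_weight="balanced" is one simple answer to label imbalance
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
baseline.fit(X_tr, y_tr)
print("baseline ROC-AUC:", roc_auc_score(y_te, baseline.predict_proba(X_te)[:, 1]))
# Move to gradient boosting or deep models only if they beat this baseline
# by enough to justify the extra serving latency and maintenance cost.
```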
5. System Design and Scalability (15%)
Positive indicators:
- Designs complete system, not just the model
- Addresses latency constraints
- Considers scaling to higher traffic (see the capacity sketch below)
- Discusses failure handling
Negative indicators:
- Only discusses model architecture
- Proposes real-time designs that cannot meet latency or cost constraints
- No consideration for traffic growth
- Assumes system never fails
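Production thinking often starts with back-of-envelope capacity math rather than infrastructure diagrams. A small sketch; every number below is an assumption for illustration, not a benchmark:

```python
# Sketch: back-of-envelope capacity estimate for a real-time prediction service.
# Every number here is an illustrative assumption, not a measured benchmark.
daily_requests = 50_000_000
peak_multiplier = 3                        # ratio of peak QPS to average QPS
avg_qps = daily_requests / 86_400
peak_qps = avg_qps * peak_multiplier

latency_budget_ms = 100                    # end-to-end budget per request
model_latency_ms = 15                      # assumed p95 single-request inference time
per_replica_qps = 1000 / model_latency_ms  # rough single-worker throughput

replicas_needed = peak_qps / per_replica_qps
print(f"avg QPS ~{avg_qps:.0f}, peak QPS ~{peak_qps:.0f}")
print(f"~{replicas_needed:.0f} replicas at {model_latency_ms} ms per inference; "
      f"{latency_budget_ms - model_latency_ms} ms left for feature lookup, "
      f"network hops, and post-processing")
```

If the numbers do not fit the budget, that is the cue to discuss caching, precomputation, or a smaller model rather than to ignore the constraint.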
6. Evaluation and Iteration (10%)
Positive indicators:
- Clear evaluation strategy (offline tests, shadow mode, A/B testing)
- Understands A/B test methodology and interpretation (see the sketch below)
- Plans for post-launch monitoring
- Considers retraining and continuous improvement
Negative indicators:
- No validation methodology
- Jumps straight from training to production with no intermediate validation
- No monitoring plan
- Treats model as static after launch
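For A/B interpretation, a minimal two-proportion z-test sketch; the conversion counts are invented and SciPy is assumed:

```python
# Sketch: reading an A/B test on a binary metric with a two-proportion z-test.
# The counts are invented for illustration; assumes SciPy is available.
import math
from scipy.stats import norm

control_conv, control_n = 10_250, 500_000
treat_conv, treat_n = 10_640, 500_000

p_c, p_t = control_conv / control_n, treat_conv / treat_n
p_pool = (control_conv + treat_conv) / (control_n + treat_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treat_n))
z = (p_t - p_c) / se
p_value = 2 * (1 - norm.cdf(abs(z)))

print(f"lift: {(p_t - p_c) / p_c:+.2%}, z = {z:.2f}, p = {p_value:.4f}")
# Decide the significance level and minimum detectable effect before launch;
# a statistically significant lift can still be too small to matter.
```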
Scoring Levels
| Level | Description |
|---|---|
| Level 5: Strong Hire | Demonstrates capability to lead the project. Deep problem understanding, practical design, anticipates issues, clear communication. Production-ready system design. |
| Level 4: Hire | Covers all major components with reasonable decisions and clear explanations. Minor gaps addressable with normal onboarding. |
| Level 3: Lean Hire | Understands basics but lacks depth. Reasonable approach but shallow. Would require mentorship initially. |
| Level 2: Lean No Hire | Significant gaps. Missed major components, questionable decisions, or unclear communication. Not ready for this level without substantial growth. |
| Level 1: Strong No Hire | Did not demonstrate fundamentals. Could not complete basic design. Lacks core knowledge or application ability. |
Signal Patterns
Strong Performance Indicators
| Category | Behaviors |
|---|---|
| Problem Approach | Asks clarifying questions, starts simple, adds complexity with justification |
| Trade-off Analysis | Discusses alternatives, explains decision rationale |
| Production Thinking | Considers deployment, monitoring, and failure scenarios from the start |
| Communication | Engages with problem, explains reasoning clearly |
Weak Performance Indicators
| Category | Behaviors |
|---|---|
| Problem Approach | Jumps to complex solutions, assumes perfect data |
| Trade-off Analysis | Presents single approach without alternatives |
| Production Thinking | Forgets model serving and monitoring requirements |
| Communication | Stops at "model trained" without addressing deployment |
Preparation Recommendations
Before the Interview
- Practice verbal explanations - Rehearse time-boxed design discussions out loud
- Study real systems - Read engineering blogs from Netflix, Uber, and Airbnb
- Know algorithms - Understand when to use each, not just what they are
- Prepare examples - Be ready to discuss systems you have previously built or studied
During the Interview
- Clarify first - Gather requirements before designing
- Explain reasoning - Verbalize thought process
- Use diagrams - Visual communication aids clarity
- Discuss trade-offs - No perfect solutions exist
- Stay practical - Consider latency and cost constraints
Rejection Patterns
| Pattern | Issue |
|---|---|
| Unjustified complexity | "Deep learning is state of the art" without problem analysis |
| Data assumptions | "Collect more data" as solution to all problems |
| Latency ignorance | Does not consider inference time constraints (see the latency sketch below) |
| Offline-only thinking | Assumes offline performance translates to production |
| Cannot defend decisions | Unable to explain choices when questioned |
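The latency-ignorance pattern is usually curable with direct measurement. A sketch of timing single-request inference; the model and input shape are placeholders, not a recommended setup:

```python
# Sketch: measuring single-request inference latency percentiles.
# The model and input shape are placeholders; swap in the real serving artifact.
import time
import numpy as np
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000).fit(
    np.random.rand(1000, 50), np.random.randint(0, 2, 1000)
)
request = np.random.rand(1, 50)          # one request's feature vector

latencies_ms = []
for _ in range(2000):
    start = time.perf_counter()
    model.predict_proba(request)
    latencies_ms.append((time.perf_counter() - start) * 1000)

p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"p50={p50:.2f} ms  p95={p95:.2f} ms  p99={p99:.2f} ms")
# Compare p95/p99 (not the mean) against the serving latency budget,
# and re-measure on production-like hardware with realistic payloads.
```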
Example Feedback
Strong Performance
"Asked clarifying questions about scale before designing. Started with logistic regression baseline and explained when gradient boosting upgrade would be warranted. Feature engineering grounded in available data. Discussed monitoring before prompted. Proactively addressed cold start problem."
Weak Performance
"Proposed transformer without understanding problem requirements. Could not estimate inference time when asked about latency. Did not mention feature engineering or data quality. Did not know how to evaluate model in production. Incomplete design at time limit."