ML System Design Interview Rubric
This document describes the evaluation criteria and scoring methodology for ML system design interviews.
Evaluation Dimensions
1. Problem Understanding (15%)
Positive indicators:
- Asks clarifying questions before designing
- Identifies scale, latency, and accuracy requirements
- States assumptions explicitly
- Scopes the problem appropriately
Negative indicators:
- Begins designing without gathering requirements
- Misses obvious requirements
- Makes unstated assumptions
- Unable to make progress when requirements are ambiguous
2. ML Problem Formulation (20%)
Positive indicators:
- Correctly identifies problem type (classification, ranking, regression)
- Metrics align with business objectives
- Distinguishes between offline and online metrics
- Understands that metrics are proxies for true goals
Negative indicators:
- Misclassifies problem type (e.g., ranking as classification)
- Selects accuracy as the metric despite imbalanced classes (see the sketch after this list)
- Ignores business objectives in metric selection
- Does not address offline-online performance gap
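To make the imbalance pitfall concrete, here is a minimal sketch, assuming scikit-learn and an illustrative ~1% positive rate, showing why accuracy looks deceptively high while precision, recall, and PR-AUC reveal what is actually happening:

```python
# Sketch: why accuracy is a misleading metric under class imbalance.
# Assumes scikit-learn; the ~1% positive rate is an illustrative assumption.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)
scores = model.predict_proba(X_te)[:, 1]

# A trivial "predict all negative" baseline already reaches ~99% accuracy here,
# so accuracy alone says almost nothing about whether positives are caught.
print("accuracy:  ", accuracy_score(y_te, pred))
print("precision: ", precision_score(y_te, pred, zero_division=0))
print("recall:    ", recall_score(y_te, pred))
print("PR-AUC:    ", average_precision_score(y_te, scores))
```

Choosing class-appropriate metrics such as precision/recall at an agreed threshold or PR-AUC, and tying them back to the business objective, is the behavior this dimension rewards.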
3. Data and Feature Engineering (20%)
Positive indicators:
- Identifies data sources and availability
- Proposes predictive and computable features
- Considers real-time vs batch feature computation
- Acknowledges data quality issues (missing values, label noise, bias)
Negative indicators:
- Assumes perfect data availability
- Proposes features not computable in production
- Does not ensure features used in training are also available at serving time (training-serving skew; see the sketch after this list)
- Ignores privacy constraints or data bias
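One common safeguard against training-serving skew is to route offline and online feature computation through a single shared function with point-in-time correctness. A minimal sketch; the feature names and the 30-day window are illustrative assumptions:

```python
# Sketch: one shared feature function for both the training pipeline and the
# serving path, so offline features match what is computable at request time.
# Feature names and the 30-day window are illustrative assumptions.
from datetime import datetime, timedelta

def user_features(events: list[dict], as_of: datetime) -> dict:
    """Compute features from events strictly before `as_of`
    (point-in-time correctness prevents label leakage)."""
    window_start = as_of - timedelta(days=30)
    recent = [e for e in events if window_start <= e["ts"] < as_of]
    return {
        "purchases_30d": sum(1 for e in recent if e["type"] == "purchase"),
        "days_since_last_event": (
            (as_of - max(e["ts"] for e in recent)).days if recent else -1
        ),
    }

# Offline: call user_features(history, as_of=label_timestamp) for each training row.
# Online:  call user_features(history, as_of=request_time) in the serving handler.
```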
4. Model Selection and Training (20%)
Positive indicators:
- Starts with a simple baseline and adds complexity only with justification (see the baseline sketch below)
- Justifies model choice with trade-off analysis
- Discusses training details (data splits, class imbalance, hyperparameter tuning)
- Considers accuracy vs interpretability trade-offs
Negative indicators:
- Selects complex models without justification
- Cannot explain model selection rationale
- Does not address training methodology
- No baseline comparison
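As a concrete instance of "baseline first", here is a minimal sketch assuming scikit-learn, synthetic tabular data, and illustrative column names: a regularized logistic regression with class weights and a stratified split, against which any more complex model must justify its added cost.

```python
# Sketch: a simple, defensible baseline before reaching for complex models.
# Data, column names, and the ~5% positive rate are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 10_000
X = pd.DataFrame({
    "age_days": rng.integers(1, 1000, n),
    "events_7d": rng.poisson(3, n),
    "country": rng.choice(["US", "DE", "IN"], n),
    "device_type": rng.choice(["ios", "android", "web"], n),
})
y = ((X["events_7d"] > 5) & (rng.random(n) < 0.6)).astype(int)  # ~5% positives

baseline = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), ["age_days", "events_7d"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["country", "device_type"]),
    ])),
    # class_weight="balanced" is one simple answer to label imbalance
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
baseline.fit(X_tr, y_tr)
print("baseline ROC-AUC:", roc_auc_score(y_te, baseline.predict_proba(X_te)[:, 1]))
# Move to gradient boosting or deep models only if they beat this baseline
# by enough to justify the extra serving latency and maintenance cost.
```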
5. System Design and Scalability (15%)
Positive indicators:
- Designs complete system, not just the model
- Addresses latency constraints
- Considers scaling to higher traffic (see the capacity sketch below)
- Discusses failure handling
Negative indicators:
- Only discusses model architecture
- Proposes real-time designs that cannot meet latency or cost constraints
- No consideration for traffic growth
- Assumes system never fails
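Production thinking often starts with back-of-envelope capacity math rather than infrastructure diagrams. A small sketch; every number below is an assumption for illustration, not a benchmark:

```python
# Sketch: back-of-envelope capacity estimate for a real-time prediction service.
# Every number here is an illustrative assumption, not a measured benchmark.
daily_requests = 50_000_000
peak_multiplier = 3                        # ratio of peak QPS to average QPS
avg_qps = daily_requests / 86_400
peak_qps = avg_qps * peak_multiplier

latency_budget_ms = 100                    # end-to-end budget per request
model_latency_ms = 15                      # assumed p95 single-request inference time
per_replica_qps = 1000 / model_latency_ms  # rough single-worker throughput

replicas_needed = peak_qps / per_replica_qps
print(f"avg QPS ~{avg_qps:.0f}, peak QPS ~{peak_qps:.0f}")
print(f"~{replicas_needed:.0f} replicas at {model_latency_ms} ms per inference; "
      f"{latency_budget_ms - model_latency_ms} ms left for feature lookup, "
      f"network hops, and post-processing")
```

If the numbers do not fit the budget, that is the cue to discuss caching, precomputation, or a smaller model rather than to ignore the constraint.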
6. Evaluation and Iteration (10%)
Positive indicators:
- Clear evaluation strategy (offline tests, shadow mode, A/B testing)
- Understands A/B test methodology and interpretation (see the sketch below)
- Plans for post-launch monitoring
- Considers retraining and continuous improvement
Negative indicators:
- No validation methodology
- Jumps straight from training to production with no intermediate validation
- No monitoring plan
- Treats model as static after launch
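For A/B interpretation, a minimal two-proportion z-test sketch; the conversion counts are invented and SciPy is assumed:

```python
# Sketch: reading an A/B test on a binary metric with a two-proportion z-test.
# The counts are invented for illustration; assumes SciPy is available.
import math
from scipy.stats import norm

control_conv, control_n = 10_250, 500_000
treat_conv, treat_n = 10_640, 500_000

p_c, p_t = control_conv / control_n, treat_conv / treat_n
p_pool = (control_conv + treat_conv) / (control_n + treat_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treat_n))
z = (p_t - p_c) / se
p_value = 2 * (1 - norm.cdf(abs(z)))

print(f"lift: {(p_t - p_c) / p_c:+.2%}, z = {z:.2f}, p = {p_value:.4f}")
# Decide the significance level and minimum detectable effect before launch;
# a statistically significant lift can still be too small to matter.
```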
Scoring Levels
| Level | Description |
|---|---|
| Level 5: Strong Hire | Demonstrates capability to lead the project. Deep problem understanding, practical design, anticipates issues, clear communication. Production-ready system design. |
| Level 4: Hire | Covers all major components with reasonable decisions and clear explanations. Minor gaps addressable with normal onboarding. |
| Level 3: Lean Hire | Understands basics but lacks depth. Reasonable approach but shallow. Would require mentorship initially. |
| Level 2: Lean No Hire | Significant gaps. Missed major components, questionable decisions, or unclear communication. Not ready for this level without substantial growth. |
| Level 1: Strong No Hire | Did not demonstrate fundamentals. Could not complete basic design. Lacks core knowledge or application ability. |
Signal Patterns
Strong Performance Indicators
| Category | Behaviors |
|---|---|
| Problem Approach | Asks clarifying questions, starts simple, adds complexity with justification |
| Trade-off Analysis | Discusses alternatives, explains decision rationale |
| Production Thinking | Considers deployment, monitoring, and failure scenarios from the start |
| Communication | Engages with problem, explains reasoning clearly |
Weak Performance Indicators
| Category | Behaviors |
|---|---|
| Problem Approach | Jumps to complex solutions, assumes perfect data |
| Trade-off Analysis | Presents single approach without alternatives |
| Production Thinking | Forgets model serving and monitoring requirements |
| Communication | Stops at "model trained" without addressing deployment |
Preparation Recommendations
Before the Interview
- Practice verbal explanations - Rehearse time-boxed design discussions out loud
- Study real systems - Read engineering blogs from Netflix, Uber, and Airbnb
- Know algorithms - Understand when to use each, not just what they are
- Prepare examples - Be ready to discuss systems you have previously built or studied
During the Interview
- Clarify first - Gather requirements before designing
- Explain reasoning - Verbalize thought process
- Use diagrams - Visual communication aids clarity
- Discuss trade-offs - No perfect solutions exist
- Stay practical - Consider latency and cost constraints
Rejection Patterns
| Pattern | Issue |
|---|---|
| Unjustified complexity | "Deep learning is state of the art" without problem analysis |
| Data assumptions | "Collect more data" as solution to all problems |
| Latency ignorance | Does not consider inference time constraints (see the latency sketch below) |
| Offline-only thinking | Assumes offline performance translates to production |
| Cannot defend decisions | Unable to explain choices when questioned |
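The latency-ignorance pattern is usually curable with direct measurement. A sketch of timing single-request inference; the model and input shape are placeholders, not a recommended setup:

```python
# Sketch: measuring single-request inference latency percentiles.
# The model and input shape are placeholders; swap in the real serving artifact.
import time
import numpy as np
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000).fit(
    np.random.rand(1000, 50), np.random.randint(0, 2, 1000)
)
request = np.random.rand(1, 50)          # one request's feature vector

latencies_ms = []
for _ in range(2000):
    start = time.perf_counter()
    model.predict_proba(request)
    latencies_ms.append((time.perf_counter() - start) * 1000)

p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"p50={p50:.2f} ms  p95={p95:.2f} ms  p99={p99:.2f} ms")
# Compare p95/p99 (not the mean) against the serving latency budget,
# and re-measure on production-like hardware with realistic payloads.
```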
Example Feedback
Strong Performance
"Asked clarifying questions about scale before designing. Started with logistic regression baseline and explained when gradient boosting upgrade would be warranted. Feature engineering grounded in available data. Discussed monitoring before prompted. Proactively addressed cold start problem."
Weak Performance
"Proposed transformer without understanding problem requirements. Could not estimate inference time when asked about latency. Did not mention feature engineering or data quality. Did not know how to evaluate model in production. Incomplete design at time limit."