ML System Design Interview Rubric

This document describes the evaluation criteria and scoring methodology for ML system design interviews.

Evaluation Dimensions

1. Problem Understanding (15%)

Positive indicators:

  • Asks clarifying questions before designing
  • Identifies scale, latency, and accuracy requirements
  • States assumptions explicitly
  • Scopes the problem appropriately

Negative indicators:

  • Begins designing without gathering requirements
  • Misses obvious requirements
  • Makes unstated assumptions
  • Unable to proceed with ambiguous requirements

2. ML Problem Formulation (20%)

Positive indicators:

  • Correctly identifies problem type (classification, ranking, regression)
  • Metrics align with business objectives
  • Distinguishes between offline and online metrics
  • Understands that metrics are proxies for true goals

Negative indicators:

  • Misclassifies problem type (e.g., ranking as classification)
  • Selects accuracy as the metric despite class imbalance (see the sketch below)
  • Ignores business objectives in metric selection
  • Does not address offline-online performance gap
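
To make the class-imbalance pitfall concrete, here is a minimal sketch in plain Python with assumed numbers: on a dataset where only 1% of examples are positive, a model that always predicts the majority class scores 99% accuracy while providing no value.

```python
# Hypothetical fraud-detection setup (numbers are assumptions): 1% of 10,000
# transactions are fraudulent. A model that predicts "not fraud" for every
# transaction is 99% accurate and completely useless.
total = 10_000
positives = 100                  # actual fraud cases (1%)
negatives = total - positives

# Degenerate "always predict negative" model
tp, fp, fn, tn = 0, 0, positives, negatives

accuracy = (tp + tn) / total                       # 0.99 -- looks great
recall = tp / (tp + fn) if (tp + fn) else 0.0      # 0.00 -- catches no fraud
precision = tp / (tp + fp) if (tp + fp) else 0.0   # 0.00 -- no true positives

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# Precision/recall or PR-AUC expose the failure that accuracy hides.
```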

3. Data and Feature Engineering (20%)

Positive indicators:

  • Identifies data sources and availability
  • Proposes predictive and computable features
  • Considers real-time vs batch feature computation
  • Acknowledges data quality issues (missing values, label noise, bias)

Negative indicators:

  • Assumes perfect data availability
  • Proposes features not computable in production
  • Does not ensure features used in training are available at serving time (see the sketch below)
  • Ignores privacy constraints or data bias
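
One way to keep training and serving features consistent is to compute each feature with the same point-in-time logic in both places. A hypothetical sketch in plain Python (the feature and dates are made up for illustration):

```python
from datetime import datetime, timedelta

# Hypothetical feature: number of purchases in the 7 days before a cutoff.
# Reusing the same function for training (cutoff = label timestamp) and
# serving (cutoff = request time) avoids leaking future data and keeps the
# offline and online feature definitions identical.
def purchases_last_7d(purchase_times: list[datetime], cutoff: datetime) -> int:
    window_start = cutoff - timedelta(days=7)
    return sum(window_start <= t < cutoff for t in purchase_times)

history = [datetime(2024, 5, 1), datetime(2024, 5, 6), datetime(2024, 5, 9)]

# Training example labeled on 2024-05-05: only events before the label count.
train_value = purchases_last_7d(history, cutoff=datetime(2024, 5, 5))   # 1

# Serving request on 2024-05-10: same logic, evaluated at request time.
serve_value = purchases_last_7d(history, cutoff=datetime(2024, 5, 10))  # 2
print(train_value, serve_value)
```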

4. Model Selection and Training (20%)

Positive indicators:

  • Starts with simple baseline, adds complexity with justification
  • Justifies model choice with trade-off analysis
  • Discusses training details (data splits, class imbalance, hyperparameter tuning)
  • Considers accuracy vs interpretability trade-offs

Negative indicators:

  • Selects complex models without justification
  • Cannot explain model selection rationale
  • Does not address training methodology
  • No baseline comparison (see the sketch below)
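
A baseline-first comparison is quick to set up. The sketch below uses synthetic data and assumes scikit-learn is available; the point is that any more complex model has to justify itself against a trivial baseline and a simple linear model.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data standing in for a real problem.
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0  # stratified split keeps the class ratio
)

for name, model in [
    ("majority-class baseline", DummyClassifier(strategy="most_frequent")),
    ("logistic regression", LogisticRegression(max_iter=1000, class_weight="balanced")),
]:
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: ROC-AUC = {auc:.3f}")

# A heavier model (e.g. gradient boosting) is only worth its serving and
# maintenance cost if it clearly beats the simple models above.
```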

5. System Design and Scalability (15%)

Positive indicators:

  • Designs complete system, not just the model
  • Addresses latency constraints
  • Considers scaling to higher traffic
  • Discusses failure handling

Negative indicators:

  • Only discusses model architecture
  • Designs infeasible real-time systems (see the latency sketch below)
  • No consideration for traffic growth
  • Assumes system never fails
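
Latency and capacity concerns can usually be checked with back-of-the-envelope arithmetic before any detailed design. A sketch with assumed numbers (the budget, component costs, and traffic below are illustrative, not requirements):

```python
# Split an end-to-end latency budget into components to find the slice
# actually left for model inference, then size the serving fleet for peak load.
end_to_end_p99_ms = 100        # assumed product requirement
network_and_routing_ms = 20
feature_lookup_ms = 15
post_processing_ms = 5

model_budget_ms = end_to_end_p99_ms - (
    network_and_routing_ms + feature_lookup_ms + post_processing_ms
)
print(f"inference budget per request: {model_budget_ms} ms")   # 60 ms

peak_qps = 2000                          # assumed peak traffic
per_replica_latency_ms = 25              # assumed single-request inference time
per_replica_qps = 1000 / per_replica_latency_ms   # ~40 req/s if handled serially
replicas = -(-peak_qps // int(per_replica_qps))   # ceiling division
print(f"replicas needed at peak (no headroom): {replicas}")    # 50
```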

6. Evaluation and Iteration (10%)

Positive indicators:

  • Clear evaluation strategy (offline tests, shadow mode, A/B testing)
  • Understands A/B test methodology and interpretation (see the sketch below)
  • Plans for post-launch monitoring
  • Considers retraining and continuous improvement

Negative indicators:

  • No validation methodology
  • Skips from training to production
  • No monitoring plan
  • Treats model as static after launch
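
A/B test interpretation often reduces to a two-proportion comparison on the online metric. A minimal sketch in plain Python (the conversion counts are made up; a real readout would also check guardrail metrics and test duration):

```python
import math

# Two-proportion z-test for a conversion-rate A/B test.
def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Control: 1,000 conversions out of 50,000; treatment (new model): 1,150 out of 50,000.
z = two_proportion_z(conv_a=1_000, n_a=50_000, conv_b=1_150, n_b=50_000)
print(f"z = {z:.2f}")   # ~3.27: above 1.96, so significant at the 5% level (two-sided)
```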

Scoring Levels

  • Level 5 (Strong Hire): Demonstrates capability to lead the project. Deep problem understanding, practical design, anticipates issues, clear communication. Production-ready system design.
  • Level 4 (Hire): Covers all major components with reasonable decisions and clear explanations. Minor gaps addressable with normal onboarding.
  • Level 3 (Lean Hire): Understands basics but lacks depth. Reasonable approach but shallow. Would require mentorship initially.
  • Level 2 (Lean No Hire): Significant gaps. Missed major components, questionable decisions, or unclear communication. Not ready for this level without substantial growth.
  • Level 1 (Strong No Hire): Did not demonstrate fundamentals. Could not complete basic design. Lacks core knowledge or application ability.

Signal Patterns

Strong Performance Indicators

  • Problem Approach: Asks clarifying questions, starts simple, adds complexity with justification
  • Trade-off Analysis: Discusses alternatives, explains decision rationale
  • Production Thinking: Considers deployment, monitoring, and failure scenarios from the start
  • Communication: Engages with the problem, explains reasoning clearly

Weak Performance Indicators

  • Problem Approach: Jumps to complex solutions, assumes perfect data
  • Trade-off Analysis: Presents a single approach without alternatives
  • Production Thinking: Forgets model serving and monitoring requirements
  • Communication: Stops at "model trained" without addressing deployment

Preparation Recommendations

Before the Interview

  1. Practice verbal explanations - Time-boxed design discussions
  2. Study real systems - Engineering blogs from Netflix, Uber, Airbnb
  3. Know algorithms - Understand when to use each, not just what they are
  4. Prepare examples - Systems previously built or studied

During the Interview

  1. Clarify first - Gather requirements before designing
  2. Explain reasoning - Verbalize thought process
  3. Use diagrams - Visual communication aids clarity
  4. Discuss trade-offs - No perfect solutions exist
  5. Stay practical - Consider latency and cost constraints

Rejection Patterns

  • Unjustified complexity: "Deep learning is state of the art" without problem analysis
  • Data assumptions: "Collect more data" as the solution to all problems
  • Latency ignorance: Does not consider inference time constraints
  • Offline-only thinking: Assumes offline performance translates to production
  • Cannot defend decisions: Unable to explain choices when questioned

Example Feedback

Strong Performance

"Asked clarifying questions about scale before designing. Started with logistic regression baseline and explained when gradient boosting upgrade would be warranted. Feature engineering grounded in available data. Discussed monitoring before prompted. Proactively addressed cold start problem."

Weak Performance

"Proposed transformer without understanding problem requirements. Could not estimate inference time when asked about latency. Did not mention feature engineering or data quality. Did not know how to evaluate model in production. Incomplete design at time limit."