
Introduction to ML System Design

ML system design interviews evaluate the ability to translate business problems into production ML systems. These interviews assess end-to-end thinking: from problem definition through data collection, model training, deployment, and monitoring.

ML systems differ from traditional software systems in several fundamental ways. Outputs are probabilistic rather than deterministic. System performance can degrade without explicit errors. The quality of results depends heavily on data quality and freshness.

System Overview

An ML system consists of interconnected pipelines:

[Diagram: the data, training, serving, and monitoring pipelines and the data flow between them]

ML vs Traditional System Design

Aspect | Traditional System Design | ML System Design
--- | --- | ---
Output | Deterministic | Probabilistic
Data role | Input to process | Core system component
Testing | Unit tests, integration tests | Metrics, A/B tests
Failure modes | Explicit errors | Silent degradation
Updates | Code deployment | Model retraining

Core Components

Data Pipeline

The data pipeline handles data collection, cleaning, transformation, and storage. Key considerations include:

  • Data sources and ingestion
  • Data validation and quality checks (a sketch follows this list)
  • Feature computation and storage
  • Access patterns for training and serving
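A minimal sketch of the validation step, assuming a hypothetical click-event schema with user_id, item_id, and timestamp fields. Production pipelines typically use a dedicated validation framework, but the idea is the same: flag or quarantine records that fail completeness and range checks before they reach the feature store.

```python
# Hypothetical row-level validation for a click-event schema.
from dataclasses import dataclass

REQUIRED_FIELDS = {"user_id", "item_id", "timestamp"}


@dataclass
class ValidationResult:
    passed: bool
    errors: list


def validate_event(event: dict) -> ValidationResult:
    errors = []
    # Completeness: every required field must be present and non-null.
    missing = REQUIRED_FIELDS - {k for k, v in event.items() if v is not None}
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    # Range check: timestamps must be positive epoch seconds.
    ts = event.get("timestamp")
    if ts is not None and ts <= 0:
        errors.append(f"invalid timestamp: {ts}")
    return ValidationResult(passed=not errors, errors=errors)


if __name__ == "__main__":
    good = {"user_id": "u1", "item_id": "i9", "timestamp": 1700000000}
    bad = {"user_id": "u1", "timestamp": -5}
    print(validate_event(good))  # passed=True
    print(validate_event(bad))   # passed=False, two errors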

Training Pipeline

The training pipeline manages model development and experimentation:

  • Model selection and architecture
  • Hyperparameter tuning
  • Distributed training for large datasets
  • Experiment tracking and reproducibility
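As a rough illustration of experiment tracking and reproducibility, the sketch below fixes the random seed and derives a run id from the hashed config, so any result can be traced back to the exact settings that produced it. The run_experiment function and its fake metric are hypothetical stand-ins for a real training job and an experiment tracker such as MLflow.

```python
# Hypothetical training run with config hashing for reproducibility.
import hashlib
import json
import random
import time


def run_experiment(config: dict) -> dict:
    # Fix the seed so the run can be reproduced from the config alone.
    random.seed(config["seed"])
    # Placeholder "training": the score stands in for a validation metric.
    score = random.random()
    return {
        "config": config,
        # Identical configs map to the same run id.
        "run_id": hashlib.sha1(
            json.dumps(config, sort_keys=True).encode()
        ).hexdigest()[:8],
        "metric_auc": round(score, 4),
        "finished_at": time.time(),
    }


if __name__ == "__main__":
    base = {"model": "gbdt", "learning_rate": 0.1, "seed": 42}
    # A tiny grid search over one hyperparameter.
    for lr in (0.05, 0.1, 0.2):
        print(run_experiment({**base, "learning_rate": lr}))
```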

Serving Pipeline

The serving pipeline delivers predictions to users:

  • Batch vs real-time inference
  • Feature retrieval latency
  • Model versioning and rollout
  • Fallback handling
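One way to frame fallback handling is sketched below: the server wraps a primary (versioned) model and a cheap heuristic, so a failure in the primary degrades result quality instead of failing the request. The model classes are hypothetical stand-ins; a real deployment would add timeouts, feature-store lookups, and canary or shadow rollout for new versions.

```python
# Hypothetical versioned model serving with a heuristic fallback.
class PopularityFallback:
    """Cheap heuristic used when the primary model is unavailable."""

    def predict(self, features: dict) -> float:
        return features.get("item_popularity", 0.0)


class RankingModelV2:
    """Stand-in for the current learned scorer."""

    def predict(self, features: dict) -> float:
        return 0.7 * features.get("item_popularity", 0.0) + 0.3 * features.get("affinity", 0.0)


class ModelServer:
    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def score(self, features: dict) -> float:
        try:
            return self.primary.predict(features)
        except Exception:
            # Degrade gracefully instead of failing the request.
            return self.fallback.predict(features)


if __name__ == "__main__":
    server = ModelServer(primary=RankingModelV2(), fallback=PopularityFallback())
    print(server.score({"item_popularity": 0.8, "affinity": 0.5}))
```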

Monitoring Pipeline

The monitoring pipeline tracks system health:

  • Model performance metrics
  • Data drift detection (see the sketch after this list)
  • System latency and throughput
  • Alerting and incident response
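A common way to detect data drift is to compare a feature's live distribution against its training-time distribution. The sketch below computes the Population Stability Index (PSI) on synthetic data; the 0.2 alert threshold is a widely used rule of thumb, not a universal constant.

```python
# PSI-based drift check between training and live feature distributions.
import numpy as np


def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Bin both samples on the training distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_counts, _ = np.histogram(expected, edges)
    a_counts, _ = np.histogram(actual, edges)
    e_frac = np.clip(e_counts / len(expected), 1e-6, None)
    a_frac = np.clip(a_counts / len(actual), 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_feature = rng.normal(0.0, 1.0, 10_000)
    live_feature = rng.normal(0.5, 1.2, 10_000)  # shifted distribution
    score = psi(train_feature, live_feature)
    print(f"PSI = {score:.3f}", "-> drift alert" if score > 0.2 else "-> ok")
```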

Common Problem Types

Ranking and Recommendation

Content ranking (news feeds, search results), product recommendations, ad ranking. These systems optimize engagement, relevance, or revenue metrics across large item catalogs.
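Large catalogs usually force a two-stage design: a cheap retrieval pass narrows the catalog to a few hundred candidates, then a heavier ranker orders the shortlist. The sketch below fakes both stages with random embeddings and a noisy re-scorer, purely to show the structure.

```python
# Two-stage ranking sketch: retrieval over the full catalog, then re-ranking.
import numpy as np

rng = np.random.default_rng(7)
CATALOG = rng.normal(size=(100_000, 32))  # fake item embeddings


def retrieve(user_vec: np.ndarray, k: int = 200) -> np.ndarray:
    # Stage 1: cheap dot-product relevance over the whole catalog.
    scores = CATALOG @ user_vec
    return np.argpartition(scores, -k)[-k:]


def rank(user_vec: np.ndarray, candidates: np.ndarray, n: int = 10) -> np.ndarray:
    # Stage 2: a more expensive scorer applied only to the shortlist
    # (here just the same dot product plus noise as a stand-in).
    scores = CATALOG[candidates] @ user_vec + rng.normal(scale=0.01, size=len(candidates))
    return candidates[np.argsort(scores)[::-1][:n]]


if __name__ == "__main__":
    user = rng.normal(size=32)
    shortlist = retrieve(user)
    print("top items:", rank(user, shortlist))
```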

Classification

Spam detection, fraud detection, content moderation. Binary or multi-class decisions at scale. Class imbalance is a common challenge in these systems.
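The class-imbalance point is easy to see with a toy example: on a dataset where 1% of examples are positive (say, fraud), a model that predicts the negative class for everything reaches roughly 99% accuracy while catching nothing, which is why precision and recall (or PR-AUC) are the usual offline metrics here. The numbers below are synthetic.

```python
# Why accuracy misleads under class imbalance.
import numpy as np

rng = np.random.default_rng(1)
y_true = (rng.random(10_000) < 0.01).astype(int)  # ~1% positive class
y_pred_all_negative = np.zeros_like(y_true)       # trivial "never fraud" model

accuracy = (y_true == y_pred_all_negative).mean()
recall = y_pred_all_negative[y_true == 1].mean()  # fraction of positives caught

print(f"accuracy = {accuracy:.3f}")  # ~0.99 despite catching nothing
print(f"recall   = {recall:.3f}")    # 0.0 on the class that matters
```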

Prediction

ETA estimation, demand forecasting, churn prediction, click-through rate estimation. Regression problems requiring accuracy across different data segments.
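Because an aggregate error number can hide poor performance on smaller segments, offline evaluation for these problems typically slices the metric. The sketch below computes MAE per segment on synthetic ETA data where one city is deliberately noisier.

```python
# Segment-level regression evaluation on synthetic ETA data.
import numpy as np

rng = np.random.default_rng(3)
segments = np.array(["city_a"] * 900 + ["city_b"] * 100)
y_true = rng.uniform(5, 60, size=1000)             # true ETA in minutes
noise = np.where(segments == "city_a", 2.0, 10.0)  # predictions are worse in city_b
y_pred = y_true + rng.normal(0, noise)

print(f"overall MAE: {np.abs(y_true - y_pred).mean():.2f} min")
for seg in np.unique(segments):
    mask = segments == seg
    print(f"{seg} MAE: {np.abs(y_true[mask] - y_pred[mask]).mean():.2f} min")
```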

Computer Vision

Image classification, object detection, visual search. These problems involve CNNs, embeddings, and often have real-time inference requirements.
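A common pattern for visual search is to index unit-normalized image embeddings produced by a pretrained CNN and answer queries by cosine similarity. The sketch below fakes the embeddings with random vectors to show only the indexing and lookup structure; real systems would use an approximate nearest-neighbor index to meet latency budgets.

```python
# Embedding-based visual search over a fake image index.
import numpy as np

rng = np.random.default_rng(5)
index = rng.normal(size=(50_000, 128))                  # stand-in CNN embeddings
index /= np.linalg.norm(index, axis=1, keepdims=True)   # unit-normalize once


def search(query_embedding: np.ndarray, k: int = 5) -> np.ndarray:
    q = query_embedding / np.linalg.norm(query_embedding)
    sims = index @ q                                     # cosine similarity
    return np.argsort(sims)[::-1][:k]


if __name__ == "__main__":
    print("nearest images:", search(rng.normal(size=128)))
```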

Natural Language Processing

Text classification, entity recognition, translation, question answering. Transformer architectures are standard, though simpler approaches may be appropriate depending on constraints.
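As a concrete example of a simpler approach: a TF-IDF plus logistic-regression baseline is often a reasonable first model for text classification and a useful benchmark before reaching for a transformer. The sketch below trains one with scikit-learn on four toy spam examples.

```python
# A simple text-classification baseline: TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "win a free prize now", "limited offer click here",  # spam
    "meeting moved to 3pm", "see you at lunch tomorrow",  # not spam
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["free prize at the meeting"]))
```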

Evaluation Criteria

Interviewers evaluate candidates on these dimensions:

  1. Problem framing: Ability to translate vague business requirements into concrete ML objectives
  2. Data understanding: Knowledge of data requirements, handling missing data, and addressing data quality issues
  3. Algorithm selection: Understanding of available approaches and appropriate selection based on constraints
  4. System design: Ability to architect systems for scale and reliability
  5. Measurement: Definition of offline and online metrics aligned with business goals
  6. Production thinking: Consideration of monitoring, maintenance, and failure scenarios

Interview Structure

A typical interview lasts 45-60 minutes:

Phase | Time | Focus
--- | --- | ---
Requirements | 5-10 min | Scope, scale, constraints
Metrics | 5 min | Success criteria, offline and online
Architecture | 10-15 min | System components and data flow
Data and features | 10-15 min | Data sources, feature engineering
Model and training | 10-15 min | Algorithm selection, training approach
Serving and monitoring | 5-10 min | Deployment strategy, observability

Strong Signals

  • Asking clarifying questions before proposing solutions
  • Selecting metrics aligned with business objectives
  • Starting with simple approaches, adding complexity when justified
  • Discussing trade-offs between alternatives
  • Considering post-launch operations

Areas to Avoid

  • Proposing complex models without understanding requirements
  • Assuming clean, labeled data is available
  • Omitting evaluation strategy
  • Focusing only on the model, ignoring serving and monitoring
  • Not acknowledging uncertainty when multiple approaches are valid