
Design an Ad Click Prediction System

Design a machine learning system to predict whether a user will click on an ad (the core of online advertising systems at Google, Meta, and Amazon).

Requirements

Functional:

  • Predict P(click) for user-ad-context combinations
  • Support millions of ads
  • Handle cold-start (new ads, new users)
  • Update model with fresh data frequently

Non-functional:

  • Latency < 10ms at p99
  • Handle 1M+ predictions per second
  • Model updates within hours of new data
  • Well-calibrated output (predicted P(click) matches the observed click rate)

Metrics

Model Metrics

| Metric | Description | Target |
|---|---|---|
| AUC-ROC | Ranking quality | > 0.75 |
| Log Loss | Calibration quality | < 0.4 |
| Normalized Entropy | Improvement over baseline | NE < 0.8 |
| Calibration Error | Gap between predicted and actual CTR | ≈ 0 |

Business Metrics

| Metric | Description |
|---|---|
| Revenue per 1000 impressions (RPM) | Direct business impact |
| Click-through rate (CTR) | User engagement |
| Conversion rate | Post-click actions |
| Advertiser ROI | Are advertisers getting value? |

Calibration is critical. If the model predicts 5% CTR, roughly 5% of those impressions should actually be clicked. Miscalibration breaks the auction economics.

Architecture

[Diagram: end-to-end system architecture]

Feature Engineering

Feature engineering is where CTR models differentiate themselves. Production ad CTR models often have thousands of features.

User Features

| Feature | Type | Description |
|---|---|---|
| User ID embedding | Embedding | Learned user representation |
| Demographics | Categorical | Age bucket, gender, income |
| Interests | Multi-hot | Interest categories from behavior |
| Historical CTR | Numerical | User's past click rate |
| Recency | Numerical | Days since last visit |
| Session depth | Numerical | Pages viewed this session |
| Device | Categorical | Mobile, desktop, tablet |

Ad Features

| Feature | Type | Description |
|---|---|---|
| Ad ID embedding | Embedding | Learned ad representation |
| Advertiser ID | Categorical | Brand/advertiser |
| Creative type | Categorical | Image, video, text |
| Ad category | Categorical | Product category |
| Historical CTR | Numerical | Ad's past click rate |
| Ad age | Numerical | Days since ad created |
| Landing page quality | Numerical | Page load time, relevance |

Context Features

| Feature | Type | Description |
|---|---|---|
| Page category | Categorical | News, shopping, social |
| Ad position | Numerical | Slot position on page |
| Time of day | Cyclical | Hour, encoded as sin/cos |
| Day of week | Categorical | Weekend vs weekday |
| Device features | Categorical | OS, browser, screen size |
| Geo | Categorical | Country, region, city |
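
The sin/cos encoding in the table maps hour-of-day onto a circle, so 23:00 and 00:00 come out adjacent rather than 23 units apart. A minimal NumPy sketch:

```python
import numpy as np

def encode_hour(hour: np.ndarray) -> np.ndarray:
    # Map hour-of-day (0-23) onto the unit circle; 23:00 and 00:00 end up adjacent.
    angle = 2 * np.pi * hour / 24.0
    return np.stack([np.sin(angle), np.cos(angle)], axis=1)

print(encode_hour(np.array([23, 0, 12])))  # rows for 23:00 and 00:00 are close together
```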

Cross Features

Cross features capture interactions between feature categories:

| Feature | Type | Description |
|---|---|---|
| User x Ad category | Embedding | User preference for ad categories |
| User x Advertiser | Embedding | User-brand affinity |
| Device x Ad format | Interaction | Does this ad work on mobile? |
| Hour x Ad category | Interaction | Time-sensitive categories |
| User x Position | Numerical | User's position sensitivity |
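
One cheap way to materialize crosses like User x Ad category is to hash the concatenated values into a shared bucket space and learn an embedding per bucket. A minimal sketch (the bucket count and key format are illustrative):

```python
import hashlib

NUM_CROSS_BUCKETS = 1_000_000  # illustrative; sized to tolerate some collisions

def cross_bucket(user_id: str, ad_category: str) -> int:
    # Hash the (user, ad category) pair into a fixed bucket space.
    key = f"user={user_id}|ad_cat={ad_category}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_CROSS_BUCKETS

bucket = cross_bucket("u_42", "electronics")  # index into a cross-feature embedding table
```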

Model Architecture

Wide & Deep Model

[Diagram: Wide & Deep model architecture]

Wide: Memorizes specific combinations ("users who clicked this ad before")

Deep: Generalizes to new combinations ("users similar to past clickers")
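
A minimal PyTorch sketch of the idea; the field cardinalities, bucket counts, and layer sizes below are illustrative, not a production configuration:

```python
import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    def __init__(self, wide_buckets=1_000_000, fields=(10_000, 10_000, 1_000), emb_dim=16):
        super().__init__()
        # Wide: one learned weight per hashed cross-feature bucket (memorization).
        self.wide = nn.EmbeddingBag(wide_buckets, 1, mode="sum")
        # Deep: an embedding table per categorical field, fed to an MLP (generalization).
        self.embs = nn.ModuleList(nn.Embedding(card, emb_dim) for card in fields)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim * len(fields), 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, wide_ids, field_ids):
        # wide_ids: (batch, n_crosses) hashed bucket ids; field_ids: (batch, n_fields)
        wide_logit = self.wide(wide_ids)
        deep_in = torch.cat([emb(field_ids[:, i]) for i, emb in enumerate(self.embs)], dim=1)
        deep_logit = self.mlp(deep_in)
        return torch.sigmoid(wide_logit + deep_logit)  # P(click)

model = WideAndDeep()
p = model(torch.randint(0, 1_000_000, (4, 8)), torch.randint(0, 1_000, (4, 3)))
```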

Feature Interaction Models

| Model | Interaction Handling | Pros | Cons |
|---|---|---|---|
| Logistic Regression | Manual feature crosses | Fast, interpretable | Manual effort |
| Wide & Deep | Wide for memorization, deep for generalization | Good balance | Large model |
| DeepFM | FM layer for 2nd-order interactions | Auto feature crosses | Limited to 2nd order |
| DCN | Cross network for explicit interactions | Higher-order crosses | More parameters |
| DIN | Attention over user history | History-aware | Complex serving |

Recommendation: Start with Wide & Deep or DCN. Add attention mechanisms (DIN) if user history is rich.

Embedding Design

Most features are categorical with high cardinality:

| Feature | Cardinality | Embedding Dim |
|---|---|---|
| User ID | 1 billion | 64 |
| Ad ID | 10 million | 32 |
| Advertiser ID | 1 million | 16 |
| Page ID | 10 million | 16 |
| Category | 10,000 | 8 |

Hashing trick: For extreme cardinality (user IDs), hash IDs into a fixed number of buckets instead of keeping a full vocabulary. Accept some collisions.
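
A minimal sketch of that trick for user IDs, with an illustrative bucket count. Note the stable hash (CRC32 rather than Python's salted `hash()`) so training and serving assign the same bucket across processes:

```python
import zlib
import torch.nn as nn

USER_BUCKETS = 2**22  # ~4M buckets instead of 1B distinct user ids; collisions accepted

def user_bucket(user_id: str) -> int:
    # zlib.crc32 is deterministic across processes, unlike built-in hash().
    return zlib.crc32(user_id.encode()) % USER_BUCKETS

user_emb = nn.Embedding(USER_BUCKETS, 64)  # ~4M x 64 table instead of 1B x 64
```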

Training

Sample Selection

Not all impressions are equal for training:

[Diagram: training sample selection flow]

Negative downsampling: Clicks are rare (1-2% of impressions). Downsample negatives to balance the classes, then correct the output probabilities for the sampling rate, as in the sketch below.
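
A minimal sketch of the correction, assuming negatives were kept with probability w. This is the standard downsampling correction q = p / (p + (1 - p) / w):

```python
def recalibrate(p: float, w: float) -> float:
    """Undo negative downsampling: p is the model output trained on data where
    negatives were kept with probability w (0 < w <= 1)."""
    return p / (p + (1.0 - p) / w)

# Model outputs 0.17 after training with 1-in-10 negatives kept -> true CTR ~ 0.02
print(recalibrate(0.17, 0.1))
```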

Training Pipeline

[Diagram: training pipeline]

Handling Data Freshness

Ad data has a short shelf life: yesterday's training data is already stale.

| Approach | Freshness | Trade-off |
|---|---|---|
| Batch training (daily) | 24 hours | Simple but stale |
| Incremental updates | Hours | Better freshness |
| Online learning | Minutes | Freshest but complex |

Recommendation: Batch training as baseline, incremental updates every few hours, online learning for time-sensitive features.

Serving

Latency Budget

Total: 10ms for CTR prediction (within the larger 100ms auction budget)

| Stage | Budget |
|---|---|
| Feature lookup | 2ms |
| Feature engineering | 2ms |
| Model inference | 4ms |
| Overhead | 2ms |

Inference Optimization

[Diagram: inference optimization path]

Key optimizations:

  • Pre-compute and cache user features
  • Batch multiple ad predictions together
  • Quantize the model to INT8 (see the sketch after this list)
  • Prune low-value features
  • Use specialized inference frameworks (TensorRT, ONNX)
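
For the INT8 point, a minimal sketch using PyTorch dynamic quantization on a stand-in model; production ad stacks more often compile through TensorRT or ONNX Runtime as noted above:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))  # stand-in CTR tower

# Weights stored as INT8, activations quantized on the fly; shrinks the model
# and speeds up CPU inference with typically small accuracy loss.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```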

Feature Store

Features must be consistent between training and serving:

[Diagram: feature store feeding both training and serving]

Store pre-computed features once and read the same values in both pipelines. Training/serving skew, where a feature is computed differently in each path, silently degrades predictions.

Calibration

Calibration matters more than accuracy for ads.

Calibration Techniques

| Technique | Description |
|---|---|
| Platt scaling | Fit logistic regression on validation set |
| Isotonic regression | Non-parametric calibration |
| Temperature scaling | Divide logits by learned temperature |
| Histogram binning | Map predictions to observed rates |
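
A minimal sketch of isotonic calibration with scikit-learn, fit on held-out validation scores (the arrays here are toy placeholders):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# p_val: raw model scores on a validation set; y_val: observed clicks (0/1)
p_val = np.array([0.01, 0.03, 0.05, 0.10, 0.20])
y_val = np.array([0, 0, 1, 0, 1])

calibrator = IsotonicRegression(out_of_bounds="clip")  # monotone map: score -> observed rate
calibrator.fit(p_val, y_val)

p_serving = calibrator.predict(np.array([0.04]))  # applied after model inference
```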

Calibration Monitoring

[Diagram: calibration monitoring flow]

If traffic predicted at 2% CTR actually clicks at 4%, the model is under-predicting, and advertisers overpay.
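
A minimal sketch of that monitoring check: bucket the predictions and compare mean predicted CTR with observed CTR per bucket (bucket count is illustrative):

```python
import numpy as np

def calibration_by_bucket(p_pred, clicked, n_buckets=10):
    """p_pred, clicked: 1-D NumPy arrays of predictions and 0/1 click labels.
    Returns per-bucket (mean predicted CTR, observed CTR); large gaps trigger alerts."""
    edges = np.quantile(p_pred, np.linspace(0, 1, n_buckets + 1))
    idx = np.clip(np.digitize(p_pred, edges[1:-1]), 0, n_buckets - 1)
    return [(p_pred[idx == b].mean(), clicked[idx == b].mean())
            for b in range(n_buckets) if (idx == b).any()]
```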

Feedback Loop and Cold Start

New Ad Cold Start

New ads have no click history. Solutions:

| Approach | Description |
|---|---|
| Content-based | Use ad text/image features |
| Explore-exploit | Show to small traffic to gather data |
| Similar ad features | Transfer from similar ads |
| Advertiser prior | Use advertiser's historical CTR |
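
For the advertiser-prior row, one common form is empirical-Bayes smoothing: treat the advertiser's historical CTR as pseudo-counts, so a new ad starts at the prior and converges to its own rate as data arrives (the prior strength below is illustrative):

```python
def smoothed_ctr(clicks: int, impressions: int, prior_ctr: float,
                 prior_strength: float = 100.0) -> float:
    """Beta-binomial style smoothing: acts like prior_strength pseudo-impressions
    at the advertiser's historical CTR; converges to the ad's own CTR with data."""
    return (clicks + prior_ctr * prior_strength) / (impressions + prior_strength)

smoothed_ctr(0, 0, prior_ctr=0.02)      # brand-new ad -> 0.02 (the advertiser prior)
smoothed_ctr(50, 1000, prior_ctr=0.02)  # -> ~0.047, the ad's own data dominating
```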

Feedback Loop Risks

[Diagram: feedback loop between serving and training data]

Ads predicted to have low CTR get shown less, which confirms the prediction. Break the loop with the techniques below (a Thompson sampling sketch follows the list):

  • Exploration budget (show random ads sometimes)
  • Upper confidence bound (UCB) ranking
  • Thompson sampling
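
A minimal Thompson sampling sketch with per-ad Beta posteriors: each request samples a plausible CTR for every candidate, so uncertain new ads occasionally win and collect data:

```python
import numpy as np

def thompson_pick(stats):
    """stats: list of (clicks, impressions) per candidate ad.
    Sample a CTR from each Beta(clicks+1, non_clicks+1) posterior, pick the max."""
    samples = [np.random.beta(c + 1, (n - c) + 1) for c, n in stats]
    return int(np.argmax(samples))

# A new ad (0, 0) has a wide posterior, so it sometimes beats the proven ad.
winner = thompson_pick([(30, 1000), (0, 0), (5, 400)])
```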

Monitoring

Key Alerts

| Metric | Alert Condition |
|---|---|
| CTR | Drop > 5% from baseline |
| Log loss | Increase > 0.05 |
| Calibration | Error > 0.02 |
| Revenue | Drop > 3% |
| Latency p99 | > 10ms |

A/B Testing

Every model change must be tested:

  1. Primary metric: Revenue (advertisers pay for clicks)
  2. Secondary: CTR, user experience metrics
  3. Guardrails: Latency, error rate

Run until results reach statistical significance, and watch for long-term effects.
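
A minimal sketch of such a significance check on CTR, using a two-proportion z-test (the counts are illustrative):

```python
from math import sqrt
from scipy.stats import norm

def ctr_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test between control (a) and treatment (b).
    Returns (z statistic, two-sided p-value)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, 2 * norm.sf(abs(z))

z, p = ctr_ztest(20_000, 1_000_000, 20_600, 1_000_000)  # +3% relative CTR lift
```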

Reference

| Topic | Description |
|---|---|
| Click fraud | Separate fraud detection model. Filter fraudulent clicks from training data. |
| Position bias | Either keep position out of the model entirely, or model it separately (e.g., as a bias term fixed to a default at serving) and remove its effect. |
| Fairness across advertisers | Calibrate per advertiser segment. Monitor win rates by advertiser size. |
| Viewability | Predict P(click given viewed) rather than P(click given served). |
| Accuracy vs latency | More features help but slow inference; trade off based on ad value. |
| Freshness vs stability | Frequent updates improve freshness but hurt stability; incremental updates balance both. |