Data Drift and Model Drift

Drift occurs when the statistical properties of data or the relationship between inputs and outputs change over time. Models trained on historical data may perform poorly on current data if drift is not detected and addressed.

Types of Drift

Data Drift (Covariate Shift)

Data drift occurs when the distribution of input features changes while the relationship between features and labels remains the same.

Example: A fraud detection model trained on US transactions is deployed to process European transactions. Transaction patterns differ (currencies, times, merchants), but the indicators of fraud remain the same.

| Feature | Training Distribution | Production Distribution |
|---|---|---|
| Transaction amount | $10-500 USD | €5-1000 EUR |
| Time of day | 9am-9pm EST | 24h global |
| Merchant categories | US retailers | EU and US retailers |

Concept Drift

Concept drift occurs when the relationship between inputs and outputs changes. The same input may have a different correct label over time.

Example: During a pandemic, bulk purchases of medical supplies shift from suspicious activity to normal consumer behavior. The definition of suspicious activity changes.

Model Drift (Prediction Drift)

Model drift occurs when the distribution of model outputs changes. This often results from data drift affecting predictions.

Example: A recommendation model begins suggesting only popular items because user engagement patterns changed, causing the model to favor safe recommendations.

Label Drift

Label drift occurs when the distribution of the target variable changes.

Example: An economic downturn increases loan default rates from 3% to 15%. A model calibrated for normal conditions will be poorly calibrated for the new distribution.

Causes of Drift

| Cause | Example | Detection Method |
|---|---|---|
| Seasonality | Holiday shopping patterns | Calendar-based monitoring |
| Trend changes | Platform algorithm updates | Engagement metrics |
| External events | Economic conditions, pandemics | External signal monitoring |
| Data pipeline issues | Schema changes, ETL failures | Data quality checks |
| Population changes | New user segments, market expansion | Demographic monitoring |
| Feedback loops | Model predictions influence future data | Causal analysis |

Detecting Drift

Statistical Tests for Numerical Features

The Kolmogorov-Smirnov (KS) test compares two sample distributions. The test computes a statistic measuring the maximum difference between the cumulative distribution functions of the baseline and current data. A low p-value (typically below 0.05) indicates statistically significant drift between the distributions.
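
A minimal sketch of this check using SciPy's two-sample KS test; the baseline and current arrays are hypothetical feature samples generated for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical samples: training-time baseline vs. current production values
baseline = rng.normal(loc=100.0, scale=20.0, size=5000)
current = rng.normal(loc=115.0, scale=25.0, size=5000)  # shifted distribution

statistic, p_value = ks_2samp(baseline, current)
print(f"KS statistic={statistic:.3f}, p-value={p_value:.4f}")

if p_value < 0.05:
    print("Drift detected: distributions differ significantly")
```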

Statistical Tests for Categorical Features

The Chi-squared test compares categorical distributions. Count the occurrences of each category in both the baseline and current data, then compute the chi-squared statistic measuring whether the observed frequencies differ significantly from expected frequencies. A low p-value indicates significant drift in the categorical distribution.
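
A sketch of the same idea for categorical data, using SciPy's chi-squared test on a 2 x k table of counts; the categories and counts below are made up for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical category counts for one feature in baseline vs. current data
categories = ["electronics", "grocery", "travel", "other"]
baseline_counts = np.array([400, 300, 200, 100])
current_counts = np.array([250, 350, 300, 100])

# Stack the two frequency vectors into a contingency table and test
# whether observed frequencies differ significantly from expected ones
table = np.vstack([baseline_counts, current_counts])
chi2, p_value, dof, expected = chi2_contingency(table)

print(f"chi2={chi2:.2f}, p-value={p_value:.4f}, dof={dof}")
if p_value < 0.05:
    print("Drift detected in categorical distribution")
```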

Population Stability Index (PSI)

PSI quantifies distribution shift and is commonly used in financial applications. The calculation involves binning both baseline and current data into the same bins, computing the percentage of observations in each bin, then summing the term (current_percent - baseline_percent) * ln(current_percent / baseline_percent) across all bins.
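
A minimal PSI implementation following that description. Binning by baseline quantiles is an assumption (fixed-width bins also work), and the small epsilon guards against empty bins.

```python
import numpy as np

def population_stability_index(baseline, current, n_bins=10, eps=1e-6):
    """Compute PSI between two numeric samples using baseline-derived bins."""
    # Bin edges from baseline quantiles; extend the outer edges so that
    # out-of-range production values are still counted
    edges = np.quantile(baseline, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    baseline_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    current_pct = np.histogram(current, bins=edges)[0] / len(current)

    # Avoid division by zero and log(0) for empty bins
    baseline_pct = np.clip(baseline_pct, eps, None)
    current_pct = np.clip(current_pct, eps, None)

    return np.sum((current_pct - baseline_pct) * np.log(current_pct / baseline_pct))

rng = np.random.default_rng(0)
psi = population_stability_index(rng.normal(0, 1, 10_000), rng.normal(0.3, 1.2, 10_000))
print(f"PSI = {psi:.3f}")  # compare against the interpretation thresholds below
```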

PSI Interpretation:

  • PSI < 0.1: No significant change
  • PSI 0.1-0.2: Moderate change, investigate
  • PSI > 0.2: Significant change, action required

Monitoring Metrics

| Metric | Detection Target | Alert Threshold |
|---|---|---|
| Feature PSI | Input distribution shift | PSI > 0.2 |
| Prediction distribution | Output pattern changes | Mean shift > 10% |
| Null rate per feature | Data pipeline issues | Baseline + 5% |
| Cardinality changes | New categories | Baseline + 20% |
| Label distribution | Target variable shift | Domain-specific |

Responding to Drift

Decision Framework

  1. Verify data pipeline integrity. Check for schema changes, ETL failures, or upstream data issues. If the issue is in the pipeline, fix the pipeline without retraining.

  2. Assess performance impact. Drift does not always degrade performance. Monitor model metrics alongside drift metrics.

  3. Evaluate drift duration. Temporary drift (one-time events, seasonality) may not require retraining. Persistent drift typically requires action.

  4. Select response strategy. Based on drift characteristics, choose between rules-based mitigation, model retraining, or architecture changes.

Retraining Strategies

| Strategy | Description | Use Case |
|---|---|---|
| Scheduled retraining | Retrain at fixed intervals | Predictable drift patterns |
| Triggered retraining | Retrain when drift exceeds threshold | Unpredictable drift patterns |
| Continuous learning | Incremental model updates | Fast-moving domains |
| Ensemble with recency weighting | Weight recent models higher | Gradual drift |

Automated Retraining Process:

  1. Calculate PSI for each feature by comparing baseline and current data distributions
  2. Track the maximum PSI across all features
  3. If any feature exceeds the threshold (typically 0.2), trigger a retraining job
  4. Log which features drifted and the severity of drift for investigation
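
A minimal sketch of that trigger logic, assuming pandas DataFrames for the baseline and current data and reusing the population_stability_index function sketched earlier; trigger_retraining_job is a placeholder for whatever pipeline hook you actually use.

```python
PSI_THRESHOLD = 0.2

def check_drift_and_maybe_retrain(baseline_df, current_df, feature_names):
    """Compute per-feature PSI and trigger retraining if any feature drifts."""
    psi_by_feature = {
        name: population_stability_index(baseline_df[name].to_numpy(),
                                         current_df[name].to_numpy())
        for name in feature_names
    }

    drifted = {name: psi for name, psi in psi_by_feature.items()
               if psi > PSI_THRESHOLD}

    # Log which features drifted and how severely, for later investigation
    for name, psi in sorted(drifted.items(), key=lambda kv: -kv[1]):
        print(f"drift detected: feature={name} psi={psi:.3f}")

    if drifted:
        trigger_retraining_job()  # placeholder for your training pipeline hook

    return psi_by_feature

def trigger_retraining_job():
    print("retraining job submitted")
```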

Handling Concept Drift

Concept drift requires different strategies because the ground truth relationship has changed.

| Strategy | Description |
|---|---|
| Sliding window | Train only on recent data |
| Sample weighting | Weight recent examples higher |
| Regime detection | Identify change points, train separate models |
| Online learning | Update model incrementally with new labeled data |

Sliding Window: Filter training data to include only recent observations (e.g., last 90 days), discarding older data that may reflect outdated patterns.

Sample Weighting: Assign exponentially decaying weights based on sample age. For a half-life of 30 days, a sample from 30 days ago receives half the weight of today's sample, calculated as weight = 0.5^(age_in_days / half_life_days), equivalently e^(-ln(2) * age_in_days / half_life_days).
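
A sketch combining both strategies with pandas; the event_time column name, 90-day window, and 30-day half-life are illustrative assumptions.

```python
import pandas as pd

WINDOW_DAYS = 90
HALF_LIFE_DAYS = 30

def prepare_training_data(df, now=None):
    """Apply a sliding window and exponential recency weights.

    Assumes a DataFrame with an 'event_time' timestamp column;
    column name, window size, and half-life are illustrative.
    """
    now = now if now is not None else df["event_time"].max()
    age_days = (now - df["event_time"]).dt.total_seconds() / 86400.0

    # Sliding window: keep only the most recent WINDOW_DAYS of data
    mask = age_days <= WINDOW_DAYS
    recent = df[mask].copy()

    # Sample weighting: halve the weight every HALF_LIFE_DAYS
    recent["sample_weight"] = 0.5 ** (age_days[mask] / HALF_LIFE_DAYS)
    return recent
```

The resulting weights can be passed to training; many scikit-learn estimators, for example, accept a sample_weight argument to fit.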

Monitoring Without Labels

When labels are not immediately available:

  1. Monitor input drift. Detect feature distribution changes using PSI or statistical tests.

  2. Monitor output drift. Track prediction distribution changes over time (see the sketch after this list).

  3. Use proxy metrics. Monitor downstream metrics (click-through rate, conversion) as indicators of model quality.

  4. Flag for investigation. Alert when drift is detected, investigate potential concept drift when labels become available.
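
A minimal sketch of output-drift monitoring under these constraints: it compares two windows of logged prediction scores (hypothetical arrays) with a KS test and a relative mean-shift check, mirroring the 10% threshold from the monitoring table above.

```python
import numpy as np
from scipy.stats import ks_2samp

def prediction_drift_report(baseline_scores, current_scores,
                            p_threshold=0.05, mean_shift_threshold=0.10):
    """Compare prediction distributions from two time windows (no labels needed).

    baseline_scores / current_scores are hypothetical arrays of model outputs
    logged at deployment time and during the current monitoring window.
    """
    statistic, p_value = ks_2samp(baseline_scores, current_scores)
    mean_shift = abs(np.mean(current_scores) - np.mean(baseline_scores)) \
        / abs(np.mean(baseline_scores))

    return {
        "ks_statistic": statistic,
        "p_value": p_value,
        "mean_shift": mean_shift,
        "drift_detected": p_value < p_threshold or mean_shift > mean_shift_threshold,
    }
```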

Industry Examples

| Company | Drift Challenge | Approach |
|---|---|---|
| Netflix | User preferences shift with new content | Continuous retraining, content-specific models |
| Uber | Demand patterns vary with events and weather | Real-time feature updates, event-specific models |
| Stripe | Fraud tactics evolve | Online learning, human-in-the-loop review |
| LinkedIn | Professional trends change, new job titles emerge | Regular retraining, embedding updates |