Data Drift and Model Drift
Drift occurs when the statistical properties of data or the relationship between inputs and outputs change over time. Models trained on historical data may perform poorly on current data if drift is not detected and addressed.
Types of Drift
Data Drift (Covariate Shift)
Data drift occurs when the distribution of input features changes while the relationship between features and labels remains the same.
Example: A fraud detection model trained on US transactions is deployed to process European transactions. Transaction patterns differ (currencies, times, merchants), but the indicators of fraud remain the same.
| Feature | Training Distribution | Production Distribution |
|---|---|---|
| Transaction amount | $10-500 USD | €5-1000 EUR |
| Time of day | 9am-9pm EST | 24h global |
| Merchant categories | US retailers | EU and US retailers |
Concept Drift
Concept drift occurs when the relationship between inputs and outputs changes. The same input may have a different correct label over time.
Example: During a pandemic, bulk purchases of medical supplies shift from suspicious activity to normal consumer behavior. The definition of suspicious activity changes.
Model Drift (Prediction Drift)
Model drift occurs when the distribution of model outputs changes. This often results from data drift affecting predictions.
Example: A recommendation model begins suggesting only popular items because user engagement patterns changed, causing the model to favor safe recommendations.
Label Drift
Label drift occurs when the distribution of the target variable changes.
Example: An economic downturn increases loan default rates from 3% to 15%. A model calibrated for normal conditions will be poorly calibrated for the new distribution.
Causes of Drift
| Cause | Example | Detection Method |
|---|---|---|
| Seasonality | Holiday shopping patterns | Calendar-based monitoring |
| Trend changes | Platform algorithm updates | Engagement metrics |
| External events | Economic conditions, pandemics | External signal monitoring |
| Data pipeline issues | Schema changes, ETL failures | Data quality checks |
| Population changes | New user segments, market expansion | Demographic monitoring |
| Feedback loops | Model predictions influence future data | Causal analysis |
Detecting Drift
Statistical Tests for Numerical Features
The Kolmogorov-Smirnov (KS) test compares two sample distributions. The test computes a statistic measuring the maximum difference between the cumulative distribution functions of the baseline and current data. A low p-value (typically below 0.05) indicates statistically significant drift between the distributions.
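A minimal sketch of this check using `scipy.stats.ks_2samp`, assuming `baseline` and `current` are 1-D arrays holding the same numerical feature at training time and in production (the variable names and the synthetic amounts are illustrative):

```python
import numpy as np
from scipy import stats

def ks_drift_detected(baseline: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> bool:
    """Return True if the KS test flags a significant shift between the two samples."""
    # ks_2samp computes the maximum distance between the two empirical CDFs
    result = stats.ks_2samp(baseline, current)
    return result.pvalue < alpha

# Illustrative check: transaction amounts before and after a distribution shift
rng = np.random.default_rng(42)
baseline = rng.lognormal(mean=4.0, sigma=0.5, size=5_000)   # training-time amounts
current = rng.lognormal(mean=4.3, sigma=0.6, size=5_000)    # production amounts
print(ks_drift_detected(baseline, current))  # True: the distributions differ
```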
Statistical Tests for Categorical Features
The Chi-squared test compares categorical distributions. Count the occurrences of each category in both the baseline and current data, then compute the chi-squared statistic measuring whether the observed frequencies differ significantly from expected frequencies. A low p-value indicates significant drift in the categorical distribution.
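A sketch of the same idea for a categorical feature, assuming the samples arrive as pandas Series of category values; `scipy.stats.chi2_contingency` computes the statistic and p-value from a 2 × K table of counts:

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chi2_drift_detected(baseline: pd.Series, current: pd.Series, alpha: float = 0.05) -> bool:
    """Return True if category frequencies differ significantly between the two samples."""
    categories = sorted(set(baseline.dropna()) | set(current.dropna()))
    # 2 x K contingency table: one row of counts per sample, one column per category
    table = [
        [int((baseline == c).sum()) for c in categories],
        [int((current == c).sum()) for c in categories],
    ]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha

# Illustrative check on merchant categories
baseline = pd.Series(["retail"] * 800 + ["travel"] * 150 + ["grocery"] * 50)
current = pd.Series(["retail"] * 500 + ["travel"] * 300 + ["grocery"] * 200)
print(chi2_drift_detected(baseline, current))  # True: the category mix has shifted
```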
Population Stability Index (PSI)
PSI quantifies distribution shift and is commonly used in financial applications. The calculation bins both baseline and current data into the same bins, computes the proportion of observations in each bin (as a fraction, not a percentage), then sums the term (current_proportion - baseline_proportion) * ln(current_proportion / baseline_proportion) across all bins.
PSI Interpretation:
- PSI < 0.1: No significant change
- PSI 0.1-0.2: Moderate change, investigate
- PSI > 0.2: Significant change, action required
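A minimal PSI implementation following the description above, assuming a numerical feature binned on baseline quantiles; clipping the proportions guards against empty bins:

```python
import numpy as np

def population_stability_index(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between baseline and current samples of one numerical feature."""
    # Derive bin edges from the baseline so both samples share the same bins
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))

    # Clip both samples to the baseline range so out-of-range production values land in the end bins
    baseline_prop = np.histogram(np.clip(baseline, edges[0], edges[-1]), bins=edges)[0] / len(baseline)
    current_prop = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)[0] / len(current)

    # Avoid log(0) and division by zero when a bin is empty
    eps = 1e-6
    baseline_prop = np.clip(baseline_prop, eps, None)
    current_prop = np.clip(current_prop, eps, None)

    return float(np.sum((current_prop - baseline_prop) * np.log(current_prop / baseline_prop)))

# Interpreted with the thresholds above: < 0.1 stable, 0.1-0.2 moderate, > 0.2 significant
```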
Monitoring Metrics
| Metric | Detection Target | Alert Threshold |
|---|---|---|
| Feature PSI | Input distribution shift | PSI > 0.2 |
| Prediction distribution | Output pattern changes | Mean shift > 10% |
| Null rate per feature | Data pipeline issues | Baseline + 5% |
| Cardinality changes | New categories | Baseline + 20% |
| Label distribution | Target variable shift | Domain-specific |
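The pipeline-health rows of this table translate directly into simple checks. A sketch assuming two pandas DataFrames with the same columns, reading the thresholds as the table suggests (+5 percentage points of nulls, +20% distinct values); the function name is illustrative:

```python
import pandas as pd

def data_quality_alerts(baseline_df: pd.DataFrame, current_df: pd.DataFrame) -> list[str]:
    """Flag features whose null rate or cardinality moved beyond the thresholds above."""
    alerts = []
    for col in baseline_df.columns:
        # Null rate: alert if it rises more than 5 percentage points over the baseline
        null_delta = current_df[col].isna().mean() - baseline_df[col].isna().mean()
        if null_delta > 0.05:
            alerts.append(f"{col}: null rate up {null_delta:.1%} vs baseline")

        # Cardinality: alert if distinct values grew by more than 20% (new categories appearing)
        base_card = baseline_df[col].nunique()
        if base_card > 0 and current_df[col].nunique() > 1.2 * base_card:
            alerts.append(f"{col}: cardinality exceeded baseline + 20%")
    return alerts
```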
Responding to Drift
Decision Framework
- Verify data pipeline integrity. Check for schema changes, ETL failures, or upstream data issues. If the issue is in the pipeline, fix the pipeline without retraining.
- Assess performance impact. Drift does not always degrade performance. Monitor model metrics alongside drift metrics.
- Evaluate drift duration. Temporary drift (one-time events, seasonality) may not require retraining. Persistent drift typically requires action.
- Select response strategy. Based on drift characteristics, choose between rules-based mitigation, model retraining, or architecture changes.
Retraining Strategies
| Strategy | Description | Use Case |
|---|---|---|
| Scheduled retraining | Retrain at fixed intervals | Predictable drift patterns |
| Triggered retraining | Retrain when drift exceeds threshold | Unpredictable drift patterns |
| Continuous learning | Incremental model updates | Fast-moving domains |
| Ensemble with recency weighting | Weight recent models higher | Gradual drift |
Automated Retraining Process (see the sketch after this list):
- Calculate PSI for each feature by comparing baseline and current data distributions
- Track the maximum PSI across all features
- If any feature exceeds the threshold (typically 0.2), trigger a retraining job
- Log which features drifted and the severity of drift for investigation
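A minimal sketch of this trigger, reusing the `population_stability_index` helper sketched earlier; `trigger_retraining_job` is a stand-in for whatever retraining entry point the team actually uses:

```python
import logging

logger = logging.getLogger("drift_monitor")
PSI_THRESHOLD = 0.2

def check_drift_and_retrain(baseline_df, current_df, feature_columns) -> bool:
    """Trigger retraining when any feature's PSI exceeds the threshold."""
    psi_by_feature = {
        col: population_stability_index(baseline_df[col].to_numpy(), current_df[col].to_numpy())
        for col in feature_columns
    }
    drifted = {col: round(psi, 3) for col, psi in psi_by_feature.items() if psi > PSI_THRESHOLD}
    if drifted:
        # Record which features drifted and how severely, then kick off retraining
        logger.warning("Drift detected, max PSI %.3f: %s", max(drifted.values()), drifted)
        trigger_retraining_job()  # placeholder: submit the team's actual retraining pipeline
        return True
    return False
```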
Handling Concept Drift
Concept drift requires different strategies because the ground truth relationship has changed.
| Strategy | Description |
|---|---|
| Sliding window | Train only on recent data |
| Sample weighting | Weight recent examples higher |
| Regime detection | Identify change points, train separate models |
| Online learning | Update model incrementally with new labeled data |
Sliding Window: Filter training data to include only recent observations (e.g., last 90 days), discarding older data that may reflect outdated patterns.
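A sketch of a sliding-window filter, assuming a pandas DataFrame with an `event_time` timestamp column (the column name is illustrative):

```python
import pandas as pd

def sliding_window(df: pd.DataFrame, timestamp_col: str = "event_time", days: int = 90) -> pd.DataFrame:
    """Keep only rows from the most recent `days` days for training."""
    cutoff = df[timestamp_col].max() - pd.Timedelta(days=days)
    return df[df[timestamp_col] >= cutoff]
```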
Sample Weighting: Assign exponentially decaying weights based on sample age. With a half-life of 30 days, a sample from 30 days ago receives half the weight of today's sample, calculated as weight = 0.5^(age_in_days / half_life_days), which is equivalent to e^(-ln(2) * age_in_days / half_life_days).
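A sketch of half-life weights computed from the same illustrative timestamp column; many scikit-learn estimators accept the result through the `sample_weight` argument of `fit`:

```python
import numpy as np
import pandas as pd

def recency_weights(timestamps: pd.Series, half_life_days: float = 30.0) -> np.ndarray:
    """Halve a sample's weight for every `half_life_days` of age."""
    age_days = (timestamps.max() - timestamps).dt.total_seconds() / 86_400
    return np.power(0.5, age_days / half_life_days).to_numpy()

# e.g. model.fit(X, y, sample_weight=recency_weights(df["event_time"]))
```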
Monitoring Without Labels
When labels are not immediately available:
- Monitor input drift. Detect feature distribution changes using PSI or statistical tests.
- Monitor output drift. Track prediction distribution changes over time (a sketch follows this list).
- Use proxy metrics. Monitor downstream metrics (click-through rate, conversion) as indicators of model quality.
- Flag for investigation. Alert when drift is detected, and investigate potential concept drift once labels become available.
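When labels lag, output drift can still be tracked directly on prediction scores. A sketch using the mean-shift threshold from the monitoring table above, assuming `baseline_scores` and `current_scores` are 1-D arrays of model outputs:

```python
import numpy as np

def prediction_drift_alert(baseline_scores: np.ndarray, current_scores: np.ndarray,
                           max_relative_shift: float = 0.10) -> bool:
    """Flag output drift when the mean prediction shifts by more than 10% of the baseline mean."""
    baseline_mean = baseline_scores.mean()
    if baseline_mean == 0:
        return False  # relative shift is undefined; rely on other checks instead
    relative_shift = abs(current_scores.mean() - baseline_mean) / abs(baseline_mean)
    return relative_shift > max_relative_shift
```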
Industry Examples
| Company | Drift Challenge | Approach |
|---|---|---|
| Netflix | User preferences shift with new content | Continuous retraining, content-specific models |
| Uber | Demand patterns vary with events and weather | Real-time feature updates, event-specific models |
| Stripe | Fraud tactics evolve | Online learning, human-in-the-loop review |
| LinkedIn | Professional trends change, new job titles emerge | Regular retraining, embedding updates |