Data Drift and Model Drift
Drift occurs when the statistical properties of data or the relationship between inputs and outputs change over time. Models trained on historical data may perform poorly on current data if drift is not detected and addressed.
Types of Drift
Data Drift (Covariate Shift)
Data drift occurs when the distribution of input features changes while the relationship between features and labels remains the same.
Example: A fraud detection model trained on US transactions is deployed to process European transactions. Transaction patterns differ (currencies, times, merchants), but the indicators of fraud remain the same.
| Feature | Training Distribution | Production Distribution |
|---|---|---|
| Transaction amount | $10-500 USD | €5-1000 EUR |
| Time of day | 9am-9pm EST | 24h global |
| Merchant categories | US retailers | EU and US retailers |
Concept Drift
Concept drift occurs when the relationship between inputs and outputs changes. The same input may have a different correct label over time.
Example: During a pandemic, bulk purchases of medical supplies shift from suspicious activity to normal consumer behavior. The definition of suspicious activity changes.
Model Drift (Prediction Drift)
Model drift occurs when the distribution of model outputs changes. This often results from data drift affecting predictions.
Example: A recommendation model begins suggesting only popular items because user engagement patterns changed, causing the model to favor safe recommendations.
Label Drift
Label drift occurs when the distribution of the target variable changes.
Example: An economic downturn increases loan default rates from 3% to 15%. A model calibrated for normal conditions will be poorly calibrated for the new distribution.
Causes of Drift
| Cause | Example | Detection Method |
|---|---|---|
| Seasonality | Holiday shopping patterns | Calendar-based monitoring |
| Trend changes | Platform algorithm updates | Engagement metrics |
| External events | Economic conditions, pandemics | External signal monitoring |
| Data pipeline issues | Schema changes, ETL failures | Data quality checks |
| Population changes | New user segments, market expansion | Demographic monitoring |
| Feedback loops | Model predictions influence future data | Causal analysis |
Detecting Drift
Statistical Tests for Numerical Features
The Kolmogorov-Smirnov (KS) test compares two sample distributions. The test computes a statistic measuring the maximum difference between the cumulative distribution functions of the baseline and current data. A low p-value (typically below 0.05) indicates statistically significant drift between the distributions.
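A minimal sketch of this check using `scipy.stats.ks_2samp`, assuming `baseline` and `current` are 1-D arrays holding the same numerical feature at training time and in production (the variable names and the synthetic amounts are illustrative):

```python
import numpy as np
from scipy import stats

def ks_drift_detected(baseline: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> bool:
    """Return True if the KS test flags a significant shift between the two samples."""
    # ks_2samp computes the maximum distance between the two empirical CDFs
    result = stats.ks_2samp(baseline, current)
    return result.pvalue < alpha

# Illustrative check: transaction amounts before and after a distribution shift
rng = np.random.default_rng(42)
baseline = rng.lognormal(mean=4.0, sigma=0.5, size=5_000)   # training-time amounts
current = rng.lognormal(mean=4.3, sigma=0.6, size=5_000)    # production amounts
print(ks_drift_detected(baseline, current))  # True: the distributions differ
```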
Statistical Tests for Categorical Features
The Chi-squared test compares categorical distributions. Count the occurrences of each category in both the baseline and current data, then compute the chi-squared statistic measuring whether the observed frequencies differ significantly from expected frequencies. A low p-value indicates significant drift in the categorical distribution.
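A sketch of the same idea for a categorical feature, assuming the samples arrive as pandas Series of category values; `scipy.stats.chi2_contingency` computes the statistic and p-value from a 2 × K table of counts:

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chi2_drift_detected(baseline: pd.Series, current: pd.Series, alpha: float = 0.05) -> bool:
    """Return True if category frequencies differ significantly between the two samples."""
    categories = sorted(set(baseline.dropna()) | set(current.dropna()))
    # 2 x K contingency table: one row of counts per sample, one column per category
    table = [
        [int((baseline == c).sum()) for c in categories],
        [int((current == c).sum()) for c in categories],
    ]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha

# Illustrative check on merchant categories
baseline = pd.Series(["retail"] * 800 + ["travel"] * 150 + ["grocery"] * 50)
current = pd.Series(["retail"] * 500 + ["travel"] * 300 + ["grocery"] * 200)
print(chi2_drift_detected(baseline, current))  # True: the category mix has shifted
```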
Population Stability Index (PSI)
PSI quantifies distribution shift and is commonly used in financial applications. The calculation bins both baseline and current data into the same bins, computes the proportion of observations in each bin (as a fraction, not a percentage), then sums the term (current_proportion - baseline_proportion) * ln(current_proportion / baseline_proportion) across all bins.
PSI Interpretation:
- PSI < 0.1: No significant change
- PSI 0.1-0.2: Moderate change, investigate
- PSI > 0.2: Significant change, action required
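A minimal PSI implementation following the description above, assuming a numerical feature binned on baseline quantiles; clipping the proportions guards against empty bins:

```python
import numpy as np

def population_stability_index(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between baseline and current samples of one numerical feature."""
    # Derive bin edges from the baseline so both samples share the same bins
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))

    # Clip both samples to the baseline range so out-of-range production values land in the end bins
    baseline_prop = np.histogram(np.clip(baseline, edges[0], edges[-1]), bins=edges)[0] / len(baseline)
    current_prop = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)[0] / len(current)

    # Avoid log(0) and division by zero when a bin is empty
    eps = 1e-6
    baseline_prop = np.clip(baseline_prop, eps, None)
    current_prop = np.clip(current_prop, eps, None)

    return float(np.sum((current_prop - baseline_prop) * np.log(current_prop / baseline_prop)))

# Interpreted with the thresholds above: < 0.1 stable, 0.1-0.2 moderate, > 0.2 significant
```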
Monitoring Metrics
| Metric | Detection Target | Alert Threshold |
|---|---|---|
| Feature PSI | Input distribution shift | PSI > 0.2 |
| Prediction distribution | Output pattern changes | Mean shift > 10% |
| Null rate per feature | Data pipeline issues | Baseline + 5% |
| Cardinality changes | New categories | Baseline + 20% |
| Label distribution | Target variable shift | Domain-specific |
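The pipeline-health rows of this table translate directly into simple checks. A sketch assuming two pandas DataFrames with the same columns, reading the thresholds as the table suggests (+5 percentage points of nulls, +20% distinct values); the function name is illustrative:

```python
import pandas as pd

def data_quality_alerts(baseline_df: pd.DataFrame, current_df: pd.DataFrame) -> list[str]:
    """Flag features whose null rate or cardinality moved beyond the thresholds above."""
    alerts = []
    for col in baseline_df.columns:
        # Null rate: alert if it rises more than 5 percentage points over the baseline
        null_delta = current_df[col].isna().mean() - baseline_df[col].isna().mean()
        if null_delta > 0.05:
            alerts.append(f"{col}: null rate up {null_delta:.1%} vs baseline")

        # Cardinality: alert if distinct values grew by more than 20% (new categories appearing)
        base_card = baseline_df[col].nunique()
        if base_card > 0 and current_df[col].nunique() > 1.2 * base_card:
            alerts.append(f"{col}: cardinality exceeded baseline + 20%")
    return alerts
```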
Responding to Drift
Decision Framework
- Verify data pipeline integrity. Check for schema changes, ETL failures, or upstream data issues. If the issue is in the pipeline, fix the pipeline without retraining.
- Assess performance impact. Drift does not always degrade performance. Monitor model metrics alongside drift metrics.
- Evaluate drift duration. Temporary drift (one-time events, seasonality) may not require retraining. Persistent drift typically requires action.
- Select response strategy. Based on drift characteristics, choose between rules-based mitigation, model retraining, or architecture changes.
Retraining Strategies
| Strategy | Description | Use Case |
|---|---|---|
| Scheduled retraining | Retrain at fixed intervals | Predictable drift patterns |
| Triggered retraining | Retrain when drift exceeds threshold | Unpredictable drift patterns |
| Continuous learning | Incremental model updates | Fast-moving domains |
| Ensemble with recency weighting | Weight recent models higher | Gradual drift |
Automated Retraining Process (see the sketch after this list):
- Calculate PSI for each feature by comparing baseline and current data distributions
- Track the maximum PSI across all features
- If any feature exceeds the threshold (typically 0.2), trigger a retraining job
- Log which features drifted and the severity of drift for investigation
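A minimal sketch of this trigger, reusing the `population_stability_index` helper sketched earlier; `trigger_retraining_job` is a stand-in for whatever retraining entry point the team actually uses:

```python
import logging

logger = logging.getLogger("drift_monitor")
PSI_THRESHOLD = 0.2

def check_drift_and_retrain(baseline_df, current_df, feature_columns) -> bool:
    """Trigger retraining when any feature's PSI exceeds the threshold."""
    psi_by_feature = {
        col: population_stability_index(baseline_df[col].to_numpy(), current_df[col].to_numpy())
        for col in feature_columns
    }
    drifted = {col: round(psi, 3) for col, psi in psi_by_feature.items() if psi > PSI_THRESHOLD}
    if drifted:
        # Record which features drifted and how severely, then kick off retraining
        logger.warning("Drift detected, max PSI %.3f: %s", max(drifted.values()), drifted)
        trigger_retraining_job()  # placeholder: submit the team's actual retraining pipeline
        return True
    return False
```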
Handling Concept Drift
Concept drift requires different strategies because the ground truth relationship has changed.
| Strategy | Description |
|---|---|
| Sliding window | Train only on recent data |
| Sample weighting | Weight recent examples higher |
| Regime detection | Identify change points, train separate models |
| Online learning | Update model incrementally with new labeled data |
Sliding Window: Filter training data to include only recent observations (e.g., last 90 days), discarding older data that may reflect outdated patterns.
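A sketch of a sliding-window filter, assuming a pandas DataFrame with an `event_time` timestamp column (the column name is illustrative):

```python
import pandas as pd

def sliding_window(df: pd.DataFrame, timestamp_col: str = "event_time", days: int = 90) -> pd.DataFrame:
    """Keep only rows from the most recent `days` days for training."""
    cutoff = df[timestamp_col].max() - pd.Timedelta(days=days)
    return df[df[timestamp_col] >= cutoff]
```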
Sample Weighting: Assign exponentially decaying weights based on sample age. With a half-life of 30 days, a sample from 30 days ago receives half the weight of today's sample, calculated as weight = 0.5^(age_in_days / half_life_days), which is equivalent to e^(-ln(2) * age_in_days / half_life_days).
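A sketch of half-life weights computed from the same illustrative timestamp column; many scikit-learn estimators accept the result through the `sample_weight` argument of `fit`:

```python
import numpy as np
import pandas as pd

def recency_weights(timestamps: pd.Series, half_life_days: float = 30.0) -> np.ndarray:
    """Halve a sample's weight for every `half_life_days` of age."""
    age_days = (timestamps.max() - timestamps).dt.total_seconds() / 86_400
    return np.power(0.5, age_days / half_life_days).to_numpy()

# e.g. model.fit(X, y, sample_weight=recency_weights(df["event_time"]))
```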
Monitoring Without Labels
When labels are not immediately available:
- Monitor input drift. Detect feature distribution changes using PSI or statistical tests.
- Monitor output drift. Track prediction distribution changes over time (a sketch follows this list).
- Use proxy metrics. Monitor downstream metrics (click-through rate, conversion) as indicators of model quality.
- Flag for investigation. Alert when drift is detected, and investigate potential concept drift once labels become available.
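When labels lag, output drift can still be tracked directly on prediction scores. A sketch using the mean-shift threshold from the monitoring table above, assuming `baseline_scores` and `current_scores` are 1-D arrays of model outputs:

```python
import numpy as np

def prediction_drift_alert(baseline_scores: np.ndarray, current_scores: np.ndarray,
                           max_relative_shift: float = 0.10) -> bool:
    """Flag output drift when the mean prediction shifts by more than 10% of the baseline mean."""
    baseline_mean = baseline_scores.mean()
    if baseline_mean == 0:
        return False  # relative shift is undefined; rely on other checks instead
    relative_shift = abs(current_scores.mean() - baseline_mean) / abs(baseline_mean)
    return relative_shift > max_relative_shift
```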
Industry Examples
| Company | Drift Challenge | Approach |
|---|---|---|
| Netflix | User preferences shift with new content | Continuous retraining, content-specific models |
| Uber | Demand patterns vary with events and weather | Real-time feature updates, event-specific models |
| Stripe | Fraud tactics evolve | Online learning, human-in-the-loop review |
| LinkedIn | Professional trends change, new job titles emerge | Regular retraining, embedding updates |