Operations & Monitoring

Data Drift, Model Decay, and Dataset Versioning

Intermediate · ⏱ 45 min read · Operations

Your model was perfect at launch. Six months later it's noticeably worse. No code changed. No outages occurred. The model just... drifted. This is the fundamental challenge of operating ML systems in production: the world changes, but your model doesn't, until you retrain it.

Types of Drift

Understanding what's drifting is essential for knowing what to do about it.

Feature Drift (Data Drift / Covariate Shift)

The input data distribution has changed. Your model was trained when users skewed 25-35 years old; then the app went viral among teenagers, and the age feature's distribution now looks very different from what the model saw during training. The model still works for inputs like those it has seen before, but it is increasingly being asked to extrapolate into new territory.

Label Drift (Concept Drift)

The relationship between inputs and outputs has changed. You trained a churn model during economic growth. During a recession, high-paying customers who previously stayed start churning. The same features now predict a different label. This is the hardest type of drift to detect early because you need ground truth labels, which often come weeks after predictions.

Prediction Drift

The distribution of your model's outputs has changed, even if inputs look similar: more high-confidence predictions, or a shift in the average predicted score. Prediction drift is often a leading indicator of feature or label drift, and it is cheap to monitor continuously because it requires no ground truth labels.

| Drift Type | What Changes | Detectable Without Labels? | Response |
| --- | --- | --- | --- |
| Feature Drift | Input distributions P(X) | Yes | Retrain with new data |
| Label Drift | P(Y\|X) relationship | No (need labels) | Retrain with fresh labels |
| Prediction Drift | Output distributions P(Ŷ) | Yes | Investigate; likely retrain |
| Upstream Data | Schema/format of raw data | Yes (validation) | Fix pipeline |

Statistical Drift Detection

Drift detection is fundamentally a statistical question: are two distributions the same? Several tests are commonly used:

Kolmogorov-Smirnov (KS) Test

The KS test measures the maximum distance between two empirical cumulative distribution functions. It works well for continuous features and doesn't assume a particular distribution shape.


Evidently AI: Automated Drift Reports

Evidently AI is an open-source library for generating comprehensive ML monitoring reports. It handles the statistical tests, visualizations, and report generation automatically.

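A sketch of a drift report using the Report API as found in Evidently 0.4-era releases; later versions reorganized the package, so adapt the imports and result paths to your installed version:

```python
import numpy as np
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

rng = np.random.default_rng(7)
reference_df = pd.DataFrame({"age": rng.normal(30, 4, 2_000)})
current_df = pd.DataFrame({"age": rng.normal(19, 3, 2_000)})

# The preset runs per-column statistical tests and a dataset-level drift verdict
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)
report.save_html("drift_report.html")  # shareable visual report

summary = report.as_dict()  # machine-readable form for alerting pipelines
print(summary["metrics"][0]["result"]["dataset_drift"])  # dataset-level verdict
```

The HTML report is useful for human review; the dictionary form is what you wire into automated alerting.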
⚠️ Reference Window Matters

Your reference dataset should represent the period the model was trained on. Comparing against a rolling reference (e.g., always last week's data) lets drift accumulate gradually without ever crossing the alert threshold. Always compare to the original training window.

Dataset Versioning for Retraining

When drift is detected and you decide to retrain, you need to know exactly which dataset version to use. DVC provides this capability — every dataset version is linked to a git commit hash.

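A sketch of the typical DVC workflow for recovering a training dataset version; the file path and commit hash here are placeholders for your own repository's values:

```bash
# Find the commits that changed this dataset's DVC pointer file
git log --oneline -- data/train.csv.dvc

# Restore the pointer file from the commit the deployed model was trained at
git checkout a1b2c3d -- data/train.csv.dvc

# Sync the working copy with the data version the pointer references
dvc checkout data/train.csv
```

Because the `.dvc` pointer file lives in git, reproducing a training run reduces to checking out a commit and running `dvc checkout` (or `dvc pull` if the data lives only in remote storage).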

Automated Retraining Pipelines

The final piece of the MLOps loop is automating retraining when drift exceeds your threshold. Here's a complete automated retraining trigger using an Airflow sensor:

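A sketch of such a trigger using Airflow 2.x's `PythonSensor` and `TriggerDagRunOperator`. The drift-summary file path, the threshold, and the `model_retraining` DAG are assumptions standing in for your own monitoring output and retraining pipeline (on Airflow versions before 2.4, use `schedule_interval` instead of `schedule`):

```python
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.sensors.python import PythonSensor

DRIFT_THRESHOLD = 0.2  # share of drifted columns that warrants retraining

def drift_exceeds_threshold() -> bool:
    # Assumption: a monitoring job periodically writes the latest drift summary here
    with open("/var/ml/monitoring/latest_drift.json") as f:
        summary = json.load(f)
    return summary["share_of_drifted_columns"] > DRIFT_THRESHOLD

with DAG(
    dag_id="drift_triggered_retraining",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    wait_for_drift = PythonSensor(
        task_id="wait_for_drift",
        python_callable=drift_exceeds_threshold,
        poke_interval=600,   # re-check the drift summary every 10 minutes
        timeout=3300,        # give up before the next hourly run starts
        mode="reschedule",   # free the worker slot between pokes
    )

    trigger_retraining = TriggerDagRunOperator(
        task_id="trigger_retraining",
        trigger_dag_id="model_retraining",  # assumed to be defined elsewhere
    )

    wait_for_drift >> trigger_retraining
```

`mode="reschedule"` matters here: a sensor polling for hours in `poke` mode would otherwise occupy a worker slot the entire time.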
Retraining Cadence Strategy

Don't retrain only when drift fires. Also retrain on a fixed schedule (weekly or monthly) to stay fresh: schedule-based retraining catches slow, gradual drift that may never cross your statistical threshold. Combining scheduled and drift-triggered retraining gives a robust strategy.

Congratulations — you've completed the full MLOps learning path. You now understand the complete lifecycle from data ingestion to production monitoring and automated retraining.