Model Drift: Why “Set-It and Forget-It” Doesn’t Work for Machine Learning Models — Analyzr
Machine learning is undeniably useful. ML models can compute complex outputs and extract value from data at a scale humans simply can't match. But just because a machine is doing the heavy lifting doesn't mean ML models are a "set-it and forget-it" kind of activity. Like most things in life, data changes over time, and those changes lead to Model Drift.
Relationships between variables in the data pipeline can also change. These changes prompt what is called Model Drift, sometimes also referred to as Model Decay.
Models don't like change. They are trained on the assumption that future data will look like the data used to build them, so Model Drift is something we want to monitor for and prevent. In this post, we'll describe two types of Model Drift and outline a few ways you can identify it and take action to prevent model degradation.
There are two types of Model Drift: data drift and concept drift.
Let's dive into data drift first. Data drift is a shift in the distribution of the input data relative to the data used to train the model. It occurs naturally as new information is added, or existing information is edited, in the systems that feed the model's data pipeline.
As with any analytics, you want to make sure new data is pulled through into the model and the most current quantities and rates are reflected on a regular basis.
Data drift doesn’t always indicate degradation in model performance. If the distribution shift doesn’t reflect values outside of the trained decision boundary, then the model still works. But if the opposite occurs, and the distribution shift causes values to appear outside of the trained decision boundary, then the model performance has deteriorated.
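One rough way to monitor for this kind of shift is to compare the live distribution of a feature against its training-time distribution with a two-sample statistical test. The sketch below uses a Kolmogorov–Smirnov test from SciPy on synthetic data; the feature values, sample sizes, and 0.05 significance threshold are illustrative assumptions, not a prescription.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_values, live_values, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test: returns True if the live
    distribution differs significantly from the training distribution."""
    stat, p_value = ks_2samp(train_values, live_values)
    return bool(p_value < alpha)

rng = np.random.default_rng(42)
train = rng.normal(loc=100, scale=15, size=5000)    # e.g. deal sizes at training time
stable = rng.normal(loc=100, scale=15, size=5000)   # fresh draw from the same distribution
shifted = rng.normal(loc=130, scale=15, size=5000)  # the mean has moved: drift

print(detect_drift(train, stable))
print(detect_drift(train, shifted))  # True: the shift is flagged
```

Remember the point made above: a flagged shift is a prompt to check the model, not proof it is broken. Whether performance has actually degraded depends on whether the new values fall outside the trained decision boundary.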
Concept drift represents a shift in the relationship between the underlying variables and the output of the model. For example: Company X introduces a new product category.
The ML model used to target accounts with a high propensity to buy was trained on data that doesn't include this new category. Some accounts that were previously low propensity might now be high propensity for the new product, but until the model is retrained, these accounts won't be classified correctly.
This type of data change is almost always detrimental to model performance. The decision boundary has been fundamentally altered, but the model hasn’t been alerted to this change and may produce erroneous outputs based on obsolete training boundaries.
Note that concept drift factors can be challenging to identify, especially when they are considered “hidden context” factors such as strength of economic growth or consumer buying power, which are difficult to reflect in model data.
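To make the Company X example concrete, here is a minimal simulation of concept drift: a classifier is trained while one relationship between an input and the outcome holds, the relationship then inverts, and only retraining on recent data restores performance. The synthetic data and the model choice (a scikit-learn logistic regression) are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Before drift: accounts with high engagement (x > 0) tend to buy.
X_old = rng.normal(size=(1000, 1))
y_old = (X_old[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X_old, y_old)

# After drift (say, a new product category): the relationship inverts,
# and low-engagement accounts now convert instead.
X_new = rng.normal(size=(1000, 1))
y_new = (X_new[:, 0] < 0).astype(int)

print(model.score(X_new, y_new))  # near 0: the old decision boundary is obsolete

# Retraining on recent data restores performance.
retrained = LogisticRegression().fit(X_new, y_new)
print(retrained.score(X_new, y_new))  # near 1
```

A full inversion like this is an extreme case chosen for clarity; real concept drift is usually gradual, which is exactly why it is easy to miss without monitoring.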
Identifying And Preventing Model Drift
There are some simple steps that any business user can take to monitor for and fix Model Drift.
1. Check model statistics and assess performance on a regular basis.
- When dealing with propensity models, one of the most common statistics to watch is the F1 score. F1 combines two key attributes of any model: precision and recall (it is their harmonic mean). Precision and recall are best explained with a visual representation, which this piece by Datatron illustrates very well.
- For cluster-based models, often used for segmentation, the Silhouette score measures how cohesive the resulting clusters are and how clearly they are separated from each other. A score near -1 means points are poorly assigned to their clusters, while a score near 1 indicates dense, well-separated clusters.
2. Refresh your ML models on a regular basis, say quarterly.
This may mean simply retraining the model using the most recent data and ensuring model performance holds, or it may mean refitting the model entirely, including adding or removing variables to ensure optimal performance.
For models in production, the model refresh process is a great way to monitor outcomes and look for any edge cases that may pop up.
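One way to structure such a refresh is to retrain a candidate model on recent data and promote it only if it performs at least as well as the current model on a holdout set. The sketch below assumes scikit-learn estimators and uses F1 as the gating metric; both choices, and the `refresh_model` helper itself, are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def refresh_model(current_model, recent_X, recent_y, holdout_X, holdout_y):
    """Retrain on recent data; promote the candidate only if it scores
    at least as well as the current model on the holdout set."""
    candidate = clone(current_model).fit(recent_X, recent_y)
    current_f1 = f1_score(holdout_y, current_model.predict(holdout_X))
    candidate_f1 = f1_score(holdout_y, candidate.predict(holdout_X))
    if candidate_f1 >= current_f1:
        return candidate, candidate_f1
    return current_model, current_f1

# Toy usage with synthetic, linearly separable data.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
current = LogisticRegression().fit(X[:100], y[:100])
model, score = refresh_model(current, X[100:300], y[100:300], X[300:], y[300:])
print(score)
```

The holdout comparison is the important part: it prevents a quarterly refresh from silently replacing a good model with a worse one trained on a noisy recent window.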
3. Finally, and most importantly, do a commonsense check.
As a rational human user of the data, review the model's outcomes and confirm they match expected results.
For example, if the top graded accounts from your propensity model aren’t performing better than lower rated accounts, then you have a problem.
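That kind of commonsense check can often be a few lines of code. The hypothetical pandas sketch below compares conversion rates by model grade; the account data is invented, but the pattern applies to any graded propensity output.

```python
import pandas as pd

# Hypothetical account outcomes joined with the model's grades.
accounts = pd.DataFrame({
    "grade":     ["A", "A", "A", "B", "B", "C", "C", "C", "C", "C"],
    "converted": [ 1,   1,   0,   1,   0,   0,   0,   1,   0,   0 ],
})

conversion_by_grade = accounts.groupby("grade")["converted"].mean()
print(conversion_by_grade)

# Sanity check: top-graded accounts should convert at a higher rate.
assert conversion_by_grade["A"] > conversion_by_grade["C"]
```

If that assertion fails on real data, the numbers are telling you what no dashboard metric will: the model's grades no longer mean what they did at training time.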
ML models can be significantly better than humans at analyzing complex data sets, but this doesn’t mean they don’t need constant care and human guidance. The adage “Garbage In, Garbage Out” is especially true for machine learning, as ML models are only as good as the data pipeline that trained them. Monitoring for and preventing Model Drift ensures outputs match expectations and are optimized for business use.