Data Drift tab

As training and production data change over time, a deployed model loses predictive power. The data surrounding the model is said to be drifting. By leveraging the training data and prediction data (also known as inference data) that is added to your deployment, the Data Drift dashboard helps you analyze a model's performance after it has been deployed.

How does DataRobot track drift?

For data drift, DataRobot tracks:

  • Target drift: DataRobot stores statistics about predictions so that it can monitor how the distribution and values of the target change over time. Calculating target drift requires actuals data. As a baseline for comparing target distributions, DataRobot uses the distribution of predictions on the holdout.

  • Feature drift: DataRobot stores statistics about predictions so that it can monitor how distributions and values of features change over time. As a baseline for comparing distributions of features, DataRobot uses:

    • The distribution of a random sample of the training data for training datasets larger than 500 MB.
    • The distribution of 100% of the training data for training datasets smaller than 500 MB.

Target and feature tracking are enabled by default. You can control these drift tracking features by navigating to Deployment > Settings > Data.
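If you manage deployments programmatically, the sketch below shows one way to read and toggle these settings with the DataRobot Python client. The endpoint, API token, and deployment ID are placeholders, and the `get_drift_tracking_settings`/`update_drift_tracking_settings` method names reflect recent versions of the client, so verify them against the client documentation for your release.

```python
# A minimal sketch, assuming a recent version of the DataRobot Python client.
# Endpoint, token, and deployment ID are placeholders.
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

deployment = dr.Deployment.get(deployment_id="YOUR_DEPLOYMENT_ID")

# Inspect the current drift tracking settings for the deployment.
print(deployment.get_drift_tracking_settings())

# Enable both target drift and feature drift tracking (both are on by default
# for new deployments; this call is only needed if they were turned off).
deployment.update_drift_tracking_settings(
    target_drift_enabled=True,
    feature_drift_enabled=True,
)
```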

Note

If feature drift tracking is turned off, a message displays on the Data Drift tab to remind you to enable feature drift tracking.

To receive email notifications on data drift status, configure notifications, schedule monitoring, and configure data drift monitoring settings.

The Data Drift dashboard provides three interactive and exportable visualizations that help identify the health of a deployed model over a specified time interval.

Note

The Export button allows you to download each chart on the Data Drift dashboard as a PNG, CSV, or ZIP file.

  • Feature Drift vs. Feature Importance: Plots the importance of a feature in a model against how much the distribution of actual feature values has changed, or drifted, between one point in time and another.

  • Feature Details: Plots the percentage of records (i.e., the distribution) of the selected feature in the training data compared to the inference data.

  • Predictions Over Time: Illustrates how the distribution of a model's predictions has changed over time (target drift). The display differs depending on whether the project is regression or binary classification.

You can customize how a deployment calculates data drift status by configuring drift and importance thresholds and additional definitions on the Settings > Monitoring page. You can also use the following controls to configure the Data Drift dashboard as needed:

  • Model version selector: Updates the dashboard displays to reflect the model you selected from the dropdown (only available for custom model deployments).
  • Date Slider: Limits the range of data displayed on the dashboard (i.e., zooms in on a specific time period).
  • Range (UTC): Sets the date range displayed for the deployment date slider.
  • Resolution: Sets the time granularity of the deployment date slider.
  • Selected Feature: Sets the feature displayed on the Feature Details chart.
  • Refresh: Initiates an on-demand update of the dashboard with new data. Otherwise, DataRobot refreshes the dashboard every 15 minutes.
  • Reset: Reverts the dashboard controls to the default settings.

The Data Drift dashboard also supports segmented analysis, allowing you to view data drift while comparing a subset of training data to the predictions data for individual attributes and values using the Segment Attribute and Segment Value dropdowns.

Feature Drift vs Feature Importance chart

The Feature Drift vs. Feature Importance chart monitors the 25 most impactful numerical, categorical, and text-based features in your data.

Use the chart to see if data differs at one point in time compared to another. Differences may indicate problems with your model or with the data itself. For example, if users of an auto insurance product are getting younger over time, the data that built the original model may no longer result in accurate predictions for your newer data. In particular, drift in high-importance features can be a warning sign about your model's accuracy. Hover over a point in the chart to see the feature name and the precise values for drift (Y-axis) and importance (X-axis).

Feature Drift

The Y-axis reports the Drift value for a feature. This value is a calculation of the Population Stability Index (PSI), a measure of the difference in distribution over time.

Drift metric support

While the DataRobot UI only supports the Population Stability Index (PSI) metric, the API supports Kullback-Leibler Divergence, Hellinger Distance, Kolmogorov-Smirnov, Histogram Intersection, Wasserstein Distance, and Jensen–Shannon Divergence. In addition, using the Python API client, you can retrieve a list of supported metrics.
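For intuition about what these metrics measure, the sketch below computes PSI along with two of the alternative metrics from a pair of binned distributions (baseline vs. scoring). It illustrates the general calculations only and is not DataRobot's internal implementation; the example bin counts and the epsilon smoothing are assumptions made for the sketch.

```python
# Illustrative only: drift metrics computed from two binned distributions.
import numpy as np

def _normalize(counts, eps=1e-6):
    """Turn raw bin counts into a probability vector, avoiding zero bins."""
    p = np.asarray(counts, dtype=float) + eps
    return p / p.sum()

def psi(baseline_counts, scoring_counts):
    """Population Stability Index between two binned distributions."""
    p, q = _normalize(baseline_counts), _normalize(scoring_counts)
    return float(np.sum((q - p) * np.log(q / p)))

def hellinger(baseline_counts, scoring_counts):
    """Hellinger distance between two binned distributions."""
    p, q = _normalize(baseline_counts), _normalize(scoring_counts)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def js_divergence(baseline_counts, scoring_counts):
    """Jensen-Shannon divergence between two binned distributions."""
    p, q = _normalize(baseline_counts), _normalize(scoring_counts)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Example: the same feature binned identically for training and scoring data.
train_bins = [120, 340, 280, 160, 100]
score_bins = [80, 260, 300, 220, 140]
print(psi(train_bins, score_bins),
      hellinger(train_bins, score_bins),
      js_divergence(train_bins, score_bins))
```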

Feature Importance

The X-axis reports the Importance score for a feature, calculated when the learning (or training) data is ingested. DataRobot calculates feature importance differently depending on the model type: for DataRobot models and custom models, the Importance score is calculated using Permutation Importance; for external models, it is an ACE score. The dot at the Importance value of 1 is the target prediction. The most important feature in the model also appears at 1 (as a solid green dot).

Interpret the quadrants

The quadrants represented in the chart help to visualize feature-by-feature data drift plotted against the feature's importance. Quadrants can be loosely interpreted as follows:

  • Red: High-importance feature(s) are experiencing high drift. Investigate immediately.
  • Yellow: Lower-importance feature(s) are experiencing drift above the set threshold. Monitor closely.
  • Green: Lower-importance feature(s) are experiencing minimal drift. No action needed.
  • Green: High-importance feature(s) are experiencing minimal drift. No action needed, but monitor features that approach the threshold.

Note that points on the chart can also be gray or white. Gray circles represent features that have been excluded from drift status calculation, and white circles represent features set to high importance.

If you are the project owner, you can click the gear icon in the upper-right corner of the chart to reset the quadrants. The drift threshold defaults to 0.15. The Y-axis scales from 0 to the greater of 0.25 and the highest observed drift value. You can customize the quadrants by changing the drift and importance thresholds.
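As a rough illustration of how the thresholds partition the chart, the following sketch maps a feature's drift and importance values to the color bands described above. The 0.5 importance threshold is a hypothetical value chosen for the example; your deployment's configured thresholds may differ.

```python
# Hypothetical thresholds: drift default of 0.15, importance of 0.5.
DRIFT_THRESHOLD = 0.15
IMPORTANCE_THRESHOLD = 0.5

def quadrant(drift, importance):
    """Map a feature's drift and importance to the dashboard's color bands."""
    if drift >= DRIFT_THRESHOLD and importance >= IMPORTANCE_THRESHOLD:
        return "red: high-importance feature drifting, investigate immediately"
    if drift >= DRIFT_THRESHOLD:
        return "yellow: lower-importance feature drifting, monitor closely"
    return "green: minimal drift, no action needed"

print(quadrant(drift=0.30, importance=0.9))  # red
print(quadrant(drift=0.20, importance=0.1))  # yellow
print(quadrant(drift=0.05, importance=0.9))  # green
```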

Feature Details chart

The Feature Details chart provides a histogram that compares the distribution of a selected feature in the training data to the distribution of that feature in the inference data.

Numeric features

For numeric data, DataRobot computes an efficient and precise approximation of the distribution of each feature. Drift tracking is then conducted by comparing the normalized histogram for the training data to the normalized histogram for the scoring data using the selected drift metric.

The chart displays 13 bins for numeric features:

  • 10 bins to capture the range of items observed in the training data.

  • Two bins to capture very high and very low values—extreme values in the scoring data that fall outside the range of the training data.

  • One bin for the Missing count, containing all records with missing feature values.
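The sketch below mirrors this 13-bin layout: ten bins spanning the training range, two overflow bins for out-of-range scoring values, and one bin for missing values. It follows the description above for illustration only and is not DataRobot's internal binning code.

```python
# Illustrative 13-bin histograms: 10 in-range bins, 2 overflow bins, 1 missing bin.
import numpy as np

def bin_numeric(training_values, scoring_values, n_bins=10):
    train = np.asarray(training_values, dtype=float)
    edges = np.linspace(np.nanmin(train), np.nanmax(train), n_bins + 1)

    def histogram(values):
        values = np.asarray(values, dtype=float)
        missing = np.isnan(values)
        present = values[~missing]
        counts, _ = np.histogram(
            present[(present >= edges[0]) & (present <= edges[-1])], bins=edges
        )
        too_low = int(np.sum(present < edges[0]))    # "very low" overflow bin
        too_high = int(np.sum(present > edges[-1]))  # "very high" overflow bin
        return [too_low, *counts.tolist(), too_high, int(missing.sum())]

    return histogram(train), histogram(scoring_values)

train_hist, score_hist = bin_numeric(
    training_values=[1, 2, 2, 3, 5, 8, 13],
    scoring_values=[0.5, 2, 4, 21, float("nan")],
)
# The two 13-element vectors can then be normalized and compared with a drift
# metric such as PSI.
```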

Categorical features

Unlike numeric data, where binning cutoffs for a histogram result from a data-dependent calculation, categorical data is inherently discrete in form (that is, not continuous), so binning is based on a defined category. Additionally, there could be missing or unseen category levels in the scoring data.

The process for drift tracking of categorical features is to calculate the fraction of rows for each categorical level ("bin") in the training data. This results in a vector of percentages for each level. The 25 most frequent levels are directly tracked—all other levels are aggregated to an Other bin. This process is repeated for the scoring data, and the two vectors are compared using the selected drift metric.

For categorical features, the chart includes two unique bins:

  • The Other bin contains all categorical values outside the 25 most frequent levels. This aggregation is performed for drift tracking purposes; it doesn't represent the model's behavior.

  • The New level bin only displays after you make predictions with data that has a new value for a feature not in the training data. For example, consider a dataset about housing prices with the categorical feature City. If your inference data contains the value Boston and your training data did not, the Boston value (and other unseen cities) are represented in the New level bin.
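The following sketch illustrates the categorical procedure described above: per-level row fractions for the most frequent training levels, an Other bin for the remaining levels seen in training, and a New level bin for values that never appeared in training. The pandas usage and helper names are assumptions made for the example, not DataRobot's internal code.

```python
# Illustrative per-level row fractions for categorical drift tracking.
import pandas as pd

def level_fractions(training, scoring, top_n=25):
    train = pd.Series(training)
    score = pd.Series(scoring)

    # Levels tracked directly: the top_n most frequent levels in training.
    tracked = train.value_counts().head(top_n).index.tolist()
    train_levels = set(train.dropna().unique())

    def vector(series):
        n = len(series)
        fracs = {level: (series == level).sum() / n for level in tracked}
        # Seen-in-training levels outside the top_n are aggregated to "Other".
        fracs["Other"] = (series.isin(train_levels) & ~series.isin(tracked)).sum() / n
        # Levels never seen in training go to "New level" (always 0 for training data).
        fracs["New level"] = (~series.isin(train_levels) & series.notna()).sum() / n
        return fracs

    return vector(train), vector(score)

train_vec, score_vec = level_fractions(
    training=["NYC", "NYC", "LA", "Chicago", "LA"],
    scoring=["NYC", "Boston", "LA", "LA"],
)
# The two vectors of fractions are then compared with the selected drift metric.
```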

To use the chart, select a feature from the dropdown. The list, which defaults to the target feature, includes all of the tracked features. You can also select a feature by clicking its point in the Feature Drift vs. Feature Importance chart.

Text features

Text features are high-cardinality, so the appearance of new words does not have the same impact as, for example, new levels in categorical data. The method DataRobot uses to track drift of text features accounts for the fact that writing is subjective and cultural and may contain spelling mistakes. In other words, to identify drift in text fields, it is more important to detect a shift in the language as a whole than in individual words.

Drift tracking for a text feature is conducted by:

  1. Detecting occurrences of the 1000 most frequent words from rows found in the training data.
  2. Calculating the fraction of rows that contain these terms for that feature in the training data and separately in the scoring data.
  3. Comparing the fraction in the scoring data to that in the training data.

The two vectors of occurrence fractions (one entry per word) are compared with the available drift metrics. Prior to applying this methodology, DataRobot performs basic tokenization by splitting the text feature into words (or characters in the case of Japanese or Chinese).
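A simplified sketch of this procedure is shown below: crude whitespace tokenization, a vocabulary of the most frequent training words (1,000 in the description above, configurable here), and per-word row-occurrence fractions for each dataset. It is not DataRobot's tokenizer and ignores the language-specific handling mentioned above.

```python
# Illustrative word-occurrence fractions for text drift tracking.
from collections import Counter

def word_occurrence_fractions(training_texts, scoring_texts, vocab_size=1000):
    def tokenize(text):
        return set(text.lower().split())  # crude whitespace tokenization

    train_rows = [tokenize(t) for t in training_texts]
    score_rows = [tokenize(t) for t in scoring_texts]

    # Most frequent words across training rows (each word counted once per row).
    word_counts = Counter(word for row in train_rows for word in row)
    vocab = [word for word, _ in word_counts.most_common(vocab_size)]

    def fractions(rows):
        return [sum(word in row for row in rows) / len(rows) for word in vocab]

    # One occurrence-fraction entry per tracked word, for each dataset; the two
    # vectors are then compared with the selected drift metric.
    return fractions(train_rows), fractions(score_rows)
```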

Predictions Over Time chart

The Predictions Over Time chart provides an at-a-glance determination of how the model's predictions have changed over time. For example:

Dave sees that his model is predicting 1 (readmitted) noticeably more frequently over the past month. Because he doesn't know of a corresponding change in the actual distribution of readmissions, he suspects that the model has become less accurate. With this information, he investigates further whether he should consider retraining.

Although the charts for binary classification and regression differ slightly, the takeaway is the same: are the plot lines relatively stable across time? If not, is there a business reason for the anomaly (for example, a blizzard)? One way to check this is to look at the bar chart below the plot. If the point for a binned period is abnormally high or low, check the histogram to make sure there is a sufficient number of predictions for this to be a reliable data point.

Additionally, both charts have Training and Scoring labels across the X-axis. The Training label indicates the section of the chart that shows the distribution of predictions made on the holdout set of training data for the model. It will always have one point on the chart. The Scoring label indicates the section of the chart showing the distribution of predictions made on the deployed model. Scoring indicates that the model is in use to make predictions. It will have multiple points along the chart to indicate how prediction distributions change over time.

For regression projects

The Predictions Over Time chart for regression projects plots the average predicted value, as well as a visual indicator of the middle 80% range of predicted values, for both training and prediction data. If training data is uploaded, the graph displays both the 10th-90th percentile range and the mean value of the target.

Hover over a point on the chart to view its details:

  • Date: The starting date of the bin data. Displayed values are based on counts from this date to the next point along the graph. For example, if the date on point A is 01-07 and point B is 01-14, then point A covers everything from 01-07 to 01-13 (inclusive).
  • Average Predicted Value: For all points included in the bin, this is the average of their values.
  • Predictions: The number of predictions included in the bin. Compare this value against other points if you suspect anomalous data.
  • 10th-90th Percentile: The range between the 10th and 90th percentiles of predictions for that time period.

Note that you can also display this information for the mean value of the target by hovering on the point in the training data.
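If you want to reproduce these per-bin statistics from your own prediction logs, the sketch below computes the average predicted value, prediction count, and 10th/90th percentiles per weekly bin with pandas. The DataFrame and its column names are hypothetical.

```python
# Illustrative per-bin statistics for a regression deployment's predictions.
import pandas as pd

predictions = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-07", "2024-01-09", "2024-01-14", "2024-01-20"]
    ),
    "prediction": [12.0, 15.5, 9.8, 14.2],
})

# Bin predictions into 7-day periods starting at the earliest timestamp.
binned = predictions.set_index("timestamp").resample("7D")["prediction"]
summary = pd.DataFrame({
    "average_predicted_value": binned.mean(),
    "predictions": binned.count(),
    "p10": binned.quantile(0.10),
    "p90": binned.quantile(0.90),
})
print(summary)
```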

For binary classification projects

The Predictions Over Time chart for binary classification projects plots the class percentages, based on the labels you set when you added the deployment (in this example, 0 and 1). It also reports the threshold set for prediction output. The threshold is set when adding your deployment to the inventory and cannot be revised.

Hover over a point on the chart to view its details:

  • Date: The starting date of the bin data. Displayed values are based on counts from this date to the next point along the graph. For example, if the date on point A is 01-07 and point B is 01-14, then point A covers everything from 01-07 to 01-13 (inclusive).
  • <class-label>: For all points included in the bin, the percentage of those in the "positive" class (0 in this example).
  • <class-label>: For all points included in the bin, the percentage of those in the "negative" class (1 in this example).
  • Number of Predictions: The number of predictions included in the bin. Compare this value against other points if you suspect anomalous data.

Additionally, the chart displays the mean value of the target in the training data. As with all plotted points, you can hover over it to see the specific values.

The chart also includes a toggle in the upper-right corner that allows you to switch between continuous and binary modes (only for binary classification deployments):

Continuous mode shows the positive class predictions as probabilities between 0 and 1, without taking the prediction threshold into account.

Binary mode takes the prediction threshold into account and shows, of all predictions made, the percentage for each possible class.
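The difference between the two modes can be summarized with a small sketch: continuous mode works with the raw positive-class probabilities, while binary mode applies the threshold first and then reports class percentages. The probabilities and the 0.5 threshold below are hypothetical.

```python
# Hypothetical positive-class probabilities for one time bin.
import numpy as np

probabilities = np.array([0.12, 0.48, 0.63, 0.81, 0.35])
threshold = 0.5

# Continuous mode: summarize the raw probabilities themselves.
print("mean positive-class probability:", probabilities.mean())

# Binary mode: apply the threshold first, then report class percentages.
predicted_positive = probabilities >= threshold
print("percent predicted positive:", 100 * predicted_positive.mean())
print("percent predicted negative:", 100 * (~predicted_positive).mean())
```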

Prediction warnings integration

If you have enabled prediction warnings for a deployment, any anomalous prediction values that trigger a warning are flagged in the Predictions Over Time bar chart.

Note

Prediction warnings are only available for regression model deployments.

The yellow section of the bar chart represents the anomalous predictions for a point in time.

To view the number of anomalous predictions for a specific time period, hover over the point on the plot corresponding to the flagged predictions in the bar chart.

Use the version selector

You can change the data drift display to analyze the current, or any previous, version of a model in the deployment. Initially, if there has been no model replacement, you only see the Current option. The models listed in the dropdown can also be found in the History section of the Overview tab. This functionality is only supported with deployments made with models or model images.

Use the time range and resolution dropdowns

The Range and Resolution dropdowns help diagnose deployment issues by allowing you to change the granularity of the three deployment monitoring tabs: Data Drift, Service Health, and Accuracy.

Expand the Range dropdown (1) to select the start and end dates for the time range you want to examine. You can specify the time of day for each date (to the nearest hour, rounded down) by editing the value after selecting a date. When you have determined the desired time range, click Update range (2). Select the Range reset icon (3) to restore the time range to the previous setting.

Note

The date picker only allows you to select dates and times between the start date of the deployment's current model version and the current date.

After setting the time range, use the Resolution dropdown to determine the granularity of the date slider. Select from hourly, daily, weekly, and monthly granularity based on the time range selected. If the time range is longer than 7 days, hourly granularity is not available.

When you choose a new value from the Resolution dropdown, the resolution of the date selection slider changes. Then, you can select start and end points on the slider to home in on the time range of interest.

Note that the selected slider range also carries across the Service Health and Accuracy tabs (but not across deployments).

Use the date slider

The date slider limits the time range used for comparing prediction data to training data. The upper dates, displayed at the left and right edges of the slider, indicate the range currently used for comparison in the page's visualizations. The lower dates, at the left and right edges, indicate the full date range of prediction data available. The circles mark the "data buckets," which are determined by the time range.

To use the slider, click a point to move the line or drag the endpoint left or right.

The visualizations use predictions from the starting point of the updated time range as the baseline reference point, comparing them to predictions occurring up to the last date of the selected time range.

You can also move the slider to a different time interval while maintaining the periodicity. Click anywhere on the slider between the two endpoints and drag it (the cursor changes to a hand icon).

For example, if the slider spans a 3-month time interval, you can drag the slider to different dates while maintaining that 3-month interval.

By default, the slider is set to display the same date range that is used to calculate and display drift status. For example, if drift status captures the last week, then the default slider range will span from the last week to the current date.

You can move the slider to any date range without affecting the data drift status display on the health dashboard. If you do so, a Reset button appears above the slider. Clicking it will revert the slider to the default date range that matches the range of the drift status.

Class selector

Multiclass deployments offer class-based configuration to modify the data displayed on the Data Drift graphs.

Predictions Over Time multiclass graph:

Feature Details multiclass graph:


Updated July 7, 2022