Accuracy tab

The Accuracy tab allows you to analyze the performance of model deployments over time using standard statistical measures and exportable visualizations.

Use this tool to determine whether a model's quality is decaying and whether you should consider replacing it. The Accuracy tab renders insights based on the problem type and its associated optimization metrics, which vary depending on whether the deployment is a regression or binary classification project.

Note

The accuracy scores displayed on this tab are estimates and may differ from accuracy scores computed using every prediction row in the raw data. This is due to data processing limits (hourly or daily, depending on the configuration). For example, under the hourly limit, DataRobot cannot compute accuracy scores using more than 100,000 rows; instead, it provides scores based on the rows it was able to process. To achieve a more precise accuracy score, span prediction requests over multiple hours or days to avoid reaching the computation limit.

Enable the Accuracy tab

The Accuracy tab is not enabled for deployments by default. To enable it, enable target monitoring, set an association ID, and upload data containing the predicted and actual values, collected outside of DataRobot, for the deployment. For more information, see the overview of setting up accuracy for deployments by adding actuals.
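
If you prefer to perform this setup from code, the following is a minimal sketch using the DataRobot Python client (`datarobot` package). The endpoint, token, deployment ID, association ID column, and actuals values are placeholders, and the method names reflect recent client versions, so verify them against the Python client documentation for your release.

```python
import datarobot as dr
import pandas as pd

# Connect to DataRobot; the endpoint and token are placeholders.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

deployment = dr.Deployment.get(deployment_id="YOUR_DEPLOYMENT_ID")

# 1. Enable target monitoring so predictions are tracked for drift and accuracy.
deployment.update_drift_tracking_settings(target_drift_enabled=True)

# 2. Set the association ID column and require it in prediction requests.
deployment.update_association_id_settings(
    column_names=["transaction_id"],  # hypothetical association ID column
    required_in_prediction_requests=True,
)

# 3. Upload actuals collected outside of DataRobot, keyed by the association ID.
actuals = pd.DataFrame(
    {
        "association_id": ["1001", "1002", "1003"],
        "actual_value": [0, 1, 0],
    }
)
deployment.submit_actuals(actuals)
```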

The following errors can prevent accuracy analysis:

| Problem | Resolution |
|---|---|
| Disabled target monitoring setting | Enable target monitoring on the Data Drift > Settings tab. A message appears on the Accuracy tab to remind you to enable target monitoring. |
| Missing association ID at prediction time | Set an association ID before making predictions to include those predictions in accuracy tracking. |
| Missing actuals | Add actuals on the Accuracy > Settings tab. |
| Insufficient predictions to enable accuracy analysis | Add more actuals on the Accuracy > Settings tab. A minimum of 100 rows of predictions with corresponding actual values is required to enable the Accuracy tab. |
| Missing data for the selected time range | Ensure predicted and actual values match the selected time range to view accuracy metrics for that range. |

Time range and resolution dropdowns

The controls—model version and data time range selectors—work the same as those available on the Data Drift tab. The Accuracy tab also supports segmented analysis, allowing you to view accuracy for individual segment attributes and values.

Note

To receive email notifications on accuracy status, configure notifications, schedule monitoring, and configure accuracy monitoring settings.

Configure accuracy metrics

Deployment owners can configure multiple accuracy metrics for each deployment. The accuracy metrics a deployment uses appear as individual tiles above the accuracy graphs. Select Customize Tiles to edit the metrics used.

The dialog box lists all of the metrics currently enabled for the deployment. They are listed from top to bottom in order of their appearance as tiles, from left to right.

To change the positioning of a tile, select the up arrow to move it to the left and the down arrow to move it to the right.

To add a new metric tile, click Add another metric. Each deployment can display up to 10 accuracy tiles.

To change a tile's accuracy metric, click the dropdown for the metric you wish to change and choose the metric to replace it.

When you have made all of your changes, click OK. The Accuracy tab updates to reflect the changes made to the displayed metrics.

Available accuracy metrics

The metrics available depend on the type of modeling project used for the deployment: regression, binary classification, or multiclass.

| Modeling type | Available metrics |
|---|---|
| Regression | RMSE, MAE, Gamma Deviance, Tweedie Deviance, R Squared, FVE Gamma, FVE Poisson, FVE Tweedie, Poisson Deviance, MAD, MAPE, RMSLE |
| Binary classification | LogLoss, AUC, Kolmogorov-Smirnov, Gini-Norm, Rate@Top10%, Rate@Top5%, TNR, TPR, FPR, PPV, NPV, F1, MCC, Accuracy, Balanced Accuracy, FVE Binomial |
| Multiclass | LogLoss, FVE Multinomial |

For more information on these metrics, see the Optimization metrics documentation.
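
To read these scores programmatically rather than from the tiles, the sketch below uses the DataRobot Python client's `Deployment.get_accuracy` method. The deployment ID and time range are placeholders, and the attribute names (such as `metrics`) should be verified against the client documentation for your version.

```python
from datetime import datetime

import datarobot as dr

# Assumes dr.Client(...) has already been called to authenticate.
deployment = dr.Deployment.get(deployment_id="YOUR_DEPLOYMENT_ID")

# Accuracy scores for a chosen time range; metric names match the tiles
# on the Accuracy tab (for example "LogLoss" or "RMSE").
accuracy = deployment.get_accuracy(
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 2, 1),
)

for metric_name, values in accuracy.metrics.items():
    print(metric_name, values)
```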

Interpret results

The Accuracy tab displays slightly different results based on whether the deployment is a regression or binary classification project.

Time of Prediction

The Time of Prediction value differs between the Data drift and Accuracy tabs and the Service health tab:

  • On the Service health tab, the "time of prediction request" is always the time the prediction server received the prediction request. This method of prediction request tracking accurately represents the prediction service's health for diagnostic purposes.

  • On the Data drift and Accuracy tabs, the "time of prediction request" is, by default, the time you submitted the prediction request, which you can override with the prediction timestamp in the Prediction History and Service Health settings.

Accuracy over Time graph

The Accuracy over Time graph displays the change over time for a selected accuracy metric value (LogLoss in this example):

The Start value (the baseline accuracy score) and the plotted accuracy baseline represent the accuracy score for the model, which is calculated using the trained model’s predictions on the holdout partition:

Click on any metric tile above the graph to change the display:

Hover over a point on the graph to see specific details:

| Field | Regression | Classification |
|---|---|---|
| Timestamp (1) | The period of time that the point captures. | Same as regression. |
| Metric (2) | The selected optimization metric value for the point's time period. It reflects the score of the corresponding metric tile above the graph, adjusted for the displayed time period. | Same as regression. |
| Predicted (3) | The average predicted value (derived from the prediction data) for the point's time period. Values are reflected by the blue points along the Predicted & Actual graph. | The frequency, as a percentage, of how often the prediction data predicted the value label (true or false) for the point's time period. Values are represented by the blue points along the Predicted & Actual graph. See the image below for information on setting the label. |
| Actual (4) | The average actual value (derived from the actuals data) for the point's time period. Values are reflected by the orange points along the Predicted & Actual graph. | The frequency, as a percentage, that the actual data is the value 1 (true) for the point's time period. These values are represented by the orange points along the Predicted & Actual graph. See the image below for information on setting the label. |
| Row count (5) | The number of rows represented by this point on the chart. | Same as regression. |
| Missing Actuals (6) | The number of prediction rows that do not have corresponding actual values recorded. This value is not specific to the point selected. | Same as regression. |
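
To work with the same per-bucket values outside the UI, the following hedged sketch retrieves accuracy-over-time data with the Python client's `Deployment.get_accuracy_over_time` method, assuming it is available in your client version; the metric name and the exact bucket fields are illustrative.

```python
import datarobot as dr

# Assumes dr.Client(...) has already been called to authenticate.
deployment = dr.Deployment.get(deployment_id="YOUR_DEPLOYMENT_ID")

# Per-bucket accuracy data behind the Accuracy over Time graph.
over_time = deployment.get_accuracy_over_time(metric="LogLoss")

print(over_time.baseline)   # baseline (holdout) score shown as the Start value
for bucket in over_time.buckets:
    print(bucket)           # each bucket holds the period, metric value, and sample size
```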

Predicted & Actual graph

The graph above shows the predicted and actual values along a timeline of a binary classification dataset. Hovering over a point in either plot shows the same values as those on the Data Drift tab (assuming the time sliders are set to the same time range).

You can select which classification value to show (0 or 1 in this example) from the dropdown menu at the top of the Predicted & Actual graph:

For a binary classification project, the timeline and bucketing work the same as for regression projects, but with this project type, you can select the class to display results for (as described in the Accuracy over Time graph above).

The volume chart below the graph displays the number of actual values that correspond to the predictions made at each point. The shaded area represents the number of uploaded actuals, and the striped area represents the number of predictions missing corresponding actuals.

To identify predictions that are missing actuals, click the Download IDs of missing actuals link. This prompts the download of a CSV file (missing_actuals.csv) that lists the predictions made that are missing actuals, along with the association ID of each prediction. Use the association IDs to upload the actuals with matching IDs.
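
As an illustration of closing this loop in code, the sketch below assumes the downloaded missing_actuals.csv contains an `association_id` column and that you keep ground truth in a hypothetical `xyz_historical.csv` file keyed by the same ID; it joins the two and submits the recovered actuals with `Deployment.submit_actuals`.

```python
import datarobot as dr
import pandas as pd

# Assumes dr.Client(...) has already been called to authenticate.

# The file downloaded from "Download IDs of missing actuals".
missing = pd.read_csv("missing_actuals.csv")  # assumed to include an association_id column

# Hypothetical ground-truth export keyed by the same association ID.
ground_truth = pd.read_csv("xyz_historical.csv")  # columns: association_id, is_fraud

# Keep only outcomes for predictions that are still missing actuals,
# then shape them into the columns submit_actuals expects.
actuals = (
    ground_truth.merge(missing[["association_id"]], on="association_id")
    .rename(columns={"is_fraud": "actual_value"})[["association_id", "actual_value"]]
)

deployment = dr.Deployment.get(deployment_id="YOUR_DEPLOYMENT_ID")
deployment.submit_actuals(actuals)
```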

Class selector

Multiclass deployments offer class-based configuration to modify the data displayed on the Accuracy graphs. By default, the graphs display the five most common classes in the training data. All other classes are represented by a single line. Above the date slider, there is a Target Class dropdown. This indicates which classes are selected to display on the selected tab.

Click the dropdown to select the classes you want to display. Choose Use all classes or Select specific classes.

If you want to display all classes, select the first option and then click Apply.

To display specific classes, select the second option. Type the class names in the subsequent field to indicate those you want to display (up to five classes can display at once). DataRobot provides quick-select shortcuts for classes: the five most common classes in the training data, the five with the lowest accuracy score, and the five with the greatest amount of data drift. Once you have specified the classes to display, click Apply.

Once specified, the charts on the tab (Accuracy or Data Drift) update to display the selected classes.

Accuracy multiclass graphs

Accuracy over Time:

Predicted vs. Actual:

Interpret alerts

DataRobot uses the optimization metric tile selected for a deployment as the accuracy score to create an alert status. Interpret the alert statuses as follows:

| Color | Accuracy | Action |
|---|---|---|
| Green / Passing | Accuracy is similar to when the model was deployed. | No action needed. |
| Yellow / At risk | Accuracy has declined since the model was deployed. | Concerns found but no immediate action needed; monitor. |
| Red / Failing | Accuracy has severely declined since the model was deployed. | Immediate action needed. |
| Gray / Unknown | No accuracy data is available; insufficient predictions made (minimum of 100 required). | Make predictions. |

For example...

You have training data from the XYZhistorical database table, which includes the target "is activity fraudulent?" After building your model, you score the XYZDaily table (which does not have the target) and write the predictions out to the XYZscored database table. Downstream applications use XYZscored; the records scored at prediction time are later independently added to XYZhistorical.

To determine whether your model is making accurate predictions, each month you join XYZhistorical and XYZscored. This provides the predicted fraud value and the actual fraud value in a single table.

Finally, you add this prediction dataset to your DataRobot deployment, setting the actual and predicted columns. DataRobot then analyzes the results and provides metrics to help identify model deterioration and the need for replacement.
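
A minimal pandas sketch of that monthly join might look like the following; the file and column names (`association_id`, `is_fraudulent`) are placeholders for whatever your XYZscored and XYZhistorical tables actually use.

```python
import pandas as pd

# Hypothetical monthly exports of the two tables described above.
scored = pd.read_csv("XYZscored.csv")          # association_id + predicted value
historical = pd.read_csv("XYZhistorical.csv")  # association_id + actual outcome

# Join on the shared association ID so each row carries both values.
joined = scored.merge(
    historical[["association_id", "is_fraudulent"]], on="association_id"
)

# Shape the result into the columns DataRobot expects when adding actuals.
actuals = joined[["association_id", "is_fraudulent"]].rename(
    columns={"is_fraudulent": "actual_value"}
)
actuals.to_csv("monthly_actuals.csv", index=False)
```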


Updated February 21, 2024