The Accuracy tab allows you to analyze the performance of model deployments over time, using standard statistical measures and exportable visualizations. Use this tool to determine whether a model's quality is decaying and whether you should consider replacing it. The Accuracy tab renders insights based on the problem type and its associated optimization metrics, which vary depending on whether the project is a regression or binary classification project.
Note that the accuracy scores displayed on this tab are estimates and may differ from accuracy scores computed using every prediction row in the raw data. This is due to hourly data processing limits: DataRobot computes accuracy scores from at most 100,000 rows per hour, and when more rows arrive within an hour, the scores are based only on the rows it was able to process. To achieve a more precise accuracy score, spread prediction requests across multiple hours so that no single hour exceeds the limit.
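As a rough illustration of spreading requests across hours, the sketch below splits a prediction dataset into consecutive batches of at most 100,000 rows, one batch per hour. The helper names `hours_needed` and `hourly_batches` are hypothetical, and actually scheduling each batch and sending the prediction request are left out; this is a sketch under those assumptions, not a DataRobot API.

```python
import math
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Hourly limit described above: at most 100,000 rows are used per hour.
HOURLY_ROW_LIMIT = 100_000


def hours_needed(total_rows: int, limit: int = HOURLY_ROW_LIMIT) -> int:
    """Smallest number of hourly windows that keeps every window within the limit."""
    return math.ceil(total_rows / limit)


def hourly_batches(rows: Sequence[T], limit: int = HOURLY_ROW_LIMIT) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `limit` rows; send each slice in a different hour."""
    for start in range(0, len(rows), limit):
        yield rows[start:start + limit]


# Example: 250,000 prediction rows need three separate hours.
print(hours_needed(250_000))  # 3
```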
Enable the Accuracy tab¶
The Accuracy tab is not enabled for deployments by default. To enable it, first upload the data, collected outside of DataRobot, that contains the predicted and actual values for the deployment. For more information, see the overview of setting up accuracy for deployments by adding actuals.
There are three errors that can prevent accuracy analysis:
No data is provided for the selected time range. Predicted and actual values must match the time range selected.
No actuals have been added to the deployment. Add actuals from the Settings > Data tab.
There is an insufficient number of predictions to enable accuracy analysis. A minimum of 100 rows of predictions with corresponding actual values is required to enable the Accuracy tab. Add more actuals from the Settings > Data tab.
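To check these three conditions yourself before opening the tab, you could run a local pre-check such as the one below. It is a minimal sketch: the `PredictionRow` record and `accuracy_blocker` function are hypothetical, and counting the 100-row minimum within the selected time range is an assumption about how the limit applies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence

MIN_ROWS_WITH_ACTUALS = 100  # minimum stated in the list above


@dataclass
class PredictionRow:
    association_id: str
    timestamp: datetime
    predicted: float
    actual: Optional[float] = None  # None until an actual value is uploaded


def accuracy_blocker(rows: Sequence[PredictionRow], start: datetime, end: datetime) -> Optional[str]:
    """Return the first condition that would block accuracy analysis, or None."""
    in_range = [r for r in rows if start <= r.timestamp < end]
    if not in_range:
        return "no data in the selected time range"
    # Assumption: "no actuals" is checked across the whole deployment.
    if all(r.actual is None for r in rows):
        return "no actuals have been added to the deployment"
    # Assumption: the 100-row minimum is counted within the selected range.
    matched = sum(1 for r in in_range if r.actual is not None)
    if matched < MIN_ROWS_WITH_ACTUALS:
        return f"only {matched} rows with actuals; at least {MIN_ROWS_WITH_ACTUALS} required"
    return None
```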
Time range and resolution dropdowns¶
The controls—model version and data time range selectors—work the same as those available on the Data Drift tab. The Accuracy tab also supports segmented analysis, allowing you to view accuracy for individual segment attributes and values.
Configure accuracy metrics¶
Deployment owners can configure multiple accuracy metrics for each deployment. The accuracy metrics a deployment uses display as individual tiles above the accuracy graphs. Select Customize Tiles to edit the metrics used.
The dialog box lists all of the metrics currently enabled for the deployment. The top-to-bottom order of the list matches the left-to-right order of the tiles.
To change the positioning of a tile, select the up arrow to move it to the left and the down arrow to move it to the right.
To add a new metric tile, click Add another metric. Each deployment can display up to 10 accuracy tiles.
To change a tile's accuracy metric, click the dropdown for the metric you wish to change and choose the metric to replace it.
When you have made all of your changes, click OK. The Accuracy tab updates to reflect the changes made to the displayed metrics.
Available accuracy metrics¶
The metrics available depend on the type of modeling project used for the deployment: regression or binary classification.
|Modeling type|Available metrics|
|---|---|
|Regression|RMSE, MAE, Gamma Deviance, Tweedie Deviance, R Squared, FVE Gamma, FVE Poisson, FVE Tweedie, Poisson Deviance, MAD, MAPE, RMSLE|
|Binary classification|LogLoss, AUC, Kolmogorov-Smirnov, Gini-Norm, Rate@Top10%, Rate@Top5%, TNR, TPR, FPR, PPV, NPV, F1, MCC, Accuracy, Balanced Accuracy, FVE Binomial|
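For reference, a few of the metrics in this table can be written out directly. The functions below use the standard textbook definitions of RMSE, MAE, LogLoss, and NPV; DataRobot's own implementations (for example, how it weights rows or clips probabilities) may differ, so treat this as an illustrative sketch rather than the product's computation.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error (regression)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error (regression)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def log_loss(actual: Sequence[int], prob: Sequence[float], eps: float = 1e-15) -> float:
    """Binary cross-entropy; `prob` is the predicted probability of class 1.

    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(actual, prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(actual)


def npv(actual: Sequence[int], predicted_label: Sequence[int]) -> float:
    """Negative predictive value: TN / (TN + FN); undefined if nothing is predicted 0."""
    tn = sum(1 for y, yhat in zip(actual, predicted_label) if yhat == 0 and y == 0)
    fn = sum(1 for y, yhat in zip(actual, predicted_label) if yhat == 0 and y == 1)
    return tn / (tn + fn)
```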
The Accuracy tab displays slightly different results based on whether the deployment is a regression or binary classification project.
Accuracy over Time graph¶
The Accuracy over Time graph shows the change in a selected accuracy metric value (LogLoss in this example) over time. These metrics are identical to those used for the evaluation of the model before deployment. The Start score for each metric represents the accuracy score for the model calculated using the trained model's predictions on the holdout partition. Click any metric tile above the graph to change the metric displayed.
Hover over a point on the graph to see specific details:
|Element|Description|
|---|---|
|Timestamp (1)|The period of time that the point captures.|
|Metric (2)|The selected optimization metric value for the point's time period. It reflects the score of the corresponding metric tile above the graph, adjusted for the displayed time period.|
|Predicted (3)|Regression: the average predicted value (derived from the prediction data) for the point's time period. Binary classification: the percentage of rows in the point's time period for which the prediction data predicted the selected value label (true or false). In both cases, values are represented by the blue points on the Predicted & Actual graph. The label is set with the class dropdown at the top of the Predicted & Actual graph.|
|Actual (4)|Regression: the average actual value (derived from the actuals data) for the point's time period. Binary classification: the percentage of rows in the point's time period whose actual value is 1 (true). In both cases, values are represented by the orange points on the Predicted & Actual graph. The label is set with the class dropdown at the top of the Predicted & Actual graph.|
|Row count (5)|The number of rows represented by this point on the chart.|
|Missing Actuals (6)|The number of prediction rows that do not have corresponding actual values recorded. This value is not specific to the selected point.|
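The tooltip values above can be approximated by bucketing prediction rows by hour. The sketch below is a hypothetical regression example: it assumes hourly buckets, uses RMSE as the metric, and computes the average predicted and actual values only over rows that have actuals. DataRobot's real bucketing follows the selected resolution and may include rows differently.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# (timestamp, predicted value, actual value or None if no actual was uploaded)
Row = Tuple[datetime, float, Optional[float]]


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def accuracy_over_time(rows: List[Row]) -> Tuple[Dict[datetime, dict], int]:
    """Return per-hour tooltip values plus the overall count of missing actuals."""
    buckets: Dict[datetime, List[Tuple[float, float]]] = defaultdict(list)
    missing_actuals = 0
    for ts, predicted, actual in rows:
        if actual is None:
            missing_actuals += 1  # counted across the whole range, not per point
        else:
            buckets[hour_bucket(ts)].append((predicted, actual))

    points: Dict[datetime, dict] = {}
    for bucket, pairs in sorted(buckets.items()):
        n = len(pairs)
        points[bucket] = {
            "rmse": math.sqrt(sum((a - p) ** 2 for p, a in pairs) / n),
            "avg_predicted": sum(p for p, _ in pairs) / n,
            "avg_actual": sum(a for _, a in pairs) / n,
            "row_count": n,
        }
    return points, missing_actuals
```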
You can select which classification value to show (0 or 1 in this example) from the dropdown menu at the top of the Predicted & Actual graph.
Predicted & Actual graph¶
The Predicted & Actual graph shows the predicted and actual values along a timeline for a binary classification dataset. Hovering over a point in either plot shows the same values as those on the Data Drift tab (assuming the time sliders are set to the same time range).
For a binary classification project, the timeline and bucketing work the same as for regression projects, but you can also select the class to display results for (as described for the Accuracy over Time graph).
The volume chart below the graph displays the number of actual values that correspond to the predictions made at each point. The shaded area represents the number of uploaded actuals, and the striped area represents the number of predictions missing corresponding actuals.
To identify predictions that are missing actuals, click the Download IDs of missing actuals link. This downloads a CSV file (missing_actuals.csv) that lists the predictions that are missing actuals, along with the association ID of each prediction. Use the association IDs to upload the actuals with matching IDs.
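Once you have outcome data for some of those predictions, you can match it against the downloaded file and build an upload file containing only the rows you can now fill in. This is a hypothetical sketch: the column names `association_id` and `actual_value`, and the helpers `actuals_for_missing` and `to_upload_csv`, are assumptions, so check the actual header of your missing_actuals.csv and the column names your deployment expects.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names; check the header of your own missing_actuals.csv.
ID_COLUMN = "association_id"
ACTUAL_COLUMN = "actual_value"


def actuals_for_missing(missing_csv_text: str, actuals_by_id: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep only the missing association IDs for which an actual value is now known."""
    rows = []
    for record in csv.DictReader(io.StringIO(missing_csv_text)):
        assoc_id = record[ID_COLUMN]
        if assoc_id in actuals_by_id:
            rows.append({ID_COLUMN: assoc_id, ACTUAL_COLUMN: actuals_by_id[assoc_id]})
    return rows


def to_upload_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the matched rows as CSV text with an ID column and an actual column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[ID_COLUMN, ACTUAL_COLUMN])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

IDs that have no known actual yet are simply skipped, so they will still appear in the next missing_actuals.csv download.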
Multiclass deployments offer class-based configuration to modify the data displayed on the Accuracy graphs. By default, the graphs display the five most common classes in the training data. All other classes are represented by a single line. Above the date slider, there is a Target Class dropdown. This indicates which classes are selected to display on the selected tab.
Click the dropdown to select the classes you want to display. Choose Use all classes or Select specific classes.
If you want to display all classes, select the first option and then click Apply.
To display specific classes, select the second option. Type the class names in the subsequent field to indicate those that you want to display (up to five classes can display at once). DataRobot provides quick select shortcuts for classes: the five most common classes in the training data, the five with the lowest accuracy score, and the five with the greatest amount of data drift. Once you have specified up to five classes to display, click Apply.
Once specified, the charts on the tab (Accuracy or Data Drift) update to display the selected classes.
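The three quick select shortcuts amount to simple rankings. The sketch below shows one way to compute them from per-class values you already have; the function names are hypothetical, it assumes a higher accuracy score means better accuracy and a larger drift score means more drift, and the "Other" label for unselected classes is an illustrative name for the single line that represents all remaining classes.

```python
from collections import Counter
from typing import Dict, Iterable, List

MAX_CLASSES = 5  # up to five classes can display at once


def most_common_classes(training_targets: Iterable[str], k: int = MAX_CLASSES) -> List[str]:
    """Classes that occur most often in the training target column."""
    return [cls for cls, _ in Counter(training_targets).most_common(k)]


def lowest_accuracy_classes(accuracy_by_class: Dict[str, float], k: int = MAX_CLASSES) -> List[str]:
    """Classes with the lowest accuracy score (assumes higher is better)."""
    return sorted(accuracy_by_class, key=accuracy_by_class.get)[:k]


def highest_drift_classes(drift_by_class: Dict[str, float], k: int = MAX_CLASSES) -> List[str]:
    """Classes with the largest drift score."""
    return sorted(drift_by_class, key=drift_by_class.get, reverse=True)[:k]


def display_label(cls: str, selected: List[str]) -> str:
    """Unselected classes are folded into one combined line."""
    return cls if cls in selected else "Other"
```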
Accuracy multiclass graphs¶
[Figure: multiclass Accuracy over Time graph]
[Figure: multiclass Predicted vs Actual graph]
DataRobot uses the optimization metric tile selected for a deployment as the accuracy score to create an alert status. Interpret the alert statuses as follows:
Green: Accuracy is similar to when the model was deployed.
Yellow: Accuracy has declined since the model was deployed.
Red: Accuracy has severely declined since the model was deployed.
Gray: No accuracy data is available.
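A status like this can be derived by comparing the current score of the selected metric with its Start score. The sketch below is illustrative only: the function name and the 10% and 25% decline thresholds are hypothetical values chosen for the example, not DataRobot's defaults, and the direction flag assumes you know whether lower is better for the metric (as for LogLoss) or higher is better (as for AUC).

```python
from typing import Optional


def accuracy_status(start_score: float, current_score: Optional[float],
                    lower_is_better: bool = True,
                    warn_pct: float = 10.0, fail_pct: float = 25.0) -> str:
    """Map the relative decline from the Start score to a status color.

    warn_pct and fail_pct are hypothetical thresholds for illustration.
    """
    if current_score is None:
        return "gray"  # no accuracy data available
    if start_score == 0:
        raise ValueError("relative decline is undefined for a zero Start score")
    # Positive `change` means the metric moved in the worse direction.
    change = current_score - start_score if lower_is_better else start_score - current_score
    decline_pct = 100.0 * change / abs(start_score)
    if decline_pct >= fail_pct:
        return "red"
    if decline_pct >= warn_pct:
        return "yellow"
    return "green"
```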
You have training data from the XYZhistorical database table, which includes the target "is activity fraudulent?" After building your model, you use it to score the XYZDaily table (which does not have the target) and write the predictions to the XYZscored database table. Downstream applications use XYZscored; the rows scored at prediction time are later added, independently, to XYZhistorical.
To determine whether your model is making accurate predictions, every month you join together XYZhistorical and XYZscored. This provides you with the predicted fraudulent value and the actually fraudulent value in a single table.
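The monthly join could look like the following sqlite3 sketch. The shared key column `id` and the column names `predicted_is_fraud` and `is_fraud` are hypothetical; substitute whatever identifier links a scored row to its historical record in your own tables.

```python
import sqlite3
from typing import List, Tuple


def build_accuracy_dataset(conn: sqlite3.Connection) -> List[Tuple]:
    """Join scored predictions with historical outcomes on a shared ID.

    Column names are hypothetical. The inner join keeps only rows that
    already have an outcome in XYZhistorical.
    """
    query = """
        SELECT s.id, s.predicted_is_fraud, h.is_fraud
        FROM XYZscored AS s
        JOIN XYZhistorical AS h ON h.id = s.id
        ORDER BY s.id
    """
    return conn.execute(query).fetchall()
```

Scored rows that have not yet reached XYZhistorical drop out of the inner join; in the deployment they would appear as predictions with missing actuals.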
Finally, you add this prediction dataset to your DataRobot deployment, setting the actual and predicted columns. DataRobot then analyzes the results and provides metrics to help identify any model deterioration and need for replacement.