Prediction Distribution graph¶
The Prediction Distribution graph (on the ROC Curve tab) illustrates the distribution of actual values in relation to the display threshold (a dividing line for interpreting results). Every prediction to the left of the dividing line is classified as "false," and every prediction to the right of the dividing line is classified as "true."
The Prediction Distribution graph visually expresses model performance for the selected data source. Based on Classification use case 2, this Prediction Distribution graph shows the predicted probabilities for the two groups of patients (readmitted and not readmitted), illustrating how well your model discriminates between them. The colors correspond to the rows of the confusion matrix—red represents patients not readmitted, blue represents readmitted patients. You can see that both red and blue fall on either side of the display threshold:
You can interpret the graph using this table:
| Color on graph | Location | State |
|---|---|---|
| Red | Left of the threshold | True negative (TN) |
| Blue | Left of the threshold | False negative (FN) |
| Red | Right of the threshold | False positive (FP) |
| Blue | Right of the threshold | True positive (TP) |
Note that grey indicates where the red and blue distributions overlap.
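The four states in the table can be reproduced directly from raw predictions. The sketch below is illustrative only; the labels, probabilities, and `classify` helper are hypothetical, not DataRobot output or API:

```python
# Classify each prediction into one of the four confusion-matrix states
# relative to a display threshold. All data here is hypothetical.
def classify(actual, probability, threshold):
    if probability < threshold:               # left of the threshold: "false"
        return "TN" if actual == 0 else "FN"
    else:                                     # right of the threshold: "true"
        return "FP" if actual == 0 else "TP"

actuals       = [0, 0, 1, 1, 1]   # 0 = not readmitted (red), 1 = readmitted (blue)
probabilities = [0.10, 0.62, 0.35, 0.70, 0.90]
states = [classify(a, p, threshold=0.5) for a, p in zip(actuals, probabilities)]
print(states)  # ['TN', 'FP', 'FN', 'TP', 'TP']
```

Note that a red (actually negative) point is correct only on the left of the threshold, and a blue (actually positive) point is correct only on the right.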
With a classification problem, each prediction corresponds to a single observation (readmitted or not, in this example). The Prediction Distribution graph shows the overall distribution of the predictions for all observations in the selected data source.
Distribution selector (Y-Axis)¶
The Y-Axis distribution selector allows you to choose between showing the Prediction Distribution graph as a density or frequency curve.
Select one of the following from the Y-Axis dropdown:

- **Density**: The chart displays an equal area underneath both the positive and negative curves (each curve is normalized to the same total area).
- **Frequency**: The area underneath each curve varies and is determined by the number of observations in each class.
The distribution curves are based on the data source and/or distribution selection. Alternating between Frequency and Density changes the curves but does not change the threshold or any values in the associated page elements.
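The frequency/density distinction can be illustrated with a plain histogram. This is a minimal sketch using synthetic, hypothetical class scores (not DataRobot data): frequency heights sum to the class sizes, while density curves each integrate to 1.

```python
import numpy as np

# Hypothetical predicted probabilities for an imbalanced pair of classes:
# many negatives, few positives.
rng = np.random.default_rng(0)
negatives = rng.beta(2, 5, size=800)   # skewed toward low probabilities
positives = rng.beta(5, 2, size=200)   # skewed toward high probabilities

bins = np.linspace(0, 1, 21)
width = bins[1] - bins[0]

# Frequency: bar heights are raw counts, so the area under each curve
# scales with class size (800 vs. 200 here).
freq_neg, _ = np.histogram(negatives, bins=bins)
freq_pos, _ = np.histogram(positives, bins=bins)
print(freq_neg.sum(), freq_pos.sum())  # 800 200

# Density: each curve is normalized so its area integrates to 1,
# giving both classes equal area regardless of class size.
dens_neg, _ = np.histogram(negatives, bins=bins, density=True)
dens_pos, _ = np.histogram(positives, bins=bins, density=True)
print((dens_neg * width).sum(), (dens_pos * width).sum())  # both ≈ 1.0
```

This is why switching the Y-Axis changes the curve shapes but not the threshold or any derived values: the underlying predictions are unchanged, only the vertical scaling differs.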
You can select the data source for the ROC Curve tab visualizations. To do so, click the Data Selection dropdown above the Prediction Distribution graph and select Validation, Cross-Validation, or Holdout. The options available depend on whether you have run or enabled that partition.
Alternatively, you can base the graph on an external test dataset. Time-aware modeling allows backtest-based selections. If you change the data source, the Prediction Distribution graph updates, as do the Chart, Matrix, and Metrics panes described in the following sections.
Changing the display threshold also changes the visualizations, as described below.
Experiment with the Prediction Distribution graph¶
Try the following changes and observe the results.
Pass your cursor over the Prediction Distribution graph. The threshold value displays in white text as you move your cursor.
For curves displayed in the Chart pane (a ROC curve shown here), DataRobot displays a circle that dynamically moves to correspond with the threshold value.
Click on the Prediction Distribution graph to select a new threshold value.
The new value appears in the Display Threshold field. The circle and intercept lines on the Prediction Distribution graph update to the new threshold value. The Metrics pane, the Chart pane (set to ROC Curve here), and the Matrix pane (set to Confusion matrix here) also update to reflect the new threshold.
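The reason all of these elements update together is that moving the threshold reassigns predictions between the four confusion-matrix states, trading false positives against false negatives. A minimal sketch, using hypothetical labels and scores:

```python
# Recompute confusion-matrix counts as the display threshold moves.
# The labels and scores below are hypothetical, for illustration only.
def confusion_counts(actuals, scores, threshold):
    tp = sum(1 for a, s in zip(actuals, scores) if s >= threshold and a == 1)
    fp = sum(1 for a, s in zip(actuals, scores) if s >= threshold and a == 0)
    fn = sum(1 for a, s in zip(actuals, scores) if s < threshold and a == 1)
    tn = sum(1 for a, s in zip(actuals, scores) if s < threshold and a == 0)
    return tp, fp, fn, tn

actuals = [0, 0, 0, 1, 1, 1]
scores  = [0.2, 0.4, 0.6, 0.3, 0.7, 0.9]
for t in (0.25, 0.5, 0.75):
    tp, fp, fn, tn = confusion_counts(actuals, scores, t)
    print(f"threshold={t}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Raising the threshold moves predictions from the "true" side to the "false" side, so FP and TP fall while FN and TN rise; the Metrics, Chart, and Matrix panes reflect the counts at the current threshold.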
Alternatively, you can change the threshold setting by typing a new value in the threshold field.
Click the Y-Axis dropdown to switch the Prediction Distribution graph between displaying a Density or Frequency curve. This change does not impact other page elements.