Segmented analysis
Segmented analysis helps you identify operational issues with a deployment's training data and prediction requests. DataRobot lets you drill down into service health, data drift, and accuracy statistics by filtering them by segment attribute and segment value.
Reference the guidelines below to understand how to configure, view, and apply segmented analysis.
Configure segmented analysis
To use segmented analysis for service health, data drift, and accuracy, you must enable the following deployment settings:
- Enable target monitoring (required to enable data drift and accuracy tracking)
- Enable feature drift tracking (required to enable data drift tracking)
- Track attributes for segmented analysis of training data and predictions (required to enable segmented analysis for service health, data drift, and accuracy)
Note
Only the deployment owner can configure these settings.
View segmented analysis
If you have enabled segmented analysis for your deployment and have made predictions, you can access various statistics by segment. By default, statistics for a deployment are displayed without any segmentation.
Two dropdown menus control segmented analysis: Segment Attribute and Segment Value.
Service health
Segmented analysis for service health uses the same fixed segment attributes for every deployment. The segment attributes represent the different ways in which prediction requests can be viewed. A segment value is a single value of the selected segment attribute that is present in one or more prediction requests. The available segment values depend on the segment attribute applied:
Segment Attribute | Description | Segment Value | Example |
---|---|---|---|
DataRobot-Consumer | Segments prediction requests by the users of a deployment that have made prediction requests. | Each segment value is the email address of a user. | Segment Attribute: DataRobot-Consumer Value: nate@datarobot.com |
DataRobot-Host-IP | Segments prediction requests by the IP address of the prediction server used to make prediction requests. | Each segment value is a unique IP address. | Segment Attribute: DataRobot-Host-IP Value: 168.212.226.204 |
DataRobot-Remote-IP | Segments prediction requests by the IP address of a caller (the machine used to make prediction requests). | Each segment value is a unique IP address. | Segment Attribute: DataRobot-Remote-IP Value: 63.211.546.231 |
Select a segment attribute, and then select a segment value for that attribute. When both are selected, the Service Health tab automatically refreshes to display the statistics for the selected segment value.
Segment availability
The segment values that appear in the Segment Value dropdown menu are not dependent on the selected time range, monitoring type, or model ID.
Data drift and accuracy
Segmented analysis for data drift and accuracy supports custom attributes in addition to the fixed attributes available for every deployment. The segment attributes represent the different ways in which the data can be viewed. A segment value is a single value of the selected segment attribute that is present in one or more prediction requests. The available segment values depend on the segment attribute applied:
Segment Attribute | Description | Segment Value | Example |
---|---|---|---|
DataRobot-Consumer | Segments prediction requests by the users of a deployment that have made prediction requests. | Each segment value is the email address of a user. | Segment Attribute: DataRobot-Consumer Value: nate@datarobot.com |
Custom attribute | Segments based on a column in the training data that is indicated when configuring segmented analysis. For example, if your training data includes a "Country" column, you could select it as a custom attribute and segment the data by individual countries (which make up the segment values for the custom attribute). | Based on the segment attribute you provide. | Segment Attribute: "Country" Value: "Spain" |
None | Displays the data drift statistics without any segmentation. | All (no segmentation applied). | N/A |
Select a segment attribute, and then select a segment value for that attribute. When both are selected, the Data Drift tab automatically refreshes to display the statistics for the selected segment value.
Segment availability
The segment values that appear in the Segment Value dropdown menu are not dependent on the selected time range, monitoring type, or model ID.
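As a rough illustration of what segmenting by a custom attribute means, the sketch below groups a few prediction rows by a "Country" column and computes a statistic for one segment value. It is not DataRobot code: the `segment_rows` helper, the `prediction` field, and the sample rows are hypothetical and exist only to show the idea.

```python
from collections import defaultdict
from statistics import mean


def segment_rows(rows, attribute):
    """Group rows by the value each row holds for the given segment attribute."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[attribute]].append(row)
    return dict(segments)


# Hypothetical prediction rows; "Country" plays the role of the custom attribute.
rows = [
    {"Country": "Spain", "prediction": 0.25},
    {"Country": "Spain", "prediction": 0.75},
    {"Country": "France", "prediction": 0.5},
]

by_country = segment_rows(rows, "Country")

# The distinct values are what a Segment Value dropdown would list.
segment_values = sorted(by_country)

# Statistics are then computed over a single segment instead of all rows.
spain_mean = mean(row["prediction"] for row in by_country["Spain"])
```

Here each distinct value of the chosen column becomes one segment value, and any statistic is recomputed over only the rows in that segment.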
Apply segmented analysis
One use case for segmented analysis is determining the source of a deployment's data error rate. For example, this deployment, without segmentation, displays an error rate of 14.39% for the given time range:
Segmented analysis helps you understand where an error rate is coming from. For example, selecting "DataRobot-Consumer" from the Segment Attribute dropdown shows the Data Error Rate for the prediction requests made by individual users for a specified time window. Selecting an individual user from the Segment Value dropdown shows service health statistics for their segment of the prediction requests.
In this case, selecting the user john.bledsoe@datarobot.com refreshes the statistics to display this user's activity. This user made 25,000 predictions over 250 requests, with an error rate of 0%:
You can interpret this to mean that the user did not contribute to the overall error rate for this deployment. However, selecting a different user making prediction requests for this deployment shows that they made 1,010 predictions over 160 requests, with an error rate of 36.875%:
The information gathered from segmented analysis indicates where a deployment's error rate is coming from, allowing the admin to contact the user who is submitting the erroneous data and rectify any issues.
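The arithmetic behind this example can be sketched in a few lines. The code below is an illustration, not DataRobot's implementation: the `error_rate_by_segment` helper, the `is_error` field, and the placeholder email addresses are hypothetical. It assumes the two users account for every request in the time window and that an error rate of 36.875% over 160 requests corresponds to 59 failed requests.

```python
from collections import defaultdict


def error_rate_by_segment(requests, attribute):
    """Return {segment value: error rate in percent} for one segment attribute."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for request in requests:
        value = request[attribute]
        totals[value] += 1
        if request["is_error"]:
            errors[value] += 1
    return {value: 100.0 * errors[value] / totals[value] for value in totals}


# Hypothetical request log: 250 clean requests from one user,
# and 160 requests from another user, 59 of which failed.
requests = (
    [{"DataRobot-Consumer": "user_a@example.com", "is_error": False}] * 250
    + [{"DataRobot-Consumer": "user_b@example.com", "is_error": True}] * 59
    + [{"DataRobot-Consumer": "user_b@example.com", "is_error": False}] * 101
)

# Error rate per segment value (per user).
rates = error_rate_by_segment(requests, "DataRobot-Consumer")

# Error rate with no segmentation applied, rounded to two decimals.
total_errors = sum(1 for request in requests if request["is_error"])
overall_rate = round(100.0 * total_errors / len(requests), 2)
```

Under these assumptions, the unsegmented rate works out to 59 failures over 410 requests, which is consistent with the 14.39% shown for the deployment as a whole, while the per-user rates isolate the second user as the only source of errors.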