Segmented analysis

Segmented analysis identifies operational issues with training and prediction data requests for a deployment. DataRobot supports drill-down analysis of data drift and accuracy statistics by letting you filter them by segment attribute and segment value.

Reference the guidelines below to understand how to configure, view, and apply segmented analysis.

Configure segmented analysis

To use segmented analysis for service health, data drift, and accuracy, you must enable the following deployment settings:

Note

Only the deployment owner can configure these settings.

View segmented analysis

If you have enabled segmented analysis for your deployment and have made predictions, you can access various statistics by segment. By default, statistics for a deployment are displayed without any segmentation.

Two dropdown menus control segmented analysis: Segment Attribute and Segment Value.

Service health

Segmented analysis for service health uses the same fixed segment attributes for every deployment. The segment attributes represent the different ways in which prediction requests can be viewed. A segment value is a single value of the selected segment attribute that is present in one or more prediction requests. The available segment values depend on the segment attribute applied:

| Segment Attribute | Description | Segment Value | Example |
|---|---|---|---|
| DataRobot-Consumer | Segments prediction requests by the users of a deployment that have made prediction requests. | Each segment value is the email address of a user. | Segment Attribute: DataRobot-Consumer; Value: nate@datarobot.com |
| DataRobot-Host-IP | Segments prediction requests by the IP address of the prediction server used to make prediction requests. | Each segment value is a unique IP address. | Segment Attribute: DataRobot-Host-IP; Value: 168.212.226.204 |
| DataRobot-Remote-IP | Segments prediction requests by the IP address of a caller (the machine used to make prediction requests). | Each segment value is a unique IP address. | Segment Attribute: DataRobot-Remote-IP; Value: 63.211.546.231 |

Select a segment attribute, and then select a segment value for that attribute. When both are selected, the Service Health tab automatically refreshes to display the statistics for the selected segment value.

Note

The segment values that appear are tied to the specified time range. If a user only contributed prediction requests outside the specified time range, that user does not appear as a selectable segment value in the dropdown menu.
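The time-range behavior described in this note can be pictured with a minimal Python sketch. It is an illustration only, not a DataRobot API: the request log, its timestamps, the `outside.range@example.com` address, and the `segment_values_in_range` helper are all hypothetical, and the half-open time window is an assumption.

```python
from datetime import datetime

# Hypothetical request log of (timestamp, consumer email) pairs.
# This is illustrative data, not something returned by DataRobot.
requests = [
    (datetime(2023, 4, 1, 9, 0), "nate@datarobot.com"),
    (datetime(2023, 4, 3, 14, 30), "john.bledsoe@datarobot.com"),
    (datetime(2023, 3, 1, 8, 15), "outside.range@example.com"),
]


def segment_values_in_range(log, start, end):
    """Return the sorted distinct consumers with a request in [start, end)."""
    return sorted({consumer for ts, consumer in log if start <= ts < end})


# Only consumers with at least one request inside the window are selectable.
values = segment_values_in_range(
    requests, datetime(2023, 4, 1), datetime(2023, 4, 8)
)
print(values)
```

In this sketch the third consumer made a request only in March, so it is absent from the list of selectable values for the April window.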

Data drift and accuracy

Segmented analysis for data drift and accuracy supports custom attributes in addition to the fixed attributes available for every deployment. The segment attributes represent the different ways in which the data can be viewed. A segment value is a single value of the selected segment attribute that is present in one or more prediction requests. The available segment values depend on the segment attribute applied:

| Segment Attribute | Description | Segment Value | Example |
|---|---|---|---|
| DataRobot-Consumer | Segments prediction requests by the users of a deployment that have made prediction requests. | Each segment value is the email address of a user. | Segment Attribute: DataRobot-Consumer; Value: nate@datarobot.com |
| Custom attribute | Segments based on a column in the training data that is indicated when configuring segmented analysis. For example, if your training data includes a "Country" column, you could select it as a custom attribute and segment the data by individual countries (which make up the segment values for the custom attribute). | Based on the segment attribute you provide. | Segment Attribute: "Country"; Value: "Spain" |
| None | Displays the data drift statistics without any segmentation. | All (no segmentation applied). | N/A |

Select a segment attribute, and then select a segment value for that attribute. When both are selected, the Data Drift tab automatically refreshes to display the statistics for the selected segment value.

Note that the segment values that appear are tied to the specified time range. If a tracked segment attribute or value was present only in prediction requests outside the specified time range, that attribute or value does not appear in the dropdown menu.

Apply segmented analysis

One use case for segmented analysis is determining the source of a deployment's data error rate. In this example, the deployment, viewed without segmentation, displays an error rate of 14.39% for the given time range.

Segmented analysis helps you understand where an error rate is coming from. For example, selecting "DataRobot-Consumer" from the Segment Attribute dropdown shows the Data Error Rate for the prediction requests made by individual users in the specified time window. Selecting an individual user from the Segment Value dropdown shows service health statistics for that user's segment of the prediction requests.

In this case, selecting the user john.bledsoe@datarobot.com refreshes the statistics to display this user's activity: 25,000 predictions over 250 requests, with an error rate of 0%.

You can interpret this to mean that the user did not contribute to the overall error rate for this deployment. However, selecting a different user making prediction requests for this deployment shows that they made 1,010 predictions over 160 requests, with an error rate of 36.875%.

The information gathered from segmented analysis indicates where a deployment's error rate is coming from, allowing the admin to contact the user contributing the erroneous data and rectify any issues.
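The arithmetic behind this example can be checked with a short Python sketch. Several things here are assumptions rather than facts from the documentation: the error rate is treated as the share of requests that failed, the count of 59 failed requests is inferred from 36.875% of 160, the second user's address is a placeholder, and the two users are assumed to account for every request in the time range. The `error_rate_by_segment` helper is hypothetical and is not part of any DataRobot API.

```python
from collections import defaultdict

# Hypothetical per-request records of (consumer, request_failed).
# Counts mirror the example: 250 requests with no failures for one user,
# and 160 requests with 59 failures (inferred from 36.875%) for another.
records = (
    [("john.bledsoe@datarobot.com", False)] * 250
    + [("other.user@example.com", True)] * 59
    + [("other.user@example.com", False)] * 101
)


def error_rate_by_segment(rows):
    """Return the percentage of failed requests for each consumer."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for consumer, failed in rows:
        totals[consumer] += 1
        errors[consumer] += int(failed)
    return {c: 100.0 * errors[c] / totals[c] for c in totals}


rates = error_rate_by_segment(records)

# Unsegmented error rate across all requests in the window.
overall = 100.0 * sum(failed for _, failed in records) / len(records)

for consumer, rate in rates.items():
    print(f"{consumer}: {rate:.3f}%")
print(f"overall: {overall:.2f}%")
```

Under these assumptions the per-user rates are 0% and 36.875%, and the unsegmented rate is 59 / 410, which rounds to the 14.39% shown for the deployment as a whole.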


Updated April 10, 2023