Segmented analysis

Segmented analysis identifies operational issues with training and prediction data requests for a deployment. DataRobot lets you drill down into data drift and accuracy statistics by filtering them by unique segment attributes and values.

Reference the guidelines below to understand how to configure, view, and apply segmented analysis.

Configure segmented analysis

To enable segmented analysis for an existing deployment, navigate to the Settings > Data tab (segmented analysis can also be configured for new deployments during creation). Turn on the following toggles under the Inference > Data Drift header. Note that only the deployment owner can set these toggles.

  • Enable data drift tracking
  • Enable prediction rows storage
  • Track attributes for segmented analysis of training data and predictions

For time series deployments with segmented analysis enabled, DataRobot automatically adds up to two segmented attributes: Forecast Distance and series id (the ID is only provided for multiseries models). These attributes allow you to view accuracy and drift for a specific forecast distance or series.

After enabling the toggles, you must specify the segment attributes to track in training and prediction data prior to making predictions. Selecting a segment attribute for tracking causes the model's data to be segmented by the attribute, allowing users to closely analyze the segment values that make up the attributes selected for tracking. Attributes used for segmented analysis must be present in the training dataset for a deployed model, but they need not be features of the model.

The list of segment attributes available for tracking is limited to categorical features, with the exception of the selected series ID used by multiseries deployments. To track an attribute, list it in the Track attributes for segmented analysis of training data and predictions field.
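The eligibility rules above can be expressed as a small pre-flight check. The sketch below is illustrative only: `validate_segment_attributes` and its arguments are hypothetical names, not part of any DataRobot API, and it assumes you already know which training columns DataRobot treats as categorical.

```python
from typing import Iterable, List, Optional


def validate_segment_attributes(
    requested: Iterable[str],
    training_columns: Iterable[str],
    categorical_columns: Iterable[str],
    series_id: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means every attribute looks trackable.

    Hypothetical helper: it only mirrors the rules stated in the text.
    """
    training = set(training_columns)
    categorical = set(categorical_columns)
    problems = []
    for name in requested:
        if name not in training:
            # Attributes must be present in the training dataset.
            problems.append(f"{name!r} is not a column in the training dataset")
        elif name not in categorical and name != series_id:
            # Only categorical features qualify, except the series ID.
            problems.append(f"{name!r} is not categorical and is not the series ID")
    return problems


issues = validate_segment_attributes(
    requested=["Country", "Age", "Region"],
    training_columns=["Country", "Age", "Income"],
    categorical_columns=["Country"],
)
for issue in issues:
    print(issue)
```

Here "Country" passes, "Age" is rejected because it is not categorical, and "Region" is rejected because it is missing from the training data. The attribute does not need to be a model feature, so the check deliberately ignores the model's feature list.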

The "Consumer" attribute (representing users making prediction requests) is always listed by default. In the case of time series deployments, forecast distance will be automatically available as a segment attribute without being explicitly present in the training dataset. Forecast distance is inferred based on the forecast point and the date being predicted on. When you have finalized the attributes to track, click Save changes.
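To make the forecast distance inference concrete, the sketch below counts the time steps between a forecast point and the date being predicted. It is a simplified illustration, not DataRobot's implementation: the function name is hypothetical, and it assumes a regular time step with timestamps aligned to it.

```python
from datetime import datetime, timedelta


def forecast_distance(
    forecast_point: datetime,
    predicted_timestamp: datetime,
    time_step: timedelta,
) -> int:
    """Number of whole time steps from the forecast point to the predicted date."""
    delta = predicted_timestamp - forecast_point
    if delta <= timedelta(0):
        raise ValueError("predicted timestamp must be after the forecast point")
    steps, remainder = divmod(delta, time_step)
    if remainder:
        # Assumption: predictions fall exactly on the time-step grid.
        raise ValueError("predicted timestamp is not aligned to the time step")
    return steps


distance = forecast_distance(
    forecast_point=datetime(2021, 11, 1),
    predicted_timestamp=datetime(2021, 11, 4),
    time_step=timedelta(days=1),
)
print(distance)  # 3
```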

Make predictions, then navigate to the tab you want to analyze by segment for your deployment: Service Health, Data Drift, or Accuracy. Note that segmented analysis is only available for predictions made after the toggle is turned on.

View segmented analysis

If you have enabled segmented analysis for your deployment and have made predictions, you can access various statistics by segment. By default, statistics for a deployment are displayed without any segmentation.

Two dropdown menus control segmented analysis: Segment Attribute and Segment Value.

Service health

Segmented analysis for service health uses fixed segment attributes for every deployment. The segment attributes represent the different ways in which prediction requests can be viewed. A segment value is a single value of the selected segment attribute that is present in one or more prediction requests. The available segment values depend on the segment attribute applied:

| Segment Attribute | Description | Segment Value | Example |
|---|---|---|---|
| DataRobot-Consumer | Segments prediction requests by the users of a deployment that have made prediction requests. | Each segment value is the email address of a user. | Segment Attribute: DataRobot-Consumer; Value: nate@datarobot.com |
| DataRobot-Host-IP | Segments prediction requests by the IP address of the prediction server used to make prediction requests. | Each segment value is a unique IP address. | Segment Attribute: DataRobot-Host-IP; Value: 168.212.226.204 |
| DataRobot-Remote-IP | Segments prediction requests by the IP address of a caller (the machine used to make prediction requests). | Each segment value is a unique IP address. | Segment Attribute: DataRobot-Remote-IP; Value: 63.211.546.231 |

Select a segment attribute, and then select a segment value for that attribute. When both are selected, the Service Health tab automatically refreshes to display the statistics for the selected segment value.

Note that the segment values that appear are tied to the specified time range. If a user only contributed prediction requests outside the specified time range, that user does not appear as a selectable segment value in the dropdown menu.

Data drift and accuracy

Segmented analysis for data drift and accuracy allows custom attributes in addition to the fixed attributes available for every deployment. The segment attributes represent the different ways in which the data can be viewed. A segment value is a single value of the selected segment attribute that is present in one or more prediction requests. The available segment values depend on the segment attribute applied:

| Segment Attribute | Description | Segment Value | Example |
|---|---|---|---|
| DataRobot-Consumer | Segments prediction requests by the users of a deployment that have made prediction requests. | Each segment value is the email address of a user. | Segment Attribute: DataRobot-Consumer; Value: nate@datarobot.com |
| Custom attribute | Segments based on a column in the training data that is indicated when configuring segmented analysis. For example, if your training data includes a "Country" column, you could select it as a custom attribute and segment the data by individual countries (which make up the segment values for the custom attribute). | Based on the segment attribute you provide. | Segment Attribute: "Country"; Value: "Spain" |
| None | Displays the data drift statistics without any segmentation. | All (no segmentation applied). | N/A |

Select a segment attribute, and then select a segment value for that attribute. When both are selected, the Data Drift tab automatically refreshes to display the statistics for the selected segment value.

Note that the segment values that appear are tied to the specified time range. If a tracked segment attribute or value was present only in prediction requests outside the specified time range, that attribute or value does not appear in the dropdown menu.
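The time-range behavior can be pictured as a filter over prediction requests. The following sketch is an assumption-laden illustration rather than how DataRobot stores or queries data: the record layout, the `timestamp` key, and the function name are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def selectable_segment_values(
    requests: List[Dict], attribute: str, start: datetime, end: datetime
) -> List[str]:
    """Sorted distinct values of `attribute` seen in requests with start <= timestamp < end."""
    return sorted(
        {
            r[attribute]
            for r in requests
            if attribute in r and start <= r["timestamp"] < end
        }
    )


# Hypothetical prediction requests carrying a tracked "Country" attribute.
requests = [
    {"timestamp": datetime(2021, 10, 30), "Country": "Spain"},
    {"timestamp": datetime(2021, 11, 2), "Country": "France"},
    {"timestamp": datetime(2021, 11, 3), "Country": "Spain"},
    {"timestamp": datetime(2021, 11, 9), "Country": "Italy"},
]
values = selectable_segment_values(
    requests, "Country", datetime(2021, 11, 1), datetime(2021, 11, 8)
)
print(values)  # ['France', 'Spain']
```

"Italy" appears only in a request outside the selected range, so it is not offered as a segment value; the same reasoning applies to consumers in the Service Health tab.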

Apply segmented analysis

An example use case for segmented analysis is determining the source of the data error rate for a deployment. For example, consider a deployment that, without segmentation, displays an error rate of 14.39% for the given time range.

Segmented analysis helps you understand where an error rate is coming from. For example, selecting "DataRobot-Consumer" from the Segment Attribute dropdown shows the Data Error Rate for the prediction requests made by individual users for a specified time window. Selecting an individual user from the Segment Value dropdown shows service health statistics for their segment of the prediction requests.

In this case, by selecting the user john.bledsoe@datarobot.com, the statistics refresh to display this user's stats. He made 25,000 predictions over 250 requests, with an error rate of 0%:

You can interpret this to mean that the user did not contribute to the overall error rate for this deployment. However, selecting a different user making prediction requests for this deployment shows that they made 1,010 predictions over 160 requests, with an error rate of 36.875%:

The information gathered from segmented analysis shows where a deployment's error rate is coming from, allowing the admin to contact the user contributing the erroneous data and rectify any issues.
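The arithmetic behind this example can be reproduced with a short sketch. It rests on two assumptions that the page does not state: that the error rate is computed per request (so 36.875% of 160 requests corresponds to 59 failed requests), and that the two users shown were the only consumers in the time range. The record layout and the labels `user_a` and `user_b` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def error_rate_by_segment(requests: List[Dict], attribute: str) -> Dict[str, float]:
    """Map each segment value to the percentage of its requests that errored."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for r in requests:
        totals[r[attribute]] += 1
        if r["error"]:
            errors[r[attribute]] += 1
    return {value: 100 * errors[value] / totals[value] for value in totals}


# Hypothetical request log: 250 clean requests from one consumer,
# 160 requests from another, 59 of which failed (inferred, see above).
requests = (
    [{"consumer": "user_a", "error": False} for _ in range(250)]
    + [{"consumer": "user_b", "error": i < 59} for i in range(160)]
)

rates = error_rate_by_segment(requests, "consumer")
overall = 100 * sum(r["error"] for r in requests) / len(requests)

print(rates)              # {'user_a': 0.0, 'user_b': 36.875}
print(round(overall, 2))  # 14.39
```

Under these assumptions, 59 failed requests out of 410 total gives about 14.39%, which is consistent with the unsegmented figure quoted earlier and shows how a single consumer can account for the whole error rate.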


Updated November 5, 2021