Sliced insights

Availability information

Sliced insights are off by default. Contact your DataRobot representative or administrator for information on enabling the feature.

Feature flag: Enable Sliced Insights

Sliced insights provide the option to view a subpopulation of a model's data based on feature values—either raw or derived. Slices are, in effect, a filter for categorical, numeric, or both types of features. Slices are applied to the Training, Validation, Cross-validation, or Holdout partitions, depending on the insight.

Viewing and comparing insights based on segments of a project's data helps you understand how models perform on different subpopulations. Use the segment-based accuracy information gleaned from sliced insights, or compare the segments to the "global" slice (all data), to improve training data, create individual models per segment, or augment predictions post-deployment.
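
As a rough illustration of comparing segments to the global slice, the sketch below scores a handful of made-up predictions overall and per segment. The rows, the segment values, and the helper names (`accuracy`, `accuracy_by_segment`) are all hypothetical; this is not how DataRobot computes its insights, only a minimal picture of the idea.

```python
# Hypothetical rows: (segment value, actual label, predicted label).
rows = [
    ("urban", 1, 1),
    ("urban", 0, 0),
    ("urban", 1, 0),
    ("rural", 0, 0),
    ("rural", 1, 1),
    ("rural", 0, 0),
]


def accuracy(records):
    """Fraction of records whose predicted label equals the actual label."""
    if not records:
        raise ValueError("cannot score an empty slice")
    correct = sum(1 for _, actual, predicted in records if actual == predicted)
    return correct / len(records)


def accuracy_by_segment(records):
    """Group records by segment value and score each group separately."""
    groups = {}
    for record in records:
        groups.setdefault(record[0], []).append(record)
    return {segment: accuracy(group) for segment, group in groups.items()}


global_accuracy = accuracy(rows)              # the "global" slice: all rows
segment_accuracy = accuracy_by_segment(rows)  # one score per segment
```

A segment that scores well below `global_accuracy` is a candidate for more training data or for its own model.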

Some common uses of sliced insights:

A bank is building a model to predict loan default risk and wants to understand whether there are segments of their data (demographic information, location, and so on) on which their model performs more or less accurately. If they find that "slicing" the data shows some segments perform to their expectations, they may choose to create individual projects per segment.

An advertising company wants to predict whether someone will click an ad. Their data contains multiple websites and they want to understand if the drivers are different between websites in their portfolio. They are interested in creating comparison groups, with each group consisting of some number of different values, to ultimately impact user behaviors in different ways for each site.

To view insights for a segment of your data once models are trained, choose the preconfigured slice from the Slice dropdown. If the slice has been calculated for the chosen insight, DataRobot will load the insight. Otherwise, a button will be available to start further calculations. Sliced insights are available for the following:

See also the sliced insight considerations.

Create a slice

You can create a slice to apply to insights from either the Data tab or a supported Leaderboard insight. Each slice is made up of up to three filters (connected, as needed, by the AND operator).

Note

Any feature in the dataset can be used to create a filter, regardless of what is currently displayed on the Data tab and even if you built the model using a feature list that doesn't include that feature. This is because feature lists are based on columns, while slices are based on rows. That is, the value of the selected feature is still present in each row, whether or not that feature's column belongs to the feature list.
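
The column-versus-row distinction can be sketched in a few lines. The dataset, feature names, and variable names below are hypothetical and chosen only for illustration.

```python
# Hypothetical dataset: each dict is one row, each key is one feature (column).
dataset = [
    {"age": 72, "gender": "Female", "num_diagnoses": 6},
    {"age": 45, "gender": "Male", "num_diagnoses": 2},
    {"age": 78, "gender": "Female", "num_diagnoses": 9},
]

# A feature list selects columns. This one leaves out "gender".
feature_list = ["age", "num_diagnoses"]

# A slice selects rows. It can filter on "gender" even though the
# feature list excludes it, because every row still carries that value.
slice_row_ids = [i for i, row in enumerate(dataset) if row["gender"] == "Female"]

# Model inputs for the slice: only feature-list columns, only sliced rows.
sliced_inputs = [
    {name: dataset[i][name] for name in feature_list} for i in slice_row_ids
]
```

Here the slice keeps rows 0 and 2, and the model still sees only `age` and `num_diagnoses` for those rows.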

Data tab slices

To create a slice from the Data tab, after EDA2 completes:

  1. From Project Data, click Slices to open a window that lists any configured slices.

  2. Click Add slice to open the filter configuration window. Fields are described in the filter reference.

  3. When configuration is complete, click Add. The Slices window appears, listing all configured slices as well as showing summary text of the filters that define the slice.

    From here, you can also:

    • Add a new slice.
    • Delete one or more configured slices.
    • Click Done to close the configuration window.

You can return to the Data tab to create additional slices at any time. Alternatively, create slices from a supported insight of a Leaderboard model.

Leaderboard slices

To create a slice from the Leaderboard:

  1. Select a model and open a supported insight. The insight loads using all data for the selected partition.

  2. Use the Slice dropdown, which defaults to None, to configure a new slice using the Edit slices option.

  3. Click Add slice to open the filter configuration window. Fields are described in the filter reference.

  4. Click Add to finish and view the Slices window (described in Data tab slices).

Filter configuration reference

The following table describes the fields in the filter configuration, which are common regardless of where in the application you build your filter. Note the special cases associated with configuring the filter before EDA2 completes.

Filter field | Description
Slice name | Enter a name for the filter. This is the name that will appear in the Slice dropdown of supported insights.
Filter type | Select the categorical or numeric feature to base the filter on. Features are grouped in the dropdown by variable type. You cannot set the target as the filter type.
Operator | Set the filter operator to define what comprises the subpopulation. See also the notes below the table.
    in: Rows in which the feature value is one of the defined Values (categorical only).
    =: Rows in which the feature value exactly matches the defined Value.
    >: Rows in which the feature value is greater than the defined Value (numeric only).
    <: Rows in which the feature value is less than the defined Value (numeric only).
Value | Set the matching criteria for Filter type. For categorical features, all available values are listed in the dropdown. For numeric features, enter a value manually.

Adding conditions to a filter slice

Use Add filter to build a slice with multiple conditions. Note the following:

  • You can mix categorical and numeric features in a single slice.
  • All conditions are processed with the AND operator.

Note

If you select = as the operator, the Value must match exactly and you can choose only one value. If you set in, you can select multiple values.
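
The filter rules above (four operators, type restrictions, up to three conditions joined by AND) can be modeled as a small sketch. Everything here is hypothetical: `Condition`, `apply_slice`, and the patient rows are illustrative names, not DataRobot objects. Allowing `=` for both feature types is an assumption, inferred from the table marking only `in`, `>`, and `<` as type-restricted.

```python
from dataclasses import dataclass
import operator

MAX_CONDITIONS = 3  # a slice holds up to three filter conditions

# Operator symbol -> (comparison function, feature types it applies to).
# Assumption: "=" works for both types, since only the others are restricted.
OPERATORS = {
    "in": (lambda value, allowed: value in allowed, {"categorical"}),
    "=": (operator.eq, {"categorical", "numeric"}),
    ">": (operator.gt, {"numeric"}),
    "<": (operator.lt, {"numeric"}),
}


@dataclass(frozen=True)
class Condition:
    feature: str
    feature_type: str  # "categorical" or "numeric"
    op: str
    value: object

    def __post_init__(self):
        if self.op not in OPERATORS:
            raise ValueError(f"unknown operator: {self.op!r}")
        if self.feature_type not in OPERATORS[self.op][1]:
            raise ValueError(
                f"{self.op!r} is not valid for {self.feature_type} features"
            )

    def matches(self, row):
        compare = OPERATORS[self.op][0]
        return compare(row[self.feature], self.value)


def apply_slice(rows, conditions):
    """Return the rows that satisfy every condition (AND)."""
    if not 1 <= len(conditions) <= MAX_CONDITIONS:
        raise ValueError("a slice needs between one and three conditions")
    return [row for row in rows if all(c.matches(row) for c in conditions)]


# Hypothetical data and slice mixing categorical and numeric conditions.
patients = [
    {"gender": "Female", "age": "[70-80)", "num_diagnoses": 7},
    {"gender": "Female", "age": "[70-80)", "num_diagnoses": 3},
    {"gender": "Male", "age": "[70-80)", "num_diagnoses": 9},
]
conditions = [
    Condition("gender", "categorical", "=", "Female"),
    Condition("age", "categorical", "in", {"[70-80)", "[80-90)"}),
    Condition("num_diagnoses", "numeric", ">", 5),
]
selected = apply_slice(patients, conditions)
```

Only the first patient satisfies all three conditions, so `selected` holds a single row. Note that `=` takes one value while `in` takes a set of values.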

Generate sliced insights

When you first load an insight, DataRobot displays results for all data in the appropriate partition (unless further calculations are required). This is the equivalent of the global slice, or as referenced in the dropdown, None.

Note

The following explanation does not apply to Feature Impact, which follows a separate calculation process.

When viewing sliced data for a given model, you only have to generate predictions once for a selected partition—Validation, Cross-validation, or if calculated, Holdout. Note that this calculation is in addition to the original calculation DataRobot ran when fitting models for the project.

Once prediction calculations are run for the first slice, DataRobot stores them so that they can then be re-used, assuming the same data partition. For example:

  • When you select a new slice for the first time, within the same insight, DataRobot will generate the insight but will not need to rerun predictions (because predictions for the partition have already been computed).

  • When you change to another supported insight (other than Feature Impact), the predictions are available and only the insight itself will need to be generated (because the partition's predictions have already been computed by another supported insight).
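
The reuse behavior described above resembles a two-level cache: predictions keyed by partition, insights keyed by insight, partition, and slice. The sketch below is a toy model under that assumption. `SlicedInsightCache` and its methods are hypothetical names, and the sketch does not cover Feature Impact, which the note above says follows a separate process.

```python
class SlicedInsightCache:
    """Toy model of prediction reuse across slices and insights."""

    def __init__(self, predict_partition):
        self._predict_partition = predict_partition  # callable: partition -> predictions
        self._predictions = {}  # partition -> stored predictions
        self._insights = {}     # (insight, partition, slice name) -> result
        self.prediction_runs = 0

    def _get_predictions(self, partition):
        # Predictions are generated once per partition and then stored.
        if partition not in self._predictions:
            self.prediction_runs += 1
            self._predictions[partition] = self._predict_partition(partition)
        return self._predictions[partition]

    def get_insight(self, insight, partition, slice_name, compute):
        # Each (insight, partition, slice) combination is generated once.
        key = (insight, partition, slice_name)
        if key not in self._insights:
            predictions = self._get_predictions(partition)
            self._insights[key] = compute(predictions, slice_name)
        return self._insights[key]


def count_rows(predictions, slice_name):
    """Stand-in for a real insight calculation."""
    return len(predictions)


cache = SlicedInsightCache(lambda partition: [0.2, 0.8, 0.6])
cache.get_insight("ROC Curve", "Validation", "slice A", count_rows)        # runs predictions
cache.get_insight("ROC Curve", "Validation", "slice B", count_rows)        # reuses them
cache.get_insight("another insight", "Validation", "slice A", count_rows)  # reuses them
cache.get_insight("ROC Curve", "Holdout", "slice A", count_rows)           # new partition
```

After these four calls, predictions have been generated twice: once for Validation and once for Holdout.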

View a sliced insight

To view a sliced insight, choose the appropriate slice from the Slice dropdown. If you see a slice but are unsure of the filter conditions, click Edit slices to open the slice configuration window, which provides summary text of the filters that define the slice.

The following example shows the ROC Curve tab without slices applied:

Consider the same model with a slice applied that segments the data for females aged 70-80 who have had more than five diagnoses:

Note

If the slice results in predictions that are either all positive or all negative, the ROC curve will be a straight line. The Confusion Matrix reports the same results in table form.
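
To make the table form concrete, here is a minimal sketch of a confusion matrix for a slice in which every prediction is positive. The labels, scores, and the 0.5 threshold are made-up assumptions, and `confusion_matrix` is a hypothetical helper rather than a DataRobot function.

```python
def confusion_matrix(actuals, scores, threshold=0.5):
    """Count TP, FP, TN, FN for binary labels at a probability threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, score in zip(actuals, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1:
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            counts["FN" if actual == 1 else "TN"] += 1
    return counts


# Hypothetical slice: every score is above the threshold.
slice_actuals = [1, 0, 1, 1]
slice_scores = [0.9, 0.7, 0.8, 0.95]
matrix = confusion_matrix(slice_actuals, slice_scores)
```

With all predictions positive, the predicted-negative cells (`TN` and `FN`) are both zero.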

Feature Impact

When using slices with Feature Impact, DataRobot first runs predictions on the training sample chosen to fit the model. As with other insights, these predictions can then be re-used. However, they must be run specifically for Feature Impact (that is, they cannot be leveraged from a previous prediction calculation on a different insight or from the model fitting step). Then, in parallel, DataRobot creates slice-based synthetic prediction datasets and generates predictions on them for use in the Feature Impact insight. By default, DataRobot uses a sample size of 10 rows when a slice is selected.

The images below show Feature Impact with first the global slice and then a configured slice:

Hover on a feature to compare the calculated impact between sliced views:

Considerations

  • Sliced insights are only available for binary classification and regression projects that are non-time aware (no OTV or time series support).

  • It is best to define slices after EDA2 completes to avoid possible configuration errors. Some examples:

    • You cannot use the target for the Filter type. While the feature will not be listed once the target is set, if you create the slice before setting the target, the feature will be available for selection. Using it, however, will cause errors.
    • If you define a slice based on a feature, and that feature is highly correlated with a special column such as the target column or the partition column, it is possible to create a situation where the slice will have no rows in certain data subsets (for example, the slice has no rows in the Validation partition).
  • You cannot edit slices. Instead, delete (if desired) the existing slice and create a new slice with the desired criteria.

  • You can add a maximum of three filter conditions to a single slice.

  • If you create an invalid slice, the slice is created but errors when you apply it to a supported insight. This can happen, for example, if the sliced data has too few rows to compute the insight or if the filter is invalid. For example, if you set num_procedures > 10 and the maximum value in any row is 6, DataRobot creates the slice but errors during the insight calculation when the slice is selected.

  • Row requirements:

    • Feature Impact: Minimum 10 rows, maximum 100,000
    • Other insights: Minimum 1 row (must fall within the slice), maximum set only by file size limits
  • For Feature Impact:

    • Feature Impact with a slice is calculated on all rows in the training data, which is then sliced to the requested number of rows. Previously, Feature Impact was calculated on the exact number of rows requested in the row count.
    • The ability to adjust sample size is unavailable within a sliced view. Currently in the UI, the sample size used to calculate Feature Impact for slices is 10 rows. A different sample size can be requested through the API. The new sliced insight, computed for the same slice with a different sample size, replaces the 10-row result and is shown in the UI.
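
The row requirements and the invalid-slice example above can be checked up front with a small helper. This is a sketch only: `rows_used` and `ROW_LIMITS` are hypothetical names, and capping at the maximum (rather than raising an error) is an assumption, since the text states the limit but not what happens beyond it.

```python
# Row limits taken from the considerations above.
ROW_LIMITS = {
    "Feature Impact": (10, 100_000),
    "other": (1, None),  # upper bound depends on file size limits, not modeled here
}


def rows_used(insight, row_count):
    """Return how many sliced rows an insight would use, or raise if too few."""
    minimum, maximum = ROW_LIMITS.get(insight, ROW_LIMITS["other"])
    if row_count < minimum:
        raise ValueError(
            f"{insight}: slice has {row_count} rows, needs at least {minimum}"
        )
    # Assumption: rows beyond the maximum are ignored rather than rejected.
    return row_count if maximum is None else min(row_count, maximum)


# Hypothetical partition where num_procedures never exceeds 6.
partition = [{"num_procedures": n} for n in (0, 1, 3, 6, 2)]
matching = [row for row in partition if row["num_procedures"] > 10]

try:
    rows_used("ROC Curve", len(matching))
    slice_is_usable = True
except ValueError:
    slice_is_usable = False  # the slice exists but the insight cannot be computed
```

Because no row has `num_procedures` above 10, the slice matches zero rows and the check fails, mirroring the case where the slice is created but errors during the insight calculation.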

Supported insights


Updated May 3, 2023