Because of the complexity of many machine learning techniques, models can sometimes be difficult to interpret directly. The Feature Fit and Feature Effects insights provide model details on a per-feature basis.
Both Feature Fit and Feature Effects display similar insights. Feature Fit ranks features based on the importance score, while Feature Effects ranks features based on the feature impact score.
See the individual explanations for Feature Fit and Feature Effects to better understand the differences in the insights. Also, see below for information on interpreting the displays and the source of the values, noting the following:
- Both displays can be computed for numeric and categorical features. If you have a text-only dataset or model type, the tabs are greyed out.
- Feature Fit does not support multiclass projects.
- You must run the Feature Fit and/or Feature Effects computation for each model you are investigating.
- Depending on the model (the number of features and number of values for a feature), it may take several minutes for all features of the model to become available.
Feature Fit explained¶
When using Feature Fit, features are sorted in order of model-agnostic importance—that is, based on the Importance score, a univariate comparison, which is calculated during EDA2 (and displayed on the Data page). It answers the question, "for this feature of interest, where did my model do well or do poorly?" Clicking Compute Feature Fit causes DataRobot to run Feature Fit calculations, using the importance score to prioritize the order. By displaying results with higher scores—those features likely to be of more interest—first, you can more quickly view chart results without having to wait for all features to finish.
Because Importance scores are pre-modeling calculations, they are projections of which individual features might be important in the dataset, based on the chosen target. For a given model, a feature with a high Importance score might not be as impactful as projected, for example, if its signal is captured similarly by another feature.
You can then evaluate the fit of the model as a function of input by clicking through each value of a specific feature and comparing the model's predicted and actual target values.
Feature Fit can help identify if there are parts of your data where the model is systematically mis-predicting. If the insight shows larger differences between predicted and actual for a specific feature, it may suggest you need additional data to help explain the discrepancy.
Feature Effects explained¶
Feature Effects shows the effect of changes in the value of each feature on the model’s predictions. It displays a graph depicting how a model "understands" the relationship between each feature and the target, with the features sorted by Feature Impact. The insight is communicated in terms of partial dependence, which illustrates how a change in a feature's value, while keeping all other features as they were, impacts a model's predictions. Literally, "what is the feature's effect, how is this model using this feature?" To compare the model evaluation methods side by side:
- Feature Fit helps to evaluate the overall fit of a model, from the perspective of each feature.
- Feature Impact conveys the relative impact of each feature on a specific model.
- Feature Effects (with partial dependence) conveys how changes to the value of each feature change model predictions.
Clicking Compute Feature Effects causes DataRobot to first compute Feature Impact (if not already computed for the project) and then run the Feature Effects calculations for the model:
The following table describes the elements of the displays:
|Search for features||Lists the top features that have non-zero influence on the model, ranked by the feature importance score (Feature Fit) or the Feature Impact score (Feature Effects).|
|Score (Feature Fit)||Displays a visual indicator of the importance of the feature in predicting the target variable. This is the value displayed on the Data page in the Importance column.|
|Score (Feature Effects)||Reports the feature's relevance to the target. This is the value displayed in the Feature Impact display.|
|Target range||Displays the value range for the target; the Y-axis values can be adjusted with the scaling option.|
|Feature values||Displays individual values of the selected feature.|
|Feature values tooltip||Provides summary information for a feature's binned values.|
|Feature value count||Displays, for the selected feature, the feature distribution for the selected partition fold.|
|Display controls||Sets filters that control the values plotted in the display (partial dependence, predicted, and/or actual).|
|Sort by||Provides controls for sorting.|
|Bins||For qualifying feature types, sets the binning resolution for the feature value count display.|
|Data Selection||Controls which partition fold is used as 1) the basis of the Predicted and Actual values and 2) the sample used for the computation of Partial Dependence. Options for OTV projects differ slightly.|
|Select Class (multiclass only)||Provides controls to display graphed results for a particular class within the target feature.|
|Export/More||Export provides options for downloading data. More controls whether to display missing values and change the Y-axis scale.|
See below for more information on how DataRobot calculates values, tips for using the displays, and how Exposure and Weight change output.
List of features¶
To the left of the graph, DataRobot displays a list of the top 500 predictors, sorted by feature importance (Feature Fit) or feature impact (Feature Effects) score. Use the arrow keys or scroll bar to scroll through features, or the search field to find by name. If all the sample rows are empty for a given feature, the feature is not available in the list. Selecting a feature in the list updates the display to reflect results for that feature.
For charts calculated prior to v4.2, DataRobot displays a warning symbol if the partial dependence calculation is not using the full 1000-row sample (due to missing or incorrect values, for example). To decide whether the feature information is reliable enough, hover over the symbol to see the percentage of data used.
Feature Effects score¶
Each feature in the list is accompanied by its feature impact score. Feature impact measures, for each of the top 500 features, the importance of one feature on the target prediction. It is estimated by calculating the prediction difference before and after shuffling the selected rows of one feature (while leaving other columns unchanged). DataRobot normalizes the scores so that the value of the most important column is 1 (100%). A score of 0% indicates that there was no calculated relationship.
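The shuffle-and-compare idea described above can be sketched in a few lines. This is a minimal illustration, not DataRobot's implementation: the function name `permutation_impact`, the toy model, and the choice to measure "prediction difference" as the mean absolute change in predictions are all assumptions made for the example.

```python
import random


def permutation_impact(predict, rows, seed=0):
    """Return one normalized impact score per column (illustrative only)."""
    rng = random.Random(seed)
    baseline = [predict(row) for row in rows]
    raw_scores = []
    for j in range(len(rows[0])):
        # Shuffle the values of column j while leaving other columns unchanged.
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        # Assumption: "prediction difference" = mean absolute change in prediction.
        diff = sum(abs(predict(s) - b) for s, b in zip(shuffled, baseline))
        raw_scores.append(diff / len(rows))
    top = max(raw_scores)
    # Normalize so that the most impactful column scores 1 (100%).
    return [score / top if top > 0 else 0.0 for score in raw_scores]


# Hypothetical model that uses only the first column.
rows = [[i, i % 3] for i in range(20)]
impact = permutation_impact(lambda row: 3 * row[0], rows)
```

In this toy case the second column never changes the prediction, so it scores 0%, matching the "no calculated relationship" reading of a zero score.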
Target range (Y-axis)¶
The Y-axis represents the value range for the target variable. For binary classification projects, this is a value between 0 and 1. For other projects (regression, for example), the axis spans the minimum to maximum target values. Note that you can use the scaling feature to change the Y-axis and bring greater focus to the display.
Feature values (X-axis)¶
For numeric features¶
The logic for a numeric feature depends on whether you are displaying Predicted/Actual or Partial Dependence.
If the value count in the selected partition fold is greater than 20, DataRobot bins the values based on their distribution in the fold and computes Predicted and Actual for each bin.
If the value count is 20 or less, DataRobot plots Predicted/Actuals for the top values present in the fold selected.
Partial Dependence logic¶
If the value count of the feature in the entire dataset is greater than 99, DataRobot computes Partial Dependence on the percentiles of the distribution of the feature in the entire dataset.
If the value count is 99 or less, DataRobot computes Partial Dependence on all values in the dataset (excluding outliers).
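The two branches above can be sketched as a small grid-selection helper. This is a hedged illustration only: the function name `partial_dependence_grid` is hypothetical, outlier exclusion is omitted, and `statistics.quantiles` stands in for whatever percentile method DataRobot actually uses.

```python
import statistics


def partial_dependence_grid(values, max_distinct=99):
    """Pick the feature values at which partial dependence is evaluated."""
    distinct = sorted(set(values))
    if len(distinct) <= max_distinct:
        # Few distinct values: evaluate at every value
        # (outlier exclusion is not modeled in this sketch).
        return distinct
    # Many distinct values: evaluate at the percentiles of the distribution.
    return sorted(set(statistics.quantiles(values, n=100)))


small_grid = partial_dependence_grid([1, 2, 2, 3])
large_grid = partial_dependence_grid(list(range(1000)))
```

The percentile branch caps the number of evaluation points, which keeps the cost of the calculation bounded for continuous features.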
Feature Fit: DataRobot bins the values for the computation of Predicted/Actual. The X-axis may additionally display a ==Missing== bin, which contains all rows with missing feature values (i.e., NaN as the value of one of the features).
Feature Effects: Partial Dependence feature values are derived from the percentiles of the distribution of the feature across the entire dataset. The X-axis may additionally display a ==Missing== bin, which contains the effect of missing values. The Partial Dependence calculation always includes missing values, even if the feature has no missing values in the dataset. The display shows what the average prediction would be if the feature were missing; DataRobot doesn't need the feature to actually be missing, it's just a "what if."
For categorical features¶
For categorical, the X-axis displays the 25 most frequent values for Predicted, Actual, and Partial Dependence in the selected partition fold. The categories can include, as applicable:
=All Other=: For categorical features, a single bin containing all values other than the 25 most frequent values. No partial dependence is computed for =All Other=. DataRobot uses one-hot encoding and ordinal encoding preprocessing tasks to automatically group low-frequency levels.
For both tasks you can use the min_support advanced tuning parameter to group low-frequency values. By default, DataRobot uses a value of 10 for the one-hot encoder and 5 for the ordinal encoder. In other words, any level that occurs fewer than 10 times (one-hot encoder) or 5 times (ordinal encoder) is combined into a single group.
==Missing==: A single bin containing all rows with missing feature values (that is, NaN as the value of one of the features).
==Other Unseen==: A single bin containing all values that were not present in the Training set. No partial dependence is computed for =Other Unseen=. See the explanation below for more information.
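The low-frequency grouping controlled by min_support can be pictured with the sketch below. It assumes that min_support is the minimum number of rows a level needs in order to keep its own group; the function name `group_low_frequency` and the `"other"` label are hypothetical and do not correspond to a DataRobot API.

```python
from collections import Counter


def group_low_frequency(values, min_support=10, other_label="other"):
    """Replace levels seen fewer than min_support times with one shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_support else other_label for v in values]


levels = ["a"] * 12 + ["b"] * 3 + ["c"] * 2
grouped = Counter(group_low_frequency(levels, min_support=10))
```

Here "b" and "c" fall below the threshold and are merged, so the encoder would see two groups instead of three.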
Feature value tooltip¶
For each bin, hover in the display area above the bin to display the feature's calculated values and row count. For example, a tooltip might report that, for the feature number_diagnoses, when the value is 7, the partial dependence average was 0.366, the predicted average was 0.381, and the actual values average was 0.3. These averages were calculated from 20 rows in the dataset (in which the number of diagnoses was seven).
Feature value count¶
The bar graph below the X-axis provides a visual indicator, for the selected feature, of each of the feature's value frequencies. The bars are mapped to the feature values listed above them, and so changing the sort order also changes the bar display. This is the same information as that presented in the Frequent Values chart on the Data page. For qualifying feature types, you can use the Bins dropdown to set the number of bars (determine the binning).
The legend at the top of the display provides check boxes that control the display of plotted data. Actual values are represented by open orange circles, predicted values by blue crosses, and partial dependence points by solid yellow circles. In this way, points can lie on top of each other without blocking the view. Check or uncheck the boxes to focus on a particular aspect of the display. See below for information on how DataRobot calculates and displays the values.
The Sort by dropdown provides sorting options for plot data. For categorical features, you can sort alphabetically, by frequency, or by size of the effect (partial dependence). For numeric features, sort is always numeric.
Set the number of bins¶
The Bins setting allows you to set the binning resolution for the display. This option is only available when the selected feature is a numeric or continuous variable; it is not available for categorical features or numeric features with few unique values. Use the feature value tooltip to view bin statistics.
Select the partition fold¶
You can set the partition fold used for Predicted, Actual, and Partial Dependence value plotting with the Data Selection dropdown—Training, Validation, and, if unlocked, Holdout. While it may not be immediately obvious, there are good reasons to investigate the training dataset results.
When you select a partition fold, that selection applies to all three display controls, whether or not the control is checked. Note, however, that while performed on the same partition fold, the partial dependence calculation uses a different range of the data.
Note that Data Selection options differ depending on whether or not you are investigating a time-aware project:
For non-time-aware projects: In all cases you can select the Training or Validation set; if you have unlocked holdout, you also have an option to select the Holdout partition.
For time-aware projects: You can select Training, Validation, and/or Holdout (if available) as well as a specific backtest. See the section on time-aware Data Selection settings for details.
Select the class (multiclass only)¶
In a multiclass project, you can additionally set the display to chart per-class results for each feature in your dataset.
By default, DataRobot calculates effects for the top 10 features. To view per-class results for features ranked lower than 10, click Compute next to the feature name:
The Export button allows you to export the graphs and data associated with the model's details and for individual features. If you choose to export a ZIP file, you will get all of the chart images and the CSV files for partial dependence and predicted vs actual data.
The Feature Fit and Feature Effects insights provide tools for re-displaying the chart to help you focus on areas of importance.
This option is only available when one of the following conditions is met: there are missing values in the dataset, the chart's axis is scalable, or the project is binary classification.
Click the gear setting to view the choices:
Check or uncheck the following boxes to activate:
Show Missing Values: Shows or hides the effect of missing values. This selection is available for numeric features only. The bin corresponding to missing values is labeled as =Missing=.
Auto-scale Y-axis: Resets the Y-axis range, which is then used to chart the actual data, the prediction, and the partial dependence values. When checked (the default), the values on the axis span the highest and lowest values of the target feature. When unchecked, the scale spans the entire eligible range (for example, 0 through 1 for binary projects).
Log X-Axis: Toggles between the different X-axis representations. This selection is available for highly skewed numeric features (distributions in which one tail is longer than the other) whose values are greater than zero.
The following sections describe:
- How DataRobot calculates average values and partial dependence
- Interpreting the displays
- Time-aware Data Selection
- Understanding unseen values
- How Exposure and Weight change output
Average value calculations¶
For the predicted and actual values in the display, DataRobot plots the average values. The following simple example explains the calculation.
In the following dataset, Feature A has two possible values—1 and 2:
|Feature A||Feature B||Target|
In this fictitious dataset, the X-axis would show two values: 1 and 2. When A=1, DataRobot calculates the average target as (4+6+1) / 3, or roughly 3.67. When A=2, the average is (5+8+2) / 3 = 5. So the actual and predicted points on the graph show the average target for each aggregated feature value.
- For numeric features, DataRobot generates bins based on the feature domain. For example, for the feature Age with a range of 16-101, bins (the user selects the number) would be based on that range.
- For categorical features, for example Gender, DataRobot generates bins based on the top unique values (perhaps 3 bins).
DataRobot then calculates the average values of prediction in each bin and the average of the actual values of each bin.
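The averaging described in this section amounts to a group-by on the feature value followed by a mean. The sketch below reuses the target values quoted in the example (4, 6, 1 for A=1 and 5, 8, 2 for A=2); the row order and the helper name `average_by_value` are assumptions, since the original table rows are not reproduced here.

```python
from collections import defaultdict


def average_by_value(feature_values, targets):
    """Average the target (actual or predicted) for each feature value or bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, target in zip(feature_values, targets):
        sums[value] += target
        counts[value] += 1
    return {value: sums[value] / counts[value] for value in sums}


feature_a = [1, 1, 1, 2, 2, 2]
target = [4, 6, 1, 5, 8, 2]
averages = average_by_value(feature_a, target)
```

Passing the model's predictions instead of `target` yields the predicted points; for binned numeric features, the bin label takes the place of the raw value.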
Interpret the displays¶
In the Feature Effects and Feature Fit displays, categorical features are represented as points; numerical features are represented as connected points. This is because each numerical value can be seen in relation to the other values, while categorical features are not linearly related. A dotted line indicates that there were not enough values to plot.
Consider the following Feature Effects display (calculations are the same for Feature Fit):
The orange open circles depict, for the selected feature, the average target value for the aggregated number_diagnoses feature values. In other words, when the target is readmitted and the selected feature is number_diagnoses, a patient with two diagnoses has, on average, a roughly 23% chance of being readmitted. Patients with three diagnoses have, on average, a roughly 35% chance of readmittance.
The blue crosses depict, for the selected feature, the average prediction for a specific value. From the graph you can see that DataRobot averaged the predictions for the rows where number_diagnoses is two and calculated a 25% chance of readmittance. Comparing the actual and predicted lines can identify segments where model predictions differ from observed data. This typically occurs when the segment size is small. In those cases, for example, some models may predict closer to the overall average.
The yellow partial dependence line depicts the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction. The value of the feature of interest is then reassigned to each possible value, calculating the average predictions for the sample at each setting. (From the simple example above, DataRobot calculates the average results when all 1000 rows use value 1 and then again when all 1000 rows use value 2.) These values help determine how the value of each feature affects the target. The shape of the yellow line "describes" the model’s view of the marginal relationship between the selected feature and the target. See the discussion of partial dependence calculation for more information.
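The reassign-and-average procedure for partial dependence can be sketched as follows. This is a generic illustration of the technique under simple assumptions, not DataRobot's code: `partial_dependence`, `toy_model`, and the four-row sample are hypothetical stand-ins for a trained model and the 1000-row sample.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each value in grid."""
    curve = {}
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)             # keep all other features as they were
            modified[feature_index] = value  # reassign the feature of interest
            total += predict(modified)
        curve[value] = total / len(rows)
    return curve


def toy_model(row):
    # Hypothetical model: depends linearly on both features.
    return 0.1 * row[0] + 0.5 * row[1]


sample = [[1, 0.2], [2, 0.4], [1, 0.6], [2, 0.8]]
curve = partial_dependence(toy_model, sample, feature_index=0, grid=[1, 2])
```

Because every row is scored at every grid value, the curve reflects how the model responds to the feature alone rather than how the feature happens to co-occur with others in the data.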
Tips for using the displays:
To evaluate model accuracy, uncheck the partial dependence box. You are left with a visual indicator that charts actual values against the model's predicted values.
To understand partial dependence, uncheck the actual and predicted boxes. Set the sort order to Effect Size. Consider the partial dependence line carefully. Isolating the effect of important features can be very useful in optimizing outcomes in business scenarios.
If there are not enough observations in the sample at a particular level, the partial dependence computation may be missing for a specific feature value.
A dashed, rather than solid, predicted (blue) or actual (orange) line indicates that there are no rows in the bins created at that point in the chart.
For numeric variables, if there are more than 18 values, DataRobot calculates partial dependence on values derived from the percentiles of the distribution of the feature across the entire data set. As a result, the value is not displayed in the hover tooltip.
Training data as the viewing subset¶
Viewing Feature Fit or Feature Effects for training data provides a few benefits. It helps to determine how well a trained model fits the data it used for training. It also lets you compare the difference between seen and unseen data in the model performance. In other words, viewing the training results is a way to check the model against known values. If the predicted vs. actual results from the training set are weak, it is a sign that the model is not appropriately selected for the data.
When considering partial dependence, using training data means the values are calculated based on training samples and compared against the maximum possible feature domain. It provides the option to check the relationship between a single feature (by removing marginal effects from other features) and the target across the entire range of the data. For example, suppose the validation set covers January through June but you want to see partial dependence in December. Without that month's data in validation, you wouldn't be able to. However, by setting the data selection subset to Training, you could see the effect.
Partial dependence calculations¶
Predicted/Actual and Partial Dependence are computed very differently for continuous data. For Predicted/Actual, DataRobot bins the data (for example, (1-40], (40-50]...) so that each bin contains enough rows to compute meaningful averages. The bins are based on the distribution of the feature in the selected partition fold.
Partial dependence, on the other hand, uses single values (e.g., 1, 5, 10, 20, 40, 42, 45...) that are percentiles of the distribution of the feature across the entire data set. It uses up to 1000-row samples to determine the scale of the curve. To make the scale comparable with Predicted/Actual, the 1000 samples are drawn from the data of the selected fold. In other words, partial dependence is calculated for the maximum possible range of values from the entire dataset but scaled based on the Data Selection fold setting.
For example, consider a feature "year." For Partial Dependence, DataRobot computes values based on all the years in the data. For Actual/Predicted, computation is based on the years in the selected fold. If the dataset dates range from 2001-01-01 to 2010-01-01, DataRobot uses that span for partial dependence calculations. Predicted and Actual calculations, in contrast, contain only the data from the corresponding, selected fold/backtest. You can see this difference when viewing all three control displays for a selected fold:
Data selection for time-aware projects¶
When working with time-aware projects, because of the backtests the Data Selection dropdown works a bit differently. Select the Feature Fit or Feature Effects tab for your model of interest. If you haven't already computed values for the tab, you are prompted to compute for Backtest 1 (Validation).
When DataRobot completes the calculations, the insight displays with the following Data Selection setting:
The results of clicking on the backtest name depend on whether backtesting has been run for the model. DataRobot automatically computes backtests for the highest scoring models; for lower-scoring models, you must select Run from the Leaderboard to initiate backtesting:
For comparison, the following illustrates when backtests have not been run and when they have:
When calculations are complete, you must then run Feature Fit or Feature Effects calculations for each backtest you want to display, as well as for the Holdout fold, if applicable. From the dropdown, click a backtest that is not yet computed and DataRobot provides a button to initiate calculations.
Set the partition fold¶
Once backtest calculations are complete for your needs, use the Data Selection control to choose the backtest and partition for display. The available partition folds are dependent on the backtest:
- For numbered backtests: Validation and Training for each calculated backtest
- For the Holdout Fold: Holdout and Training
Click the down arrow to open the dialog and select a partition:
Or, click the right and left arrows to move through the options for the currently selected partition—Validation or Training—plus Holdout. If you move to an option that has yet to be computed, DataRobot provides a button to initiate the calculation:
Binning and top values¶
By default, DataRobot calculates the top features listed in Feature Fit and Feature Effects using the Training dataset. For categorical feature values, displayed as discrete points on the X-axis, the segmentation is affected if you select a different data source. To understand the segmentation, consider the illustration below and the table describing the segments:
|As illustrated in chart||Label in chart||Description|
|Top-N values||<feature_value>||Values for the selected feature, with a maximum of 20 values. For any feature with more than 10 values, DataRobot further filters the results, as described in the example below.|
||A single bin containing all values other than the Top-N most frequent values.|
||A single bin containing all records with missing feature values (that is, NaN as the value of one of the features).|
||Categorical feature values that were not "seen" in the Training set but qualified as Top-N in Validation and/or Holdout.|
||Categorical feature values that were not "seen" in the Training set and did not qualify as Top-N in Validation and/or Holdout.|
A simple example to explain Top-N:
Consider a dataset with categorical feature Population and a world population of 100. DataRobot calculates Top-N as follows:
- Ranks countries by their population.
- Selects up to the top-20 countries with the highest population.
- In cases with more than 10 values, DataRobot further filters the results so that the cumulative frequency is >95%. In other words, DataRobot displays on the X-axis those countries whose accumulated population reaches 95% of the world population.
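The three steps above can be sketched as a small selection function. This is one plausible reading of the rule, not a confirmed algorithm: the name `top_n_values`, its parameters, and the decision to stop as soon as the kept values cover 95% of rows are assumptions.

```python
from collections import Counter


def top_n_values(values, max_values=20, filter_above=10, coverage=0.95):
    """Select the categorical values to show on the X-axis (illustrative)."""
    ranked = Counter(values).most_common()  # rank by frequency
    top = ranked[:max_values]               # keep at most the top 20
    if len(ranked) <= filter_above:
        return [value for value, _ in top]
    # More than 10 distinct values: keep values until they cover 95% of rows.
    kept, covered = [], 0
    for value, count in top:
        kept.append(value)
        covered += count
        if covered / len(values) >= coverage:
            break
    return kept


# Two dominant values plus ten rare ones (12 distinct values in total).
data = ["a"] * 500 + ["b"] * 460 + [f"rare_{i}" for i in range(10)]
shown = top_n_values(data)
```

Values that are not kept would fall into the single bin for all other values.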
A simple example to explain Unseen:
Consider a dataset with the categorical feature Letters. The complete list of values for Letters is A, B, C, D, E, F, G, H. After filtering, DataRobot determines that Top-N equals three values. Note that, because the feature is categorical, there is no
|Fold/set||Values found||Top-3 values||X-axis values|
|Training set||A, B, C, D||A, B, C||A, B, C,
|Validation set||B, C, F, G+||B, C, F*||B, C, F (unseen),
|Holdout set||C, E, F, H+||C, E, F||C, E (unseen), F (unseen),
* A new value in the top 3 but not present in the Training set, flagged as
+ A new value not present in Training or in top-3, flagged as
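The flagging of Top-N values that were not present in the Training set can be illustrated with the Letters example. The helper name `label_x_axis` and the exact "(unseen)" suffix are taken from the table for illustration only; values outside the Top-N (such as G or H) are not handled in this sketch.

```python
def label_x_axis(fold_top_values, training_values):
    """Mark Top-N values of a fold that never appeared in the Training set."""
    return [
        value if value in training_values else f"{value} (unseen)"
        for value in fold_top_values
    ]


training_values = {"A", "B", "C", "D"}
validation_axis = label_x_axis(["B", "C", "F"], training_values)
holdout_axis = label_x_axis(["C", "E", "F"], training_values)
```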
How Exposure changes output¶
If you used the Exposure parameter when building models for the project, the Feature Fit and Feature Effects tabs display the graph adjusted to exposure. In this case:
The orange line depicts the sum of the target divided by the sum of exposure for a specific value. The label and tooltip display Sum of Actual/Sum of Exposure, which indicates that exposure was used during model building.
The blue line depicts the sum of predictions divided by the sum of exposure and the legend label displays Sum of Predicted/Sum of Exposure.
The marginal effect depicted in the yellow partial dependence is divided by the sum of exposure of the 1000-row sample. This adjustment is useful in insurance, for example, to understand the relationship between annualized cost of a policy and the predictors. The label tooltip displays Average partial dependency adjusted by exposure.
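The exposure adjustment for the actual and predicted lines replaces a plain average with a ratio of sums. A minimal sketch, assuming one exposure value per row; the function name and the insurance-style sample data are hypothetical.

```python
from collections import defaultdict


def exposure_adjusted_by_value(feature_values, amounts, exposures):
    """Sum of actual (or predicted) divided by sum of exposure, per feature value."""
    amount_sums = defaultdict(float)
    exposure_sums = defaultdict(float)
    for value, amount, exposure in zip(feature_values, amounts, exposures):
        amount_sums[value] += amount
        exposure_sums[value] += exposure
    return {value: amount_sums[value] / exposure_sums[value] for value in amount_sums}


region = ["north", "north", "south"]
claims = [10.0, 30.0, 8.0]
exposure = [0.5, 1.5, 2.0]  # for example, policy-years per row
adjusted = exposure_adjusted_by_value(region, claims, exposure)
```

Dividing by total exposure puts rows with different exposure periods on a comparable, annualized footing.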
How Weight changes output¶
If you set the Weight parameter for the project, DataRobot weights the average and sum operations as described above.
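One way to picture a weighted average per feature value is shown below. The exact weighting formula DataRobot applies is not stated here, so the standard weighted mean used in this sketch, along with the function name and sample data, is an assumption.

```python
from collections import defaultdict


def weighted_average_by_value(feature_values, targets, weights):
    """Weighted mean of the target for each feature value (standard formula)."""
    weighted_sums = defaultdict(float)
    weight_sums = defaultdict(float)
    for value, target, weight in zip(feature_values, targets, weights):
        weighted_sums[value] += target * weight
        weight_sums[value] += weight
    return {value: weighted_sums[value] / weight_sums[value] for value in weighted_sums}


weighted = weighted_average_by_value([1, 1, 2], [4, 6, 5], [1, 3, 2])
```

With these weights, the row with target 6 counts three times as much as the row with target 4, pulling the average for value 1 from 5.0 up to 5.5.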