SHAP reference¶
SHAP (SHapley Additive exPlanations) is an open-source algorithm used to address the accuracy vs. explainability dilemma. It is based on Shapley values, the coalitional game theory framework developed by Lloyd Shapley, a Nobel Prize-winning economist. Shapley asked:
How should we divide a payout among a cooperating team whose members made different contributions?
Shapley values provide the answer:
- The Shapley value for member X is the amount of credit they get.
- For every subteam, how much marginal value does member X add when they join that subteam? The Shapley value is the weighted mean of these marginal contributions.
- The total payout is the sum of the Shapley values over all members.
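To make the definition concrete, the following sketch computes exact Shapley values for a hypothetical three-member team by averaging each member's marginal contribution over every join order; the payouts in payout are invented for the example.

from itertools import permutations

# Hypothetical payouts for every possible subteam of {A, B, C}.
payout = {
    frozenset(): 0, frozenset('A'): 10, frozenset('B'): 20, frozenset('C'): 30,
    frozenset('AB'): 40, frozenset('AC'): 50, frozenset('BC'): 60, frozenset('ABC'): 90,
}

members = ['A', 'B', 'C']
shapley = {m: 0.0 for m in members}
orderings = list(permutations(members))
for order in orderings:
    team = set()
    for m in order:
        # Marginal value member m adds when joining the current subteam.
        marginal = payout[frozenset(team | {m})] - payout[frozenset(team)]
        shapley[m] += marginal / len(orderings)
        team.add(m)

print(shapley)                # credit per member
print(sum(shapley.values()))  # equals the full-team payout of 90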
Scott Lundberg is the primary author of the SHAP Python package, which provides a programmatic way to explain predictions:
We can divide credit for model predictions among features!
By treating each feature value as a "player" in a game and the prediction as the payout, SHAP explains how to fairly distribute the "payout" among the features.
SHAP has become increasingly popular due to the open-source SHAP package, which developed:
- A high-speed exact algorithm for tree ensemble methods (called "TreeExplainer").
- A high-speed approximation algorithm for deep learning models (called "DeepExplainer").
- A model-agnostic algorithm to estimate Shapley values for any model (called "KernelExplainer").
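As an illustration of these explainers outside of DataRobot, the following hypothetical sketch uses the open-source shap and xgboost packages on a small synthetic dataset; the model, data, and background-sample size are assumptions made for the example.

import shap
import xgboost
from sklearn.datasets import make_regression

# Toy data and a small tree-ensemble model (stand-ins for your own data and model).
X, y = make_regression(n_samples=200, n_features=5, n_informative=5, random_state=0)
model = xgboost.XGBRegressor(n_estimators=50).fit(X, y)

# TreeExplainer: high-speed, exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# KernelExplainer: model-agnostic estimate, usable with any prediction function.
kernel_explainer = shap.KernelExplainer(model.predict, shap.sample(X, 50))
kernel_values = kernel_explainer.shap_values(X[:5])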
The following key properties of SHAP make it particularly suitable for DataRobot machine learning:
- Local accuracy: The sum of the feature attributions is equal to the output of the model DataRobot is "explaining."
- Missingness: Features that are missing have no impact; they receive no attribution.
- Consistency: Changing a model to make a feature more important to the model will never decrease the SHAP attribution assigned to that feature. For example, suppose model A uses feature X, and you then build a new model, B, that uses feature X more heavily (perhaps by doubling the coefficient for that feature and keeping everything else the same). Because of the consistency property of SHAP, the SHAP importance of feature X in model B is at least as high as it was in model A.
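Continuing the hypothetical sketch above, the local accuracy property can be checked directly for a regression model with no link function:

import numpy as np

# Local accuracy: the base value plus the sum of per-feature attributions
# reproduces the model output for every row.
reconstructed = explainer.expected_value + shap_values.sum(axis=1)
assert np.allclose(reconstructed, model.predict(X), atol=1e-3)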
Additional readings are listed below.
SHAP contributes to model explainability by:
- Feature Impact: SHAP shows, at a high level, which features are driving model decisions. Without SHAP, results are sensitive to sample size and can change when recomputed unless the sample is quite large. See the deep dive.
- Prediction Explanations: Certain types of data don't lend themselves to producing results for all columns, which is especially problematic in regulated industries like banking and insurance. SHAP explanations reveal how much each feature is responsible for a given prediction being different from the average. For example, when a real estate record is predicted to sell for $X, SHAP Prediction Explanations illustrate how much each feature contributes to that price. See the deep dive.
Note
To retrieve the SHAP-based Feature Impact or Prediction Explanations visualizations, you must enable the Include only models with SHAP value support advanced option prior to model building.
- Feature Effects: SHAP does not change the Feature Effects results. The Predicted, Actual, and Partial dependence plots do not use SHAP in any way. However, the bar chart on the left is ordered by SHAP Feature Impact instead of the usual Permutation Feature Impact.
Feature Impact¶
Feature Impact assigns importance to each feature (j) used by a model.
With SHAP¶
Given a model and some observations (up to 5000 rows of the training data), Feature Impact for each feature j is computed as:

sample average of abs(shap_values for feature j)

Values are then normalized such that the top feature has an impact of 100%.
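Reusing the hypothetical shap_values from the earlier sketch, this computation amounts to:

import numpy as np

# Feature Impact from SHAP: mean absolute SHAP value per feature,
# normalized so the most impactful feature reads as 100%.
mean_abs_shap = np.abs(shap_values).mean(axis=0)
feature_impact = 100 * mean_abs_shap / mean_abs_shap.max()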
With permutation¶
Given a model and some observations (2500 rows by default, up to 100,000), calculate the metric for the model based on the actual data. Then, for each column j:

- Permute the values of column j.
- Calculate the metric on the permuted data.
- Importance = metric_actual - metric_perm

Optionally, normalize by the largest resulting value.
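The following is a minimal sketch of this procedure, reusing the hypothetical model and data from the earlier sketch and assuming R² as the metric (so that higher is better and the difference above is positive); it is not DataRobot's internal implementation.

import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
metric_actual = r2_score(y, model.predict(X))

importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # permute column j only
    metric_perm = r2_score(y, model.predict(X_perm))
    importances.append(metric_actual - metric_perm)

importances = np.array(importances) / np.max(importances)  # optional normalization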
Prediction Explanations¶
SHAP Prediction Explanations are additive. The sum of SHAP values is exactly equal to:
[prediction - average(prediction)]
When selecting between XEMP and SHAP, consider your need for accuracy versus interpretability and performance. Because XEMP includes all blueprints in Autopilot, its results may produce slightly higher accuracy. This is only true in some cases, however, since SHAP supports all key blueprints, which means accuracy is often the same. SHAP does provide higher interpretability and performance:
- results are intuitive
- it’s computed for all features
- results often return 5-20 times faster
- it's additive
- open source nature provides transparency
Additivity in Prediction Explanations¶
In certain cases, you may notice that SHAP values do not add up to the prediction. This is because SHAP values are additive in the units of the direct model output, which can be different from the units of prediction for several reasons.
- For most binary classification problems, the SHAP values correspond to a scale that is different from the probability space [0,1]. This is due to the way that these algorithms map their direct outputs `y` to something always between 0 and 1, most commonly using a nonlinear function like the logistic function, `prob = logistic(y)`. (In technical terms, the model's "link function" is `logit(p)`, which is the inverse of `logistic(y)`.) In this common situation, the SHAP values are additive in the pre-link "margin space", not in the final probability space. This means `sum(shap_values) = logit(prob) - logit(prob_0)`, where `prob_0` is the training average of the model's predictions.
- Regression problems with a skewed target may use the natural logarithm `log()` as a link function in a similar way.
- The model may have specified an offset (applied before the link) and/or an exposure (applied after the link).
- The model may "cap" or "censor" its predictions (for example, enforcing them to be non-negative).
The following pseudocode can be used for verifying additivity in these cases.
# shap_values = output from SHAP Prediction Explanations
# If you obtained the base_value from the UI prediction distribution chart,
# first transform it by the link.
base_value = api_shap_base_value or link_function(ui_shap_base_value)
pred = base_value + sum(shap_values)
if offset is not None:
    pred += offset
if link_function == 'log':
    pred = exp(pred)
elif link_function == 'logit':
    pred = exp(pred) / (1 + exp(pred))  # inverse of logit, i.e., the logistic function
if exposure is not None:
    pred *= exposure
pred = predictions_capping(pred)
# At this point, pred matches the prediction output from the model.
Open-source additivity warning¶
There is a known (though rare) issue in the interaction of the SHAP and XGBoost libraries that can cause SHAP to add to a slightly incorrect value. Most XGBoost models produce SHAP values that obey additivity, verified by an automatic check. See examples reported on the SHAP GitHub page:
- Additivity check failed in TreeExplainer!
- Shap values do not sum to model output
- Tree explainer error in latest version
Note
Log in to GitHub before accessing these GitHub resources.
In DataRobot, if additivity is violated by less than 1% (normalized to model predictions), the application returns the SHAP values along with a warning. If the failure is larger than 1%, an error is returned and the potentially incorrect SHAP values are not provided.
There is a known issue in the interaction of SHAP and Keras models with certain activation functions, including SELU and Swish, which can cause SHAP values to fail additivity significantly. If this failure occurs in Keras models, the SHAP values are provided with a warning. See examples reported on the SHAP GitHub page: "Shap values don't match real predictions - DeepExplainer."
SHAP compatibility matrix¶
See the following for blueprint support with SHAP:
Blueprint | Regression | Binary classification | OTV | Prediction servers |
---|---|---|---|---|
Linear models | ✔ | ✔ | ✔ | ✔ |
XGBoost | ✔ | ✔ | ✔ | ✔ |
LightGBM | ✔ | ✔ | ✔ | ✔ |
Keras | ✔ | ✔ | ✔ | ✔ |
Random Forest | x | x | x | x |
Shallow Random Forest | ✔ | ✔ | ✔ | ✔ |
Frequency Cost / Severity | ✔ | N/A | N/A | ✔ |
Stacked / boosted blueprints | ✔ | ✔ | ✔ | ✔ |
Blueprints with calibration | ✔ | ✔ | ✔ | ✔ |
Blenders | x | x | x | x |
sklearn GBM | x | ✔ | ✔ | x |
DataRobot Scoring Code models | x | x | x | x |
Which explainer is used for which model?¶
Within a blueprint that supports SHAP, each modeling vertex uses the SHAP explainer that is most appropriate to the model type:
- Tree-based models (XGBoost, LightGBM, Random Forest, Decision Tree): TreeExplainer
- Keras deep learning models: DeepExplainer
- Linear models: LinearExplainer
If a blueprint contains more than one modeling task, the SHAP values are combined additively to yield the SHAP values for the overall blueprint.
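The following hypothetical sketch illustrates this style of dispatch using the explainer classes from the open-source shap package; it is not DataRobot's internal code, and the model-class names are examples only.

import shap

def pick_explainer(model, background_data):
    # Choose an explainer based on model type (illustrative only).
    name = type(model).__name__
    if name in ('XGBRegressor', 'XGBClassifier', 'LGBMRegressor', 'LGBMClassifier',
                'RandomForestRegressor', 'RandomForestClassifier'):
        return shap.TreeExplainer(model)
    if name in ('LinearRegression', 'LogisticRegression', 'Ridge', 'Lasso'):
        return shap.LinearExplainer(model, background_data)
    # Keras / deep learning models would use shap.DeepExplainer(model, background_data).
    return shap.KernelExplainer(model.predict, background_data)

# If a blueprint contains more than one modeling task, the per-task SHAP value
# arrays (each of shape (n_samples, n_features)) are combined by addition:
# blueprint_shap_values = shap_values_task_1 + shap_values_task_2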
Additional reading¶
The following public resources provide additional information on open-source SHAP: