Optimal binning in DataRobot
Got a question from a client: does DataRobot do optimal binning on numeric features? I know that in our GAM model, DataRobot bins each numeric feature based on its partial dependence from an XGBoost model. Is there anything else we do that I'm not aware of?
We use a decision tree to find bins.
It's pretty optimal.
It may not be perfectly optimal, but it does a good job finding the right bins.
A single decision tree is fit on a single feature, producing leaves that each contain at least a minimum number of target values. So bins are variable in size and are designed to have enough target statistics per leaf. The bin boundaries are just the tree's split points, sorted. An XGBoost model is used to smooth the target, and the decision tree operates on the XGBoost predictions.
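To make the description above concrete, here is a rough sketch of the tree step in plain Python. It is not DataRobot's implementation. The function name `tree_bin_edges`, the parameters `min_samples_leaf` and `max_depth`, and their defaults are all made up for illustration, and `y_smooth` is assumed to already hold the smoothed target (for example, predictions from an XGBoost model fit beforehand).

```python
from typing import List, Sequence


def _sse(total: float, total_sq: float, n: int) -> float:
    """Sum of squared errors around the mean, computed from running sums."""
    return total_sq - total * total / n


def tree_bin_edges(
    x: Sequence[float],
    y_smooth: Sequence[float],
    min_samples_leaf: int = 50,   # hypothetical default
    max_depth: int = 4,           # hypothetical default
    min_gain: float = 1e-9,       # guards against floating-point noise
) -> List[float]:
    """Sorted bin boundaries from a single-feature regression tree.

    y_smooth is assumed to be the smoothed target, not the raw target.
    """
    pairs = sorted(zip(x, y_smooth))
    edges: List[float] = []

    def split(lo: int, hi: int, depth: int) -> None:
        n = hi - lo
        # Stop if too deep or if two leaves of the minimum size cannot fit.
        if depth >= max_depth or n < 2 * min_samples_leaf:
            return
        total = sum(p[1] for p in pairs[lo:hi])
        total_sq = sum(p[1] ** 2 for p in pairs[lo:hi])
        parent = _sse(total, total_sq, n)
        best_gain, best_i = min_gain, -1
        left_sum = left_sq = 0.0
        for i in range(lo, hi - 1):
            left_sum += pairs[i][1]
            left_sq += pairs[i][1] ** 2
            n_left = i - lo + 1
            n_right = n - n_left
            if n_left < min_samples_leaf or n_right < min_samples_leaf:
                continue
            if pairs[i][0] == pairs[i + 1][0]:
                continue  # cannot split between identical feature values
            child = _sse(left_sum, left_sq, n_left) + _sse(
                total - left_sum, total_sq - left_sq, n_right
            )
            gain = parent - child
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i < 0:
            return
        # Boundary sits midway between the two neighbouring feature values.
        edges.append((pairs[best_i][0] + pairs[best_i + 1][0]) / 2)
        split(lo, best_i + 1, depth + 1)
        split(best_i + 1, hi, depth + 1)

    split(0, len(pairs), 0)
    return sorted(edges)


# Toy usage: a smoothed target that steps from 0 to 1 at x = 100.
xs = list(range(200))
ys = [0.0 if v < 100 else 1.0 for v in xs]
print(tree_bin_edges(xs, ys))  # [99.5]
```

In practice you would likely get the same effect with off-the-shelf pieces, e.g. fitting scikit-learn's `DecisionTreeRegressor(min_samples_leaf=...)` on the one feature against the smoother's predictions and reading the sorted split thresholds from the fitted tree. The hand-rolled version is only here to show why the bins come out variable in size: a split is accepted only when both sides keep at least `min_samples_leaf` rows.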