Does DataRobot recognize ordered categoricals, like grade in the infamous Lending Club data?
This is a question from a customer:
Can you tell your models that A < B < C so that it’s more regularized?
I feel like the answer is that to leverage ordering you would use it as a numeric feature. Quite likely a boosting model is at the top, so it’s just used as an ordered feature anyway. If you just leave it as is, our models will figure it out.
When using a generalized linear model (GLM), you would want to leverage this information because you need fewer degrees of freedom in your model; however, I'm asking here to see if I missed some points.
We actually do order these variables for XGBoost models. The default is frequency ordering but you can also order lexically.
You mean ordinal encoding or directly in XGBoost?
Yeah, the ordinal encoding orders the data.
Just change frequency to lexical and try it out.
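To make the difference between the two orderings concrete, here is a minimal plain-Python sketch. It is not DataRobot's implementation; the function name `ordinal_encode`, the `ordering` argument, and the "most frequent first" direction are assumptions made for illustration only.

```python
from collections import Counter


def ordinal_encode(values, ordering="lexical"):
    """Map each category to an integer code (hypothetical helper).

    ordering="lexical":   codes follow the sorted category names.
    ordering="frequency": codes follow how often each category occurs
                          (most frequent first; this direction is an assumption).
    """
    counts = Counter(values)
    if ordering == "lexical":
        categories = sorted(counts)
    elif ordering == "frequency":
        # Most frequent first; ties are broken lexically to stay deterministic.
        categories = sorted(counts, key=lambda c: (-counts[c], c))
    else:
        raise ValueError(f"unknown ordering: {ordering!r}")
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


# Toy grade column: B is the most common grade, A the rarest.
grades = ["B", "C", "B", "A", "C", "B"]

lexical_codes, lexical_map = ordinal_encode(grades, "lexical")
freq_codes, freq_map = ordinal_encode(grades, "frequency")

print(lexical_map)  # codes preserve A < B < C
print(freq_map)     # codes follow popularity, not grade order
```

With lexical ordering the integer codes preserve A < B < C, so a tree-based model can split on "grade at most B". With frequency ordering the codes only reflect how common each grade is.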
Build your own blueprint and select the cols —explicitly set to
If you’re using a GLM, you can also manually encode the variables in an ordered way (outside DR):
Use 3 columns:
A: 0, 0, 1
B: 0, 1, 1
C: 1, 1, 1
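The three-column layout above is sometimes called a cumulative or "thermometer" encoding. A minimal sketch of how you might build it outside DR, in plain Python, follows. The helper name `thermometer_encode` and the explicit `levels` list are assumptions for illustration, not part of any DataRobot API.

```python
def thermometer_encode(value, levels):
    """Encode an ordered category as cumulative 0/1 columns (hypothetical helper).

    `levels` lists the categories from lowest to highest. The lowest level
    turns on only the last column; each higher level turns on one more
    column to its left, matching the A/B/C layout above.
    """
    rank = levels.index(value)  # 0 for the lowest level
    n = len(levels)
    return [0] * (n - rank - 1) + [1] * (rank + 1)


levels = ["A", "B", "C"]  # assumed order: A < B < C
encoded = {grade: thermometer_encode(grade, levels) for grade in levels}

for grade, row in encoded.items():
    print(grade, row)
```

In this layout the last column is 1 for every row, so it behaves like an intercept; if the GLM already fits its own intercept, that column can likely be dropped. Each remaining coefficient then represents the step from one grade to the next.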
Lexical works fine in a lot of cases; just do it for all the variables. You can use an mpick to choose different encodings for different columns.
Got it! Thanks a lot everyone!