Ordered categoricals
Robot 1
Does DataRobot recognize ordered categoricals, like grade in the infamous Lending Club data?
This is a question from a customer:
Can you tell your models that A < B < C so that it’s more regularized?
I feel like the answer is that, to leverage ordering, you would use it as a numeric feature. Quite likely a boosting model is at the top, so it’s just used as an ordered feature anyway. If you just leave it as is, our models will figure it out.
When using a generalized linear model (GLM), you would want to leverage this information because you need fewer degrees of freedom in your model; however, I'm asking here to see if I missed some points.
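A minimal sketch of the "use it as a numeric feature" idea, in plain Python. The grade levels and the helper name `grade_to_numeric` are hypothetical and only illustrate the mapping; this is done to the data before upload, not through any DataRobot API.

```python
# Hypothetical grade levels, listed in their intended order (A < B < C).
GRADE_ORDER = ["A", "B", "C"]


def grade_to_numeric(grades, order=GRADE_ORDER):
    """Map each ordered category to its integer rank (A -> 0, B -> 1, C -> 2)."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[g] for g in grades]


numeric_grades = grade_to_numeric(["B", "A", "C", "A"])
print(numeric_grades)  # [1, 0, 2, 0]
```

With this encoding a GLM spends a single coefficient on the feature, at the cost of assuming the steps between adjacent grades are equal in size.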
Robot 2
We actually do order these variables for XGBoost models. The default is frequency ordering but you can also order lexically.
Robot 1
You mean ordinal encoding or directly in XGBoost?
Robot 2
Yeah, the ordinal encoding orders the data.
Just change frequency to lexical and try it out.
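To see why the choice matters, here is a rough sketch of the two orderings in plain Python. It is not DataRobot's implementation: the function names are hypothetical, and the frequency rule used here (most frequent level gets code 0, ties broken alphabetically) is an assumption made for illustration.

```python
from collections import Counter


def lexical_codes(values):
    """Assign integer codes by sorting the distinct levels alphabetically."""
    levels = sorted(set(values))
    return {level: i for i, level in enumerate(levels)}


def frequency_codes(values):
    """Assign integer codes by how often each level occurs.

    Assumption: the most frequent level gets code 0, ties broken alphabetically.
    """
    counts = Counter(values)
    levels = sorted(counts, key=lambda lvl: (-counts[lvl], lvl))
    return {level: i for i, level in enumerate(levels)}


grades = ["B", "C", "B", "A", "B", "C"]
print(lexical_codes(grades))    # {'A': 0, 'B': 1, 'C': 2}
print(frequency_codes(grades))  # {'B': 0, 'C': 1, 'A': 2}
```

For levels named A, B, C, the lexical ordering coincides with the intended A < B < C, while the frequency ordering only matches it by accident.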
Robot 3
Build your own blueprint and select the columns, with the ordering explicitly set to frequency or lexical.
Robot 2
If you’re using a GLM, you can also manually encode the variables in an ordered way (outside DataRobot):
Use 3 columns:
A: 0, 0, 1
B: 0, 1, 1
C: 1, 1, 1
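The three-column scheme above is sometimes called thermometer (cumulative) encoding. A small sketch in plain Python that reproduces those rows; the function name `thermometer_encode` is hypothetical, and the encoding is applied to the data before it is uploaded.

```python
def thermometer_encode(value, order):
    """Encode an ordered level as cumulative 0/1 indicators.

    The lowest level sets only the last column; each higher level
    switches on one more column to the left.
    """
    position = order.index(value)
    n = len(order)
    return [1 if col >= n - 1 - position else 0 for col in range(n)]


order = ["A", "B", "C"]
for level in order:
    print(level, thermometer_encode(level, order))
# A [0, 0, 1]
# B [0, 1, 1]
# C [1, 1, 1]
```

The last column is 1 for every level, so it acts like an intercept; if the GLM already fits one, that column can be dropped. Each remaining coefficient then measures the step from one level to the next.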
Lexical works fine in a lot of cases; just do it for all the variables. You can use an mpick to choose different encodings for different columns.
Robot 1
Got it! Thanks a lot everyone!