Ordered categoricals

Robot 1

Does DataRobot recognize ordered categoricals, like grade in the infamous Lending Club data?

This is a question from a customer:

Can you tell your models that A < B < C so that it’s more regularized?

I feel like the answer is that, to leverage the ordering, you would use it as a numeric feature. Quite likely a boosting model is at the top of the Leaderboard, so the feature is just used as an ordered one anyway. If you leave it as is, our models will figure it out.
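
As a minimal sketch of the "use it as a numeric feature" idea, done outside DataRobot: the sample values, the `GRADE_RANK` mapping, and the `grade_to_numeric` helper below are hypothetical names for illustration, not part of any DataRobot API.

```python
# Hypothetical sample of Lending Club style grades (illustrative values only).
grades = ["B", "A", "C", "A", "B"]

# Assumed domain ordering A < B < C, written out explicitly as integers.
GRADE_RANK = {"A": 1, "B": 2, "C": 3}


def grade_to_numeric(values, rank=GRADE_RANK):
    """Replace each grade with its rank so a model sees an ordered number."""
    return [rank[v] for v in values]


grade_numeric = grade_to_numeric(grades)
print(grade_numeric)  # [2, 1, 3, 1, 2]
```

The resulting column can then be uploaded as an ordinary numeric feature. Keep in mind that this treats the gap between A and B as equal to the gap between B and C.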

When using a generalized linear model (GLM), you would want to leverage this information because the model then needs fewer degrees of freedom; however, I'm asking here in case I missed something.

Robot 2

We actually do order these variables for XGBoost models. The default is frequency ordering, but you can also order lexically.

Robot 1

You mean ordinal encoding or directly in XGBoost?

Robot 2

Yeah, the ordinal encoding orders the data.

Just change frequency to lexical and try it out.
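
To make the difference between the two orderings concrete, here is a rough standalone sketch. It is not DataRobot's implementation: the `ordinal_encode` function, its `order` argument, and the choice to give the most frequent category the lowest code are all assumptions for illustration.

```python
from collections import Counter


def ordinal_encode(values, order="frequency"):
    """Map each category to an integer code under the chosen ordering.

    order="lexical":   codes follow the sorted category names.
    order="frequency": codes follow descending count, ties broken by name.
    Both conventions are assumptions made for this sketch.
    """
    counts = Counter(values)
    if order == "lexical":
        categories = sorted(counts)
    elif order == "frequency":
        categories = sorted(counts, key=lambda c: (-counts[c], c))
    else:
        raise ValueError(f"unknown order: {order!r}")
    code = {c: i for i, c in enumerate(categories)}
    return [code[v] for v in values]


grades = ["C", "B", "C", "A", "C", "B"]
by_frequency = ordinal_encode(grades, order="frequency")  # C=0, B=1, A=2
by_lexical = ordinal_encode(grades, order="lexical")      # A=0, B=1, C=2
print(by_frequency)
print(by_lexical)
```

For a feature like grade, only the lexical codes line up with A < B < C; the frequency codes follow how common each grade is, which may or may not match the natural order.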

Robot 3

Build your own blueprint and select the columns, explicitly setting the ordering for each to frequency or lexical.

Robot 2

If you’re using a GLM, you can also manually encode the variables in an ordered way (outside DataRobot):

Use 3 columns:

A: 0, 0, 1
B: 0, 1, 1
C: 1, 1, 1
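
The three-column scheme above can be sketched as a small preprocessing step run before upload. The `thermometer_encode` name and the `levels` argument are hypothetical; the column layout simply reproduces the table above.

```python
def thermometer_encode(value, levels=("A", "B", "C")):
    """Return the cumulative 0/1 row for one ordered category.

    With levels A < B < C this yields A -> [0, 0, 1], B -> [0, 1, 1],
    C -> [1, 1, 1]: each step up the ordering switches on one more column.
    """
    rank = levels.index(value)  # raises ValueError for an unknown level
    n = len(levels)
    return [0] * (n - 1 - rank) + [1] * (rank + 1)


grades = ["A", "B", "C"]
encoded = [thermometer_encode(g) for g in grades]
for grade, row in zip(grades, encoded):
    print(grade, row)
```

With this encoding, a GLM coefficient on each column captures the increment from one level to the next rather than a separate effect per level. The last column is 1 for every level, so it overlaps with an intercept and can likely be dropped if the model already fits one.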

Lexical works fine in a lot of cases; just do it for all the variables. You can use an mpick to choose different encodings for different columns.

Robot 1

Got it! Thanks a lot everyone!


Updated June 26, 2023