Modeling algorithms¶
DataRobot supports a comprehensive library of pre- and post-processing (modeling) steps, which combine to make up a model blueprint. Which blueprints are run, or made available in the model repository, depends on the dataset. This breadth of pre- and post-processing combinations allows DataRobot to confidently build a Leaderboard of your best modeling options. Examples of this flexibility include logistic regression with and without PCA as a pre-processor, or random forests with and without a greedy search for interaction terms.
As a result, DataRobot typically runs each model in the list below two to five times, each time with different pre-processing and/or variable selection. The following sections list the relevant algorithms:
- Pre-processing
- Linear or additive models
- Tree-based models
- Deep learning and foundational models
- Time series-specific models
- Unsupervised models
- Other model types
Pre-processing tasks¶
Categorical¶
- Bühlmann credibility estimates for high cardinality features
- Categorical embedding
- Category count
- One-hot encoding (one-hot and ordinal encoding are sketched after this list)
- Ordinal encoding of categorical variables
- Univariate credibility estimates with L2
- Efficient, sparse one-hot encoding for extremely high cardinality categorical variables
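For orientation, the sketch below shows what the one-hot and ordinal encoding tasks do, using scikit-learn as an open-source stand-in; it is not DataRobot's implementation, and the toy column is illustrative.

```python
# A scikit-learn analogue of the one-hot and ordinal encoding tasks; the toy
# column and settings are illustrative, not DataRobot's implementation.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

onehot = OneHotEncoder(handle_unknown="ignore")   # one indicator per level
print(onehot.fit_transform(df[["color"]]).toarray())

ordinal = OrdinalEncoder()                        # one integer code per level
print(ordinal.fit_transform(df[["color"]]).ravel())
```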
Numerical¶
- Binning of numerical variables
- Constant splines
- Missing values imputed (imputation and binning are sketched after this list)
- Numeric data cleansing
- Partial Principal Components Analysis
- Truncated Singular Value Decomposition
- Normalizer
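A minimal scikit-learn analogue of two of the numeric tasks above, missing-value imputation followed by binning; the strategy and bin count are illustrative assumptions.

```python
# A scikit-learn analogue of two numeric tasks: impute missing values, then
# bin the result into quantile buckets. Strategy and bin count are
# illustrative assumptions.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[1.0], [2.0], [np.nan], [10.0], [12.0]])
pipe = make_pipeline(
    SimpleImputer(strategy="median"),
    KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile"),
)
print(pipe.fit_transform(X).ravel())  # bin index per row
```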
Geospatial¶
- Geospatial Location Converter
- Spatial Neighborhood Featurizer
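A hypothetical sketch of the neighborhood-featurizer idea: for each location, summarize a value over its nearest neighbors by great-circle (haversine) distance. The neighbor count and the mean statistic are assumptions for illustration; DataRobot's featurizer is more elaborate.

```python
# A hypothetical neighborhood feature: for each point, average a value over
# its two nearest neighbors by great-circle (haversine) distance. The k and
# the mean statistic are assumptions for illustration.
import numpy as np
from sklearn.neighbors import BallTree

# [lat, lon] in radians, plus a toy value observed at each location
coords = np.radians([[40.71, -74.01], [40.73, -73.99],
                     [34.05, -118.24], [34.10, -118.30]])
values = np.array([1.0, 3.0, 10.0, 14.0])

tree = BallTree(coords, metric="haversine")
dist, idx = tree.query(coords, k=3)              # self + 2 nearest neighbors
print(values[idx[:, 1:]].mean(axis=1))           # drop self, average the rest
```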
Images¶
- Greyscale Downscaled Image Featurizer (sketched after this list)
- No Post Processing
- OpenCV detect largest rectangle
- OpenCV image featurizer
- Pre-trained multi-level global average pooling image featurizer
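As a rough illustration of the grayscale downscaled featurizer, the Pillow-based sketch below converts an image to grayscale, shrinks it to a fixed grid, and flattens it to a numeric vector; the file name and target size are placeholders, not part of the product.

```python
# A rough Pillow-based featurizer: grayscale, downscale to a fixed grid,
# flatten to a vector. "photo.jpg" and the 16x16 size are placeholders.
import numpy as np
from PIL import Image

def grayscale_features(path, size=(16, 16)):
    img = Image.open(path).convert("L")   # "L" = 8-bit grayscale
    img = img.resize(size)                # downscale to a fixed grid
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

# features = grayscale_features("photo.jpg")  # -> 256-dimensional vector
```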
Text models¶
- Character / word n-grams
- Pretrained byte-pair encoders (the best of both worlds for character and word n-grams)
- Stopword removal
- TF-IDF scaling (optional sublinear scaling and binormal separation scaling)
- Hashing vectorizers for big data
- Cosine similarity between pairs of text columns (on datasets with 2+ text columns)
- Support for all languages, including English, Japanese, Chinese, Korean, French, Spanish, Portuguese, Arabic, Ukrainian, and others
- Unsupervised fastText models
- Linear n-gram models (character/word n-grams + TF-IDF + penalized linear/logistic regression; sketched after this list)
- SVD n-gram models (n-grams + TF-IDF + SVD)
- Naive Bayes weighted SVM
- TinyBERT / RoBERTa / MiniLM embedding models
- Text CNNs
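Putting a few of these pieces together, the "linear n-gram model" pattern above looks roughly like this in scikit-learn; the toy corpus, analyzer, and hyperparameters are illustrative assumptions, not DataRobot's settings.

```python
# The "linear n-gram model" pattern in scikit-learn: character n-grams,
# TF-IDF with sublinear scaling, and an L2-penalized logistic regression.
# The toy corpus and hyperparameters are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product, works well", "terrible, broke after a day"]
labels = [1, 0]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), sublinear_tf=True),
    LogisticRegression(penalty="l2", C=1.0),
)
model.fit(texts, labels)
print(model.predict(["works great"]))
```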
Generalized Linear Models¶
- NA imputation (methods for both missing-at-random and missing-not-at-random values), standardization, and the ridit transform (imputation and standardization are sketched below)
- Search for best transformations
- Efficient, sparse one-hot encoding for extremely high cardinality categorical variables
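A minimal scikit-learn sketch of the imputation-plus-standardization path above, assuming median imputation with a missing-value indicator as a simple stand-in for the missing-not-at-random handling; the ridit transform is not shown.

```python
# Median imputation with a missing-value indicator (a simple stand-in for
# missing-not-at-random handling), then standardization; the ridit transform
# is omitted. Choices here are illustrative assumptions.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, np.nan]])
prep = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),
    StandardScaler(),
)
print(prep.fit_transform(X))  # two imputed columns + two indicator columns
```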
Linear or additive models¶
Generalized Linear Models¶
- Penalty: L1 (Lasso), L2 (Ridge), ElasticNet, None (Logistic Regression)
- Distributions: Binomial, Gaussian, Poisson, Tweedie, Gamma, Huber
- Special Cases: 2-stage model (Binomial + Gaussian) for zero-inflated regression
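The 2-stage idea can be sketched with two scikit-learn estimators: a classifier for P(y > 0), a regressor for the magnitude when nonzero, and a final prediction that multiplies the two. The data and model choices below are illustrative assumptions, not the blueprint's internals.

```python
# A sketch of the 2-stage idea: stage 1 models P(y > 0) with a classifier,
# stage 2 models the magnitude when nonzero, and predictions multiply.
# Data and model choices here are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(rng.random(200) < 0.6, 0.0, np.exp(X[:, 0]) + 1.0)  # ~60% zeros

clf = LogisticRegression().fit(X, y > 0)       # stage 1: is y nonzero?
reg = Ridge().fit(X[y > 0], y[y > 0])          # stage 2: size when nonzero
pred = clf.predict_proba(X)[:, 1] * reg.predict(X)
print(pred[:5])
```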
Support Vector Machines¶
- Penalty: L1 (Lasso), L2 (Ridge), ElasticNet, None
- Kernel: Linear, Nyström RBF (sketched after this list), RBF
- liblinear and libsvm
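The Nyström option above approximates the RBF kernel's feature map so that a fast linear SVM can be trained on top; here is a scikit-learn sketch, where the component count is an illustrative speed/accuracy knob rather than a recommended setting.

```python
# Approximate the RBF kernel's feature map with Nystroem, then train a fast
# linear SVM on the transformed features; n_components is an illustrative
# assumption (a speed/accuracy knob), not a recommended setting.
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = make_pipeline(
    Nystroem(kernel="rbf", n_components=100, random_state=0),
    LinearSVC(),
)
print(model.fit(X, y).score(X, y))
```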
Generalized Additive Models¶
- GAM
- GA2M
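The open-source pyGAM package offers a rough analogue of these models (an assumption for illustration; it is not DataRobot's GAM task): smooth univariate terms form a GAM, and adding a pairwise tensor term mimics the two-way interactions a GA2M captures.

```python
# A rough analogue using the open-source pyGAM package (an assumption for
# illustration; this is not DataRobot's GAM task). s() terms give smooth
# univariate effects (GAM); the te() tensor term adds one pairwise
# interaction of the kind a GA2M models.
import numpy as np
from pygam import LinearGAM, s, te

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))
y = np.sin(3 * X[:, 0]) + X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=300)

gam = LinearGAM(s(0) + s(1) + te(0, 1)).fit(X, y)
gam.summary()  # per-term significance and smoothness details
```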
Tree-based models¶
- Decision Tree (or CART)
- Random Forest
- ExtraTrees (or Extremely Randomized Forests)
- Gradient Boosted Trees (or GBM: Binomial, Gaussian, Poisson, Tweedie, Gamma, Huber; sketched after this list)
- Extreme Gradient Boosted Trees (or XGBoost: Binomial, Gaussian, Poisson)
- LightGBM
- AdaBoost
- RuleFit
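Many of the boosting variants above expose distribution-specific losses. As one hedged example, scikit-learn's histogram-based gradient boosting supports a Poisson loss for count targets; the synthetic data and iteration count below are illustrative assumptions.

```python
# Histogram-based gradient boosting with a Poisson loss for count targets;
# the synthetic data and max_iter are illustrative assumptions.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.poisson(lam=np.exp(X[:, 0]))   # synthetic non-negative counts

gbm = HistGradientBoostingRegressor(loss="poisson", max_iter=200)
print(gbm.fit(X, y).score(X, y))
```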
Deep learning and foundational models¶
- Keras MLPs with residual connections, adaptive learning rates, and adaptive batch sizes (a residual MLP is sketched after this list)
- Keras self-normalizing MLPs with residual connections
- Keras neural architecture search MLPs using hyperband
- DeepCTR
- Neural Factorization Machines
- AutoInt
- Cross Networks
- Pretrained CNNs for images using foundational models (especially EfficientNet)
- Manually pruned and optimized for faster inference
- Pretrained + fine-tuned CNNs for images
- Image augmentation
- Pretrained TinyBERT models for text
- Keras Text CNNs
- fastText models for text
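As one concrete example of the Keras patterns above, here is a minimal functional-API MLP with a single residual (skip) connection; the layer widths, sigmoid head, and optimizer are illustrative assumptions, not DataRobot's exact architecture.

```python
# A minimal Keras functional-API sketch of an MLP with one residual (skip)
# connection; layer widths, the sigmoid head, and the optimizer are
# illustrative assumptions, not DataRobot's exact architecture.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(20,))
x = layers.Dense(64, activation="relu")(inputs)

# Residual block: two dense layers whose output is added back to the input.
block = layers.Dense(64, activation="relu")(x)
block = layers.Dense(64)(block)
x = layers.Activation("relu")(layers.Add()([x, block]))

outputs = layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```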
Time series-specific models¶
- LSTMs
- DeepAR models
- AutoArima
- ETS (exponential smoothing; sketched after this list)
- TBATS
- Prophet
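For a feel of the classical models, here is ETS (exponential smoothing) fit with statsmodels on synthetic trended data; the additive-trend choice and the data are assumptions for the example.

```python
# ETS (exponential smoothing) via statsmodels on synthetic trended data;
# the additive-trend choice is an assumption for the example.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
y = 10 + 0.5 * np.arange(48) + rng.normal(scale=1.0, size=48)

fit = ExponentialSmoothing(y, trend="add").fit()
print(fit.forecast(6))  # six steps ahead
```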
Unsupervised models¶
Anomaly detection models¶
- Isolation Forest (sketched after this list)
- Local Outlier Factor
- One Class SVM
- Double Median Absolute Deviation
- Mahalanobis Distance
- Anomaly Detection Blenders
- Keras Deep Autoencoder
- Keras Deep Variational Autoencoder
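A scikit-learn sketch of two of the detectors above, Isolation Forest and Local Outlier Factor, on toy data with one injected outlier; settings are illustrative, and real thresholds are problem-specific.

```python
# Two detectors on toy data with one injected outlier; both label it -1.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])  # outlier last

iso = IsolationForest(random_state=0).fit(X)
print(iso.predict(X)[-1])                 # -1 = anomaly

lof = LocalOutlierFactor(n_neighbors=20)
print(lof.fit_predict(X)[-1])             # -1 = anomaly
```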
Clustering models¶
- K-means
- HDBSCAN
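A short sketch of K-means with scikit-learn on two synthetic blobs; the cluster count and data are illustrative. HDBSCAN is available in the open-source hdbscan package (and as sklearn.cluster.HDBSCAN in recent scikit-learn releases) with a similar fit/labels_ interface.

```python
# K-means on two synthetic blobs; cluster count and data are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(np.bincount(km.labels_))  # two balanced clusters
```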
Other model types¶
- Eureqa (proprietary genetic algorithm for symbolic regression)
- K-Nearest Neighbors (three distances)
- Partial least squares (used for blenders)
- Isotonic Regression (used for calibrating predictions from other models)
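Isotonic regression in the calibration role noted above fits a monotone map from raw model scores to observed outcomes; a minimal scikit-learn sketch follows, with made-up scores and outcomes.

```python
# Isotonic regression as a calibrator: learn a monotone map from raw model
# scores to observed outcomes. Scores and outcomes below are made up.
import numpy as np
from sklearn.isotonic import IsotonicRegression

raw_scores = np.array([0.1, 0.3, 0.35, 0.6, 0.8, 0.9])
outcomes = np.array([0, 0, 1, 0, 1, 1])

iso = IsotonicRegression(out_of_bounds="clip")
print(iso.fit_transform(raw_scores, outcomes))  # calibrated, nondecreasing
```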
Click a blueprint node to access full model documentation. Using Composable ML, you can build models that best suit your needs using built-in tasks and custom Python/R code.