Modeling algorithms¶

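DataRobot supports a comprehensive library of pre- and post-processing (modeling) steps, which combine to form a model blueprint. Which steps run, and which are available in the model Repository, depends on the dataset. This broad combination of pre- and post-processing steps lets DataRobot confidently build a Leaderboard of your best modeling options. Examples of this flexibility include logistic regression with and without PCA as a preprocessor, and random forest with and without a greedy search for interaction terms.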

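In other words, DataRobot runs each model in the lists below 2 to 5 times, each time with different preprocessing and feature selection. The related algorithms are listed in the following sections: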

Preprocessing
Linear or additive models
Tree-based models
Deep learning and foundational models
Time series-specific models
Unsupervised models
Other model types
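
To make the idea of several runs per model concrete, the following minimal sketch enumerates model and preprocessing combinations. The dictionary contents and the `enumerate_variants` helper are hypothetical names chosen for illustration; they are not DataRobot identifiers, and this is not how DataRobot selects blueprints internally.

```python
# Hypothetical step names for illustration only; not DataRobot's internal identifiers.
PREPROCESSING_OPTIONS = {
    "Logistic Regression": ["none", "PCA"],
    "Random Forest": ["none", "greedy interaction search"],
}

def enumerate_variants(options):
    """Return one (model, preprocessing) pair per candidate blueprint variant."""
    variants = []
    for model, steps in options.items():
        for step in steps:
            variants.append((model, step))
    return variants

variants = enumerate_variants(PREPROCESSING_OPTIONS)
for model, step in variants:
    print(f"{model} with preprocessing: {step}")
```

Each pair would correspond to one candidate to train and rank; a real platform also varies feature selection and hyperparameters, which this sketch leaves out.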

Preprocessing tasks¶

Categorical¶

Bühlmann credibility estimates for high cardinality features
Categorical embedding
Category count
One-hot encoding
Ordinal encoding of categorical variables
Univariate credibility estimates with L2
Efficient, sparse one-hot encoding for extremely high cardinality categorical variables
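
As a rough illustration of three of the encodings listed above (one-hot, ordinal, and category count), here is a minimal sketch in plain Python. The function names are assumptions made for this example, and the dense lists shown here are for readability; a sparse representation would be needed for very high cardinality.

```python
from collections import Counter

def one_hot(values):
    """Map each value to a 0/1 indicator vector over the sorted categories."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def ordinal(values):
    """Map each value to the integer position of its category in sorted order."""
    index = {c: i for i, c in enumerate(sorted(set(values)))}
    return [index[v] for v in values]

def category_count(values):
    """Replace each value with the number of rows that share it."""
    counts = Counter(values)
    return [counts[v] for v in values]

colors = ["red", "blue", "red", "green"]
categories, encoded = one_hot(colors)
ordinal_codes = ordinal(colors)
count_codes = category_count(colors)
```

One-hot encoding adds a column per category, while ordinal and count encodings keep a single column, which is one reason tree-based models are often paired with the latter two.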

Numeric¶

Binning of numerical variables
Constant splines
Missing values imputed
Numeric data cleansing
Partial Principal Components Analysis
Truncated Singular Value Decomposition
Normalizer
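
Two of the numeric steps above, missing value imputation and binning, can be sketched as follows. This is a minimal example under stated assumptions: median imputation with a missing-value flag and equal-width bins are common choices, but the source does not say which strategies DataRobot applies, and the function names are hypothetical.

```python
from statistics import median

def impute_median(values):
    """Fill None with the median of the observed values and flag imputed rows."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [20, None, 40, 60, None, 100]
filled, flags = impute_median(ages)
bins = equal_width_bins(filled, 4)
```

Keeping the flag column alongside the imputed values lets a model learn from the fact that a value was missing, which matters when data is not missing at random.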

Geospatial¶

Geospatial Location Converter
Spatial Neighborhood Featurizer

Images¶

Greyscale Downscaled Image Featurizer
No Post Processing
OpenCV detect largest rectangle
OpenCV image featurizer
Pre-trained multi-level global average pooling image featurizer

Text models¶

Character / word n-grams
Pretrained byte-pair encoders (best of both worlds for char-grams and n-grams)
Stopword removal
TF-IDF scaling (optional sublinear scaling and binormal separation scaling)
Hashing vectorizers for big data
Cosine similarity between pairs of text columns (on datasets with 2+ text columns)
Support for all languages, including English, Japanese, Chinese, Korean, French, Spanish, Portuguese, Arabic, Ukrainian, Klingon, Elvish, Esperanto, etc.
Unsupervised Fasttext models
Linear n-gram models (character/word n-grams + TF-IDF + penalized linear/logistic regression)
SVD n-gram models (n-grams + TF-IDF + SVD)
Naive Bayes weighted SVM
TinyBERT / RoBERTa / MiniLM embedding models
Text CNNs
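
Several items in the list above (word n-grams, TF-IDF scaling with optional sublinear scaling, and cosine similarity between text columns) can be illustrated with a small plain-Python sketch. The smoothed IDF formula used here is one common variant and is an assumption; the source does not specify the exact weighting. Function names are hypothetical.

```python
import math
from collections import Counter

def word_ngrams(text, n):
    """Split on whitespace and return all contiguous word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n=1, sublinear=False):
    """Return one sparse {ngram: weight} dict per document."""
    grams = [Counter(word_ngrams(d, n)) for d in docs]
    doc_freq = Counter(g for counts in grams for g in counts)
    n_docs = len(docs)
    vectors = []
    for counts in grams:
        vec = {}
        for g, tf in counts.items():
            if sublinear:
                tf = 1 + math.log(tf)  # dampen repeated terms
            idf = math.log((1 + n_docs) / (1 + doc_freq[g])) + 1  # smoothed IDF (assumed variant)
            vec[g] = tf * idf
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

docs = ["the cat sat", "the cat ran", "dogs bark loudly"]
vecs = tfidf(docs)
sim_close = cosine(vecs[0], vecs[1])  # documents share "the" and "cat"
sim_far = cosine(vecs[0], vecs[2])    # documents share no terms
```

The resulting sparse vectors are the kind of input a penalized linear or logistic regression would consume in a linear n-gram model.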

Generalized linear models¶

NA imputation (methods for missing at random and missing not at random), standardization, ridit transform
Search for best transformations
Efficient, sparse one-hot encoding for extremely high cardinality categorical variables

Linear or additive models¶

Generalized linear models¶

Penalty: L1 (Lasso), L2 (Ridge), ElasticNet, None (Logistic Regression)
Distributions: Binomial, Gaussian, Poisson, Tweedie, Gamma, Huber
Special Cases: 2-stage model (Binomial + Gaussian) for zero-inflated regression
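
The penalty options above differ only in the term added to the model's loss. The following minimal sketch uses one common elastic net parameterization, which is an assumption here; `lam` and `l1_ratio` are hypothetical parameter names, not DataRobot settings.

```python
def elastic_net_penalty(coefs, lam, l1_ratio):
    """Penalty added to the loss: lam * (l1_ratio * L1 + 0.5 * (1 - l1_ratio) * L2^2).

    l1_ratio = 1 gives Lasso, l1_ratio = 0 gives Ridge, lam = 0 gives no penalty.
    """
    l1 = sum(abs(c) for c in coefs)
    l2_squared = sum(c * c for c in coefs)
    return lam * (l1_ratio * l1 + 0.5 * (1 - l1_ratio) * l2_squared)

coefs = [1.0, -2.0]
lasso = elastic_net_penalty(coefs, lam=0.5, l1_ratio=1.0)
ridge = elastic_net_penalty(coefs, lam=0.5, l1_ratio=0.0)
mixed = elastic_net_penalty(coefs, lam=0.5, l1_ratio=0.5)
unpenalized = elastic_net_penalty(coefs, lam=0.0, l1_ratio=0.5)
```

The L1 term tends to push some coefficients exactly to zero, while the L2 term shrinks all coefficients smoothly; ElasticNet blends the two behaviors.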

Support Vector Machines¶

Penalty: L1 (Lasso), L2 (Ridge), ElasticNet, None
Kernel: Linear, Nyström RBF, RBF
liblinear and libsvm

Generalized additive models¶

GAM
GA2M

Tree-based models¶

Decision Tree (or CART)
Random Forest
ExtraTrees (or Extremely Randomized Forests)
Gradient Boosted Trees (or GBM: Binomial, Gaussian, Poisson, Tweedie, Gamma, Huber)
Extreme Gradient Boosted Trees (or XGBoost: Binomial, Gaussian, Poisson)
LightGBM
AdaBoost
RuleFit

Deep learning and foundational models¶

Keras MLPs with residual connections, adaptive learning rates and adaptive batch sizes
Keras self-normalizing MLPs with residual connections
Keras neural architecture search MLPs using hyperband
DeepCTR
- Neural Factorization Machines
- AutoInt
- Cross Networks
Pretrained CNNs for images using foundational models (especially EfficientNet)
- Manually pruned and optimized for faster inference
Pretrained + fine-tuned CNNs for images
Image augmentation
Pretrained TinyBERT models for text
Keras Text CNNs
Fasttext models for text

Time series-specific models¶

LSTMs
DeepAR models
AutoArima
ETS, aka exponential smoothing
TBATS
Prophet
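
As a small illustration of the exponential smoothing idea behind ETS, the sketch below implements simple exponential smoothing, the most basic member of that family (no trend or seasonal component). It is not the full ETS model, and `alpha` and the function name are assumptions for this example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each new level is a weighted average of the latest observation
    and the previous level; the last level is the one-step-ahead forecast.
    """
    level = series[0]  # initialize the level with the first observation
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

demand = [10, 12, 14]
levels = simple_exponential_smoothing(demand, alpha=0.5)
forecast = levels[-1]
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller value averages over a longer history.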

Unsupervised models¶

Anomaly detection models¶

Isolation Forest
Local Outlier Factor
One Class SVM
Double Median Absolute Deviation
Mahalanobis Distance
Anomaly Detection Blenders
Keras Deep Autoencoder
Keras Deep Variational Autoencoder
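
To show how one of the simpler detectors above works, here is a minimal sketch of Double Median Absolute Deviation scoring: the MAD is computed separately on each side of the median so that skewed data is handled asymmetrically. The threshold of 3.0 and the function name are assumptions for illustration, not DataRobot defaults.

```python
from statistics import median

def double_mad_scores(values):
    """Score each value by its distance from the median, in units of the
    MAD computed on its own side of the median."""
    m = median(values)
    left_mad = median([abs(v - m) for v in values if v <= m])
    right_mad = median([abs(v - m) for v in values if v >= m])
    scores = []
    for v in values:
        distance = abs(v - m)
        mad = left_mad if v <= m else right_mad
        if distance == 0:
            scores.append(0.0)
        elif mad == 0:
            scores.append(float("inf"))  # any deviation is extreme when the side has no spread
        else:
            scores.append(distance / mad)
    return scores

readings = [1, 2, 3, 4, 5, 7, 9, 40]
scores = double_mad_scores(readings)
THRESHOLD = 3.0  # assumed cutoff for this example
outliers = [v for v, s in zip(readings, scores) if s > THRESHOLD]
```

Because medians are used throughout, the single extreme reading does not inflate the spread estimate the way it would with a standard deviation.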

Clustering models¶

K-Means
HDBSCAN
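
For readers unfamiliar with K-Means, the sketch below shows the basic loop (assign each point to its nearest center, then recompute centers as cluster means). It uses a naive initialization from the first k points and a fixed iteration count, both assumptions made to keep the example short and deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=20):
    """Basic K-Means with naive initialization from the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers

points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
labels, centers = kmeans(points, k=2)
```

Production implementations typically use smarter initialization and a convergence check instead of a fixed number of iterations.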

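Other model types¶

Eureqa (a proprietary genetic algorithm for symbolic regression)
K-Nearest Neighbors (three distance metrics)
Partial least squares (used for ensembling)
Isotonic Regression (used for calibrating predictions from other models)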
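
To illustrate how isotonic regression can calibrate another model's predictions, the sketch below fits a non-decreasing sequence with the pool adjacent violators approach. It assumes the outcomes are already sorted by the other model's raw score; the function name and the data are hypothetical.

```python
def isotonic_fit(outcomes):
    """Fit a non-decreasing sequence to outcomes by pooling adjacent violators.

    outcomes must already be ordered by the raw score being calibrated.
    """
    blocks = []  # each block is [sum of outcomes, count]
    for y in outcomes:
        blocks.append([y, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Observed 0/1 outcomes, ordered from lowest to highest raw model score.
outcomes = [0, 1, 0, 1, 1]
calibrated = isotonic_fit(outcomes)
```

The fitted values can be read as calibrated probabilities: rows whose raw scores rank in the middle of this toy data are assigned 0.5 because half of them were positive.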

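Click a blueprint node to view additional information, including access to the model documentation. With Composable ML, you can build a blueprint that best suits your needs using built-in tasks and custom Python/R code.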

Updated March 19, 2025
