
Blueprint Workshop basic steps notebook

In [1]:

import datarobot as dr

In [2]:

from datarobot_bp_workshop import Workshop, Visualize

In [3]:

# Read the API token from a local file; strip() drops any trailing newline
with open('../api.token', 'r') as f:
    token = f.read().strip()

dr.Client(token=token, endpoint='https://app.datarobot.com/api/v2')

Initialize the Workshop

In [4]:

w = Workshop()

Build a blueprint

In [5]:

w.Task('PNI2')

Out[5]:

Missing Values Imputed (quick median) (PNI2)

Input Summary: (None)
Output Method: TaskOutputMethod.TRANSFORM

In [6]:

w.Tasks.PNI2()

Out[6]:

Missing Values Imputed (quick median) (PNI2)

Input Summary: (None)
Output Method: TaskOutputMethod.TRANSFORM

In [7]:

# Numeric input -> quick median imputation
pni = w.Tasks.PNI2(w.TaskInputs.NUM)
# Two branches that both consume the imputed data
rdt = w.Tasks.RDT5(pni)
binning = w.Tasks.BINNING(pni)
# The Keras classifier takes both branches as input
keras = w.Tasks.KERASC(rdt, binning)
keras.set_task_parameters_by_name(learning_rate=0.123)
# Save the graph to the personal blueprint repository
keras_blueprint = w.BlueprintGraph(keras, name='A blueprint I made with the Python API').save()

In [8]:

user_blueprint_id = keras_blueprint.user_blueprint_id

Visualize the blueprint

In [9]:

keras_blueprint.show()

Inspect tasks

In [10]:

pni

Out[10]:

Missing Values Imputed (quick median) (PNI2)

Input Summary: Numeric Data
Output Method: TaskOutputMethod.TRANSFORM

In [11]:

rdt

Out[11]:

Smooth Ridit Transform (RDT5)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM

In [12]:

binning

Out[12]:

Binning of numerical variables (BINNING)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM

In [13]:

keras

Out[13]:

Keras Neural Network Classifier (KERASC)

Input Summary: Smooth Ridit Transform (RDT5) | Binning of numerical variables (BINNING)
Output Method: TaskOutputMethod.PREDICT

Task Parameters:
  learning_rate (learning_rate) = 0.123

In [14]:

keras.task_parameters.learning_rate

Out[14]:

0.123

In [15]:

keras.task_parameters.batch_size = 32

In [16]:

keras

Out[16]:

Keras Neural Network Classifier (KERASC)

Input Summary: Smooth Ridit Transform (RDT5) | Binning of numerical variables (BINNING)
Output Method: TaskOutputMethod.PREDICT

Task Parameters:
  batch_size (batch_size) = 32
  learning_rate (learning_rate) = 0.123

In [17]:

keras_blueprint

Out[17]:

Name: 'A blueprint I made with the Python API'

Input Data: Numeric
Tasks: Missing Values Imputed (quick median) | Smooth Ridit Transform | Binning of numerical variables | Keras Neural Network Classifier
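The Tasks line above lists every task after the tasks that feed it, which is a topological order of the blueprint graph. The following is a minimal, standalone sketch of that idea, assuming a blueprint can be modeled as a directed acyclic graph of task nodes. TaskNode and topological_order are hypothetical names for illustration only; they are not part of datarobot_bp_workshop.

```python
from dataclasses import dataclass, field


@dataclass
class TaskNode:
    """Hypothetical stand-in for a blueprint task; not the bp-workshop class."""
    code: str
    label: str
    inputs: list = field(default_factory=list)  # TaskNode objects or raw input names


def topological_order(final):
    """Depth-first post-order: each task is emitted after all of its inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:  # a shared input (like the imputer) is emitted once
            return
        seen.add(id(node))
        for parent in node.inputs:
            if isinstance(parent, TaskNode):  # skip raw data inputs such as "NUM"
                visit(parent)
        order.append(node)

    visit(final)
    return order


# Same shape as the notebook's graph, under different names to avoid clobbering it
imputer = TaskNode("PNI2", "Missing Values Imputed (quick median)", ["NUM"])
ridit = TaskNode("RDT5", "Smooth Ridit Transform", [imputer])
binner = TaskNode("BINNING", "Binning of numerical variables", [imputer])
classifier = TaskNode("KERASC", "Keras Neural Network Classifier", [ridit, binner])

ordered = topological_order(classifier)
print("Tasks: " + " | ".join(t.label for t in ordered))
```

The imputer is consumed by two branches but appears only once in the result, which is why the sketch tracks visited nodes.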

Validation

Test validation by intentionally supplying the wrong input data type (categorical data into a numeric imputer).

In [18]:

# Intentionally wrong: PNI2 imputes numeric data, but here it receives categorical input
pni = w.Tasks.PNI2(w.TaskInputs.CAT)
rdt = w.Tasks.RDT5(pni)
binning = w.Tasks.BINNING(pni)
keras = w.Tasks.KERASC(rdt, binning)
keras.set_task_parameters_by_name(learning_rate=0.123)
invalid_keras_blueprint = w.BlueprintGraph(keras)

In [19]:

invalid_keras_blueprint.save('A blueprint with warnings (PythonAPI)', user_blueprint_id=user_blueprint_id).show()
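Here the blueprint with a categorical input is still saved, under the name "A blueprint with warnings (PythonAPI)", whereas the out-of-range parameter at In [21] blocks the save. A minimal sketch of an input-type check that only produces warnings, assuming a per-task table of accepted input types. ACCEPTED_INPUTS and input_type_warnings are hypothetical, and the table entry is illustrative rather than DataRobot's actual rule set.

```python
# Hypothetical table: task code -> raw input types the task expects (illustrative only)
ACCEPTED_INPUTS = {
    "PNI2": {"NUM"},  # median imputation applies to numeric data
}


def input_type_warnings(task_code, input_types):
    """Return warning strings for inputs the task does not expect; never raises."""
    accepted = ACCEPTED_INPUTS.get(task_code)
    if accepted is None:  # unknown task: nothing to check in this sketch
        return []
    return [f"{task_code} does not expect {t} input" for t in input_types if t not in accepted]


print(input_type_warnings("PNI2", ["CAT"]))  # ['PNI2 does not expect CAT input']
print(input_type_warnings("PNI2", ["NUM"]))  # []
```

Returning a list instead of raising lets a caller save the graph and still report the problems, which matches the behavior this notebook shows for the mismatched input.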

In [20]:

binning.set_task_parameters_by_name(max_bins=-22)

Out[20]:

Binning of numerical variables (BINNING)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM

Task Parameters:
  max_bins (b) = -22

In [21]:

invalid_keras_blueprint.save('A blueprint with warnings (PythonAPI)', user_blueprint_id=user_blueprint_id).show()

Binning of numerical variables (BINNING)

  Invalid value(s) supplied
    max_bins (b) = -22
      - Must be a 'intgrid' parameter defined by: [2, 500]

Failed to save: parameter validation failed.
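The message above states that max_bins is an 'intgrid' parameter restricted to [2, 500], and the help output for PNI2 later in this notebook lists a 'select' parameter with a fixed set of values. A minimal sketch of checks of these two kinds, assuming only these two parameter kinds exist. ParamSpec and validate are hypothetical and do not reproduce the library's validator.

```python
from dataclasses import dataclass


@dataclass
class ParamSpec:
    """Hypothetical parameter spec; field names are illustrative only."""
    name: str
    key: str
    kind: str       # "intgrid" (integer range) or "select" (fixed choices)
    allowed: tuple  # (low, high) for "intgrid"; the choices for "select"


def validate(spec, value):
    """Return a list of error strings; an empty list means the value is valid."""
    if spec.kind == "intgrid":
        low, high = spec.allowed
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            return [f"{spec.name} ({spec.key}) = {value}: must be an integer in [{low}, {high}]"]
    elif spec.kind == "select":
        if value not in spec.allowed:
            return [f"{spec.name} ({spec.key}) = {value}: must be one of {list(spec.allowed)}"]
    return []


max_bins = ParamSpec("max_bins", "b", "intgrid", (2, 500))
scale_small = ParamSpec("scale_small", "s", "select", (False, True))

print(validate(max_bins, -22))
print(validate(max_bins, 22))  # []
```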

In [22]:

keras.validate_task_parameters()

Keras Neural Network Classifier (KERASC)

All parameters valid!

Out[22]:

Update back to the original, valid blueprint

In [23]:

# Rebuild the valid graph (numeric input) and save it over the same user_blueprint_id
pni = w.Tasks.PNI2(w.TaskInputs.NUM)
rdt = w.Tasks.RDT5(pni)
binning = w.Tasks.BINNING(pni)
keras = w.Tasks.KERASC(rdt, binning)
keras.set_task_parameters_by_name(learning_rate=0.123)
keras_blueprint = w.BlueprintGraph(keras)
blueprint_graph = keras_blueprint.save('A blueprint I made with the Python API', user_blueprint_id=user_blueprint_id)

Get help on a task

In [24]:

help(w.Tasks.PNI2)

Help on PNI2 in module datarobot_bp_workshop.factories object:

class PNI2(datarobot_bp_workshop.friendly_repr.FriendlyRepr)
 |  Missing Values Imputed (quick median)
 |  
 |  Impute missing values on numeric variables with their median and create indicator variables to mark imputed records 
 |  
 |  Parameters
 |  ----------
 |  output_method: string, one of (TaskOutputMethod.TRANSFORM).
 |  task_parameters: dict, which may contain:
 |  
 |    scale_small (s): select, (Default=0)
 |      Possible Values: [False, True]
 |  
 |    threshold (t): int, (Default=10)
 |      Possible Values: [1, 99999]
 |  
 |  Method resolution order:
 |      PNI2
 |      datarobot_bp_workshop.friendly_repr.FriendlyRepr
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __call__(zelf, *inputs, output_method=None, task_parameters=None, output_method_parameters=None, x_transformations=None, y_transformations=None, freeze=False, version=None)
 |  
 |  __friendly_repr__(zelf)
 |  
 |  documentation(zelf, auto_open=False)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  description = 'Impute missing values on numeric variables with ...eate...
 |  
 |  label = 'Missing Values Imputed (quick median)'
 |  
 |  task_code = 'PNI2'
 |  
 |  task_parameters = scale_small (s): select, (Default=0)
 |  
 |  threshold (t):...
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from datarobot_bp_workshop.friendly_repr.FriendlyRepr:
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from datarobot_bp_workshop.friendly_repr.FriendlyRepr:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)

List task categories

In [25]:

w.list_categories(show_tasks=True)

Custom

  - Awesome Model (CUSTOMR_6019ae978cc598a46199cee1)
  - "My Custom Task" (CUSTOMR_608e42ac186a7242380a6a98)
  - "My Custom Task" (CUSTOMR_608e42ecd5eb0dc5f28d0dda)
  - "My Custom Task" (CUSTOMR_608e43fc01f9f466aa8d0d81)
  - My Custom Ridge Regressor w/ Imputation (CUSTOMR_608e5a4ed5eb0dc5f28d0ea0)
  - My Custom Ridge Regressor w/ Imputation (CUSTOMR_608e5bc8b66a4934d58d0d4e)
  - My Custom Ridge Regressor w/ Imputation (CUSTOMR_608ef72b6f13f54305667783)
  - My Custom Ridge Regressor w/ Imputation (CUSTOMR_608ef74c5dda651931052422)
  - Second model (CUSTOMC_6019d18adfa83afbad99cdb8)
  - My Imputation Task (CUSTOMT_6188b0e6fb465717f029fd05)
  - Image Featurizer (CUSTOMT_61b452e57fd5b0629a2f4fd3)
  - Maybe Broken? (CUSTOMT_61b7d3f26f8e01a1a8f7bc0c)
Preprocessing

  Numeric Preprocessing

    Data Quality

      - Numeric Data Cleansing (NDC)
    Dimensionality Reducer

      - Truncated Singular Value Decomposition (SVD2)
      - Partial Principal Components Analysis (PPCA)
      - Truncated Singular Value Decomposition (SVD)
    Scaling

      - Impose Uniform Transform (UNIF3)
      - Log Transformer (LOGT)
      - Smooth Ridit Transform (RDT5)
      - Standardize (RST)
      - Search for best transformation including Smooth Ridit (BTRANSF6)
      - Transparent Search for best transformation (BTRANSF6T)
      - Transform on the link function scale (LINK)
      - Ridit Transform (SRDT3)
      - Standardize (ST)
    - Sparse Interaction Machine (SPOLY)
    - Constant Splines (GS)
    - One-Hot Encoding (PDM3)
    - Numeric Data Cleansing (NDC)
    - Missing Values Imputed (quick median) (PNI2)
    - Missing Values Imputed (arbitrary or quick median) (PNIA4)
    - Normalizer (NORM)
    - Search for ratios (RATIO3)
    - Binning of numerical variables (BINNING)
    - Search for differences (DIFF3)
  Categorical Preprocessing

    - Categorical Embedding (CATEMB)
    - Category Count (PCCAT)
    - One-Hot Encoding (PDM3)
    - Ordinal encoding of categorical variables (ORDCAT2)
    - Univariate credibility estimates with L2 (CRED1b1)
    - Buhlmann credibility estimates for high cardinality features (CRED1)
  Text Preprocessing

    - TextBlob Sentiment Featurizer (TEXTBLOB_SENTIMENT)
    - NLTK Sentiment Featurizer (NLTK_SENTIMENT)
    - One-Hot Encoding (PDM3)
    - Pretrained TinyBERT Featurizer (TINYBERTFEA)
    - SpaCy Named Entity Recognition Detector (SPACY_NAMED_ENTITY_RECOGNITION)
    - Fasttext Word Vectorization and Mean text embedding (TXTEM1)
    - Keras encoding of text variables (KERAS_TOKENIZER)
    - Matrix of word-grams occurrences (PTM3)
  Image Preprocessing

    - OpenCV Detect Largest Rectangle (OPENCV_DETECT_LARGEST_RECTANGLE)
    - OpenCV Image Featurizer (OPENCV_FEATURIZER)
    - Grayscale Downscaled Image Featurizer (IMG_GRAYSCALE_DOWNSCALED_IMAGE_FEATURIZER)
    - No Post Processing (IMAGE_POST_PROCESSOR)
    - Pretrained Multi-Level Global Average Pooling Image Featurizer (IMGFEA)
  Summarized Categorical Preprocessing

    - Summarized Categorical to Sparse Matrix (CDICT2SP)
    - Single Column Converter for Summarized Categorical (SCBAGOFCAT2)
  Geospatial Preprocessing

    - Spatial Neighborhood Featurizer (GEO_NEIGHBOR_V1)
    - Geospatial Location Converter (GEO_IN)

Models

  Regression

    - eXtreme Gradient Boosted Trees Quantile Regressor with Early Stopping (ESQUANTXGBR)
    - ExtraTrees Regressor (RFR)
    - Elastic-Net Regressor (L1 / Least-Squares Loss) (ENETCDWC)
    - Light Gradient Boosted Trees Regressor with Early Stopping (ESLGBMTR)
    - eXtreme Gradient Boosted Trees Regressor (PXGBR2)
    - Ridge Regression (RIDGE)
    - Nystroem Kernel SVM Regressor (ASVMER)
    - eXtreme Gradient Boosted Trees Regressor with Early Stopping and Unsupervised Learning Features (UESXGBR2)
    - eXtreme Gradient Boosted Trees Regressor (XGBR2)
    - eXtreme Gradient Boosted Trees Regressor (XL_PXGBR2)
    - Nystroem Kernel SVM Regressor (ASVMSKR)
    - Partial Least-Squares Regression (PLS)
    - Gaussian Process Regressor with Rational Quadratic Kernel (GPRRQ)
    - Eureqa Regressor (EQR)
    - Auto-Tuned Char N-Gram Text Modeler using token counts (CNGER2)
    - Frequency-Severity Generalized Additive Model (FSGG2)
    - Hot Spots (XPRIMR)
    - Linear Regression (GLMCD)
    - Frequency-Severity ElasticNet (FSEE)
    - Gradient Boosted Trees Regressor with Early Stopping (Least-Squares Loss) (ESGBR2)
    - Light Gradient Boosting on ElasticNet Predictions (RES_ESLGBMTR)
    - Support Vector Regressor (Radial Kernel) (SVMR2)
    - Regularized Quantile Regressor with Keras (KERAS_REGULARIZED_QUANTILE_REG)
    - Auto-tuned K-Nearest Neighbors Regressor (Euclidean Distance) (KNNR)
    - Lasso Regression (LASSO2)
    - Gaussian Process Regressor with Radial Basis Function Kernel (GPRRBF)
    - XRuleFit Regressor (XRULEFITR)
    - Frequency-Severity Light Gradient Boosted Trees (FSLL)
    - Ridge Regression (RIDGEWC)
    - Stochastic Gradient Descent Regression (SGDR)
    - Eureqa Generalized Additive Model (EQ_ESXGBR)
    - Elastic-Net Regressor (L1 / Least-Squares Loss) with K-Means Distance Features (KMDENETCD)
    - eXtreme Gradient Boosting on ElasticNet Predictions (RES_XGBR2)
    - Auto-Tuned Word N-Gram Text Modeler using token counts (WNGER2)
    - Auto-Tuned Summarized Categorical Modeler (SCENETR)
    - Keras Neural Network Regressor (KERASR)
    - Elastic-Net Regressor (L1 / Least-Squares Loss) (ENETCD)
    - eXtreme Gradient Boosted Trees Regressor (XL_XGBR2)
    - Gaussian Process Regressor with Dot Product Kernel (GPRDP)
    - Dropout Additive Regression Trees Regressor (PLGBMDR)
    - Elastic-Net Regressor (L1 / Least-Squares Loss) with Binned numeric features (BENETCD2)
    - eXtreme Gradient Boosted Trees Regressor with Early Stopping (XL_ESXGBR2)
    - Auto-tuned Stochastic Gradient Descent Regression (SGDRA)
    - RuleFit Regressor (RULEFITR)
    - Gaussian Process Regressor with Exponential Sine Squared Kernel (GPRESS)
    - Adaboost Regressor (ABR)
    - Elastic-Net Regressor (L1 / Least-Squares Loss) with Unsupervised Learning Features (UENETCD)
    - Gaussian Process Regressor with Matern Kernel (GPRM)
    - Light Gradient Boosting on ElasticNet Predictions (RES_PLGBMTR)
    - Gradient Boosted Trees Quantile Regressor with Early Stopping (QESGBR2)
    - ExtraTrees Regressor (Shallow) (SHAPRFR)
    - Statsmodels Quantile Regressor (QUANTILER)
    - eXtreme Gradient Boosted Trees Regressor with Early Stopping (ESXGBR2)
    - LightGBM Random Forest Regressor (PLGBMRFR)
    - Frequency-Cost ElasticNet (FCEE)
    - Frequency-Severity eXtreme Gradient Boosted Trees (FSXX2)
    - Gradient Boosted Trees Quantile Regressor (QGBR2)
    - eXtreme Gradient Boosting on ElasticNet Predictions (RES_ESXGBR2)
  Binary Classification

    - Stochastic Gradient Descent Classifier (SGDC)
    - LightGBM Random Forest Classifier (PLGBMRFC)
    - Bernoulli Naive Bayes classifier (scikit-learn) (BNBC)
    - Dropout Additive Regression Trees Classifier (PLGBMDC)
    - Auto-Tuned Char N-Gram Text Modeler using token counts (CNGEC2)
    - Gaussian Process Classifier with Matern Kernel (GPCM)
    - Gradient Boosted Trees Classifier with Early Stopping (ESGBC)
    - XRuleFit Classifier (XRULEFITC)
    - Support Vector Classifier (Radial Kernel) (SVMC2)
    - Multinomial Naive Bayes classifier (scikit-learn) (MNBC)
    - Adaboost Classifier (ABC)
    - eXtreme Gradient Boosting on ElasticNet Predictions (RES_XGBC2)
    - Elastic-Net Classifier (L1 / Binomial Deviance) (LENETCDWC)
    - Gaussian Process Classifier with Radial Basis Function Kernel (GPCRBF)
    - Light Gradient Boosted Trees Classifier with Early Stopping (ESLGBMTC)
    - Eureqa Classifier (EQC)
    - ExtraTrees Classifier (Gini) (SHAPRFC)
    - Logistic Regression (LR)
    - Keras Neural Network Classifier (KERASC)
    - Nystroem Kernel SVM Classifier (ASVMEC)
    - eXtreme Gradient Boosted Trees Classifier with Early Stopping and Unsupervised Learning Features (UESXGBC2)
    - RuleFit Classifier (RULEFITC)
    - Regularized Logistic Regression (L2) (LR1)
    - Nystroem Kernel SVM Classifier (ASVMSKC)
    - Naive Bayes combiner classifier (CNBC)
    - Light Gradient Boosted Trees Classifier with Early Stopping and Unsupervised Learning Features (UESLGBMTC)
    - Light Gradient Boosting on ElasticNet Predictions (RES_PLGBMTC)
    - Elastic-Net Classifier (L1 / Binomial Deviance) with K-Means Distance Features (KMDLENETCD)
    - ExtraTrees Classifier (Gini) (RFC)
    - Hot Spots (XPRIMC)
    - Partial Least-Squares Classification (PLSC)
    - Auto-tuned K-Nearest Neighbors Classifier (Euclidean Distance) (KNNC)
    - Eureqa Generalized Additive Model Classifier (EQ_ESXGBC)
    - eXtreme Gradient Boosted Trees Classifier (XL_XGBC2)
    - Auto-Tuned Summarized Categorical Modeler (SCLENETC)
    - Light Gradient Boosting on ElasticNet Predictions (RES_ESLGBMTC)
    - Elastic-Net Classifier with Naive Bayes Feature Weighting (NB_LENETCD)
    - eXtreme Gradient Boosted Trees Classifier (XGBC2)
    - Elastic-Net Classifier (L1 / Binomial Deviance) (LENETCD)
    - Gaussian Naive Bayes classifier (scikit-learn) (GNBC)
    - Logistic Regression (LRCD)
    - eXtreme Gradient Boosted Trees Classifier with Early Stopping (ESXGBC2)
    - eXtreme Gradient Boosted Trees Classifier (PXGBC2)
    - Auto-Tuned Word N-Gram Text Modeler using token counts (WNGEC2)
  Multi-class Classification

    - Stochastic Gradient Descent Classifier (SGDC)
    - LightGBM Random Forest Classifier (PLGBMRFC)
    - Dropout Additive Regression Trees Classifier (PLGBMDC)
    - Gradient Boosted Trees Classifier with Early Stopping (ESGBC)
    - Light Gradient Boosted Trees Classifier with Early Stopping (ESLGBMTC)
    - ExtraTrees Classifier (Gini) (SHAPRFC)
    - Logistic Regression (LR)
    - Regularized Logistic Regression (L2) (LR1)
    - Light Gradient Boosted Trees Classifier with Early Stopping and Unsupervised Learning Features (UESLGBMTC)
    - Light Gradient Boosting on ElasticNet Predictions (RES_PLGBMTC)
    - Keras Neural Network Classifier (KERASMULTIC)
    - ExtraTrees Classifier (Gini) (RFC)
    - Light Gradient Boosting on ElasticNet Predictions (RES_ESLGBMTC)
    - eXtreme Gradient Boosted Trees Classifier (XGBC2)
    - Elastic-Net Classifier (L1 / Binomial Deviance) (LENETCD)
    - Logistic Regression (LRCD)
    - eXtreme Gradient Boosted Trees Classifier with Early Stopping (ESXGBC2)
    - eXtreme Gradient Boosted Trees Classifier (PXGBC2)
  Boosting

    - eXtreme Gradient Boosted Trees Regressor (XL_PXGBR2)
    - eXtreme Gradient Boosting on ElasticNet Predictions (RES_XGBC2)
    - Light Gradient Boosting on ElasticNet Predictions (RES_ESLGBMTR)
    - eXtreme Gradient Boosting on ElasticNet Predictions (RES_XGBR2)
    - eXtreme Gradient Boosted Trees Regressor (XL_XGBR2)
    - Light Gradient Boosting on ElasticNet Predictions (RES_PLGBMTC)
    - eXtreme Gradient Boosted Trees Regressor with Early Stopping (XL_ESXGBR2)
    - eXtreme Gradient Boosted Trees Classifier (XL_XGBC2)
    - Light Gradient Boosting on ElasticNet Predictions (RES_ESLGBMTC)
    - Light Gradient Boosting on ElasticNet Predictions (RES_PLGBMTR)
    - eXtreme Gradient Boosting on ElasticNet Predictions (RES_ESXGBR2)
  Unsupervised

    Anomaly Detection

      - Local Outlier Factor Anomaly Detection (ADLOF)
      - Mahalanobis Distance Ranked Anomaly Detection with PCA and Calibration (ADMAHAL_PCA_CAL)
      - Keras Autoencoder (KERAS_AUTOENCODER)
      - Isolation Forest Anomaly Detection (ADISOFOR)
      - Mahalanobis Distance Ranked Anomaly Detection with PCA (ADMahalPCA)
      - Keras Autoencoder with Calibration (KERAS_AUTOENCODER_CAL)
      - Isolation Forest Anomaly Detection with Calibration (ADISOFOR_CAL)
      - Double Median Absolute Deviation Anomaly Detection (ADDMAD)
      - Keras Variational Autoencoder (KERAS_VARIATIONAL_AUTOENCODER)
      - Keras Variational Autoencoder with Calibration (KERAS_VARIATIONAL_AUTOENCODER_CAL)
      - One-Class SVM Anomaly Detection with Calibration (ADOSVM_CAL)
      - Local Outlier Factor Anomaly Detection with Calibration (ADLOF_CAL)
      - Anomaly Detection with Supervised Learning (XGB) (ADXGB)
      - One-Class SVM Anomaly Detection (ADOSVM)
      - Anomaly Detection with Supervised Learning (XGB) and Calibration (ADXGB2_CAL)
      - Double Median Absolute Deviation Anomaly Detection with Calibration (ADDMAD_CAL)
    Clustering

      - K-Means Clustering (KMEANS)
  

Calibration

  - Calibrate predictions with RF (CALIB_V2_RFC)
  - Text fit on Residuals (L1 /  Least-Squares Loss) (XL_ENETCD)
  - Calibrate predictions: Weighted Calibration (SWCAL)
  - Calibrate predictions (CALIB)
  - Text fit on Residuals (L1 / Binomial Deviance) (XL_LENETCD)
  - Fit High Cardinality and Text (XLF_LENETCD)
  - Text fit on Residuals (L1 /  Least-Squares Loss) (RES_FDENETCD)
  - Calibrate predictions (CALIB2)
  - Calibrate predictions: Platt (PLACAL2)
  - Fit High Cardinality and Text (XLF_ENETCD)
Other

  Column Selection

    - Converter for Text Mining (SCTXT2)
    - Single Column Converter for Summarized Categorical (SCBAGOFCAT)
    - Single Column Converter (SCPICK2)
    - Single Column Converter (SCPICK)
    - Converter for Text Mining (SCTXT4)
    - Multiple Column Selector (MCPICK)
  Automatic Feature Selection

    - Feature Selection for Ratios/Differences (FS_RFR2)
    - Feature Selection for dimensionality reduction (FS_RFCDR2)
    - Feature Selection for dimensionality reduction (FS_RFCDR_LASSO)
    - Feature Selection for dimensionality reduction (FS_RFRDR_LASSO)
    - Feature Selection using L1 Regularization (FS_XL_LASSO2)
    - Rare Feature Masking (RFMASK)
    - Feature Selection for Ratios/Differences (FS_RFC2)
    - Feature Selection for dimensionality reduction (FS_RFRDR2)
  - Bind branches (BIND)

Out[25]:

Search tasks by name

In [26]:

w.search_tasks('keras')

Out[26]:

Keras Autoencoder with Calibration: [KERAS_AUTOENCODER_CAL] 
  - Keras Autoencoder for Anomaly Detection with Calibration


Keras Autoencoder: [KERAS_AUTOENCODER] 
  - Keras Autoencoder for Anomaly Detection


Keras Neural Network Classifier: [KERASC] 
  - Keras Neural Network Classifier


Keras Neural Network Classifier: [KERASMULTIC] 
  - Keras Neural Network Multi-Class Classifier


Keras Neural Network Regressor: [KERASR] 
  - Keras Neural Network Regressor


Keras Variational Autoencoder with Calibration: [KERAS_VARIATIONAL_AUTOENCODER_CAL] 
  - Keras Variational Autoencoder for Anomaly Detection with Calibration


Keras Variational Autoencoder: [KERAS_VARIATIONAL_AUTOENCODER] 
  - Keras Variational Autoencoder for Anomaly Detection


Keras encoding of text variables: [KERAS_TOKENIZER] 
  - Text encoding based on Keras Tokenizer class


Regularized Quantile Regressor with Keras: [KERAS_REGULARIZED_QUANTILE_REG] 
  - Regularized Quantile Regression implemented in Keras

Search custom tasks

In [27]:

w.search_tasks('Awesome')

Out[27]:

Awesome Model: [CUSTOMR_6019ae978cc598a46199cee1] 
  - This is the best model ever.

Flexible search

In [28]:

w.search_tasks('bins')

Out[28]:

Binning of numerical variables: [BINNING] 
  - Bin numerical values into non-uniform bins using decision trees


Elastic-Net Regressor (L1 / Least-Squares Loss) with Binned numeric features: [BENETCD2] 
  - Bin numerical values into non-uniform bins using decision trees, followed by Elasticnet model using block coordinate descent-- a common form of derivated-free optimization. Based on lightning CDRegressor.

In [29]:

w.search_tasks('Pre-proc')

Out[29]:

In [30]:

[a.task_code for a in w.search_tasks('decision')]

Out[30]:

['BINNING', 'BENETCD2', 'RFC', 'RFR']
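These results suggest that search_tasks also matches task descriptions: 'bins' and 'decision' both return BINNING, whose description mentions "non-uniform bins using decision trees". A minimal sketch of a case-insensitive substring search over code, label, and description, under that assumption. The actual matching rules of search_tasks are not documented here; sketch_search and the abridged CATALOG are hypothetical.

```python
# Abridged, hypothetical catalog built from the outputs above: code -> (label, description)
CATALOG = {
    "BINNING": ("Binning of numerical variables",
                "Bin numerical values into non-uniform bins using decision trees"),
    "BENETCD2": ("Elastic-Net Regressor (L1 / Least-Squares Loss) with Binned numeric features",
                 "Bin numerical values into non-uniform bins using decision trees, "
                 "followed by Elasticnet model using block coordinate descent"),
    "RFC": ("ExtraTrees Classifier (Gini)",
            "Random Forests based on scikit-learn, an ensemble of individual decision trees"),
    "KERASC": ("Keras Neural Network Classifier", "Keras Neural Network Classifier"),
    "PDM3": ("One-Hot Encoding",
             "One-Hot (or dummy-variable) transformation of categorical features"),
}


def sketch_search(query, catalog=CATALOG):
    """Return task codes whose code, label, or description contains the query (case-insensitive)."""
    q = query.lower()
    return [code for code, (label, desc) in catalog.items()
            if q in code.lower() or q in label.lower() or q in desc.lower()]


print(sketch_search("decision"))  # ['BINNING', 'BENETCD2', 'RFC']
print(sketch_search("bins"))      # ['BINNING', 'BENETCD2']
```

With this trimmed catalog the sketch reproduces the 'bins' result above and the first three 'decision' hits; RFR is absent only because its description does not appear in this notebook.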

In [31]:

w.Tasks.RFC

Out[31]:

ExtraTrees Classifier (Gini): [RFC] 
  - Random Forests based on scikit-learn. Random forests are an ensemble method where hundreds (or thousands) of individual decision trees are fit to bootstrap re-samples of the original dataset.  ExtraTrees are a variant of RandomForests with even more randomness.

Short description

In [32]:

w.Tasks.PDM3.description

Out[32]:

'One-Hot (or dummy-variable) transformation of categorical features'

View task documentation

In [33]:

binning.documentation()

Out[33]:

'https://app.datarobot.com/model-docs/tasks/BINNING-Binning-of-numerical-variables.html'

View task parameter values

As an example, look at the binning task.

In [34]:

binning.get_task_parameter_by_name('max_bins')

Out[34]:

Modify task parameters

In [35]:

binning.set_task_parameters_by_name(max_bins=22)

Out[35]:

Binning of numerical variables (BINNING)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM

Task Parameters:
  max_bins (b) = 22

Set task parameters by key

Alternatively, you can set a parameter directly through its short name (key); for example, b is the key for max_bins.

In [36]:

binning.task_parameters.b = 22

Parameter validation

In [37]:

binning.task_parameters.b = -22

In [38]:

binning.validate_task_parameters()

Binning of numerical variables (BINNING)

  Invalid value(s) supplied
    max_bins (b) = -22
      - Must be a 'intgrid' parameter defined by: [2, 500]

Out[38]:

In [39]:

binning.set_task_parameters(b=22)

Out[39]:

Binning of numerical variables (BINNING)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM

Task Parameters:
  max_bins (b) = 22
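The outputs above print the parameter as "max_bins (b)": the long name and the short key b address the same value, whether it is set through set_task_parameters_by_name(max_bins=...), set_task_parameters(b=...), or task_parameters.b = .... A minimal sketch of such aliasing, assuming a long-name-to-key table; AliasedParameters is a hypothetical class, not the library's implementation.

```python
class AliasedParameters:
    """Hypothetical holder in which a long parameter name and its short key share one value."""

    def __init__(self, aliases):
        # aliases maps long name -> short key, e.g. {"max_bins": "b"}
        object.__setattr__(self, "_to_key", dict(aliases))
        object.__setattr__(self, "_values", {})

    def _resolve(self, name):
        return self._to_key.get(name, name)

    def __setattr__(self, name, value):
        # Every write is stored under the short key
        self._values[self._resolve(name)] = value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for parameter names
        try:
            return self._values[self._resolve(name)]
        except KeyError:
            raise AttributeError(name) from None


params = AliasedParameters({"max_bins": "b"})
params.max_bins = 22    # write through the long name
print(params.b)         # 22
params.b = 30           # write through the short key
print(params.max_bins)  # 30
```

Storing every value under one canonical key means the two spellings can never drift apart.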

Validate task parameters

In [40]:

binning.validate_task_parameters()

Binning of numerical variables (BINNING)

All parameters valid!

Out[40]:

Pass user_blueprint_id to update an existing blueprint in your personal repository.

In [41]:

blueprint_graph = keras_blueprint.save('A blueprint I made with the Python API (updated)', user_blueprint_id=user_blueprint_id)

In [42]:

assert user_blueprint_id == blueprint_graph.user_blueprint_id
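The assertion above checks that saving with user_blueprint_id updates the existing entry instead of creating a new one. A minimal in-memory sketch of that create-or-update behavior; InMemoryBlueprintRepo is hypothetical and says nothing about how DataRobot actually stores blueprints.

```python
import uuid


class InMemoryBlueprintRepo:
    """Hypothetical personal repository: user_blueprint_id -> blueprint name."""

    def __init__(self):
        self.blueprints = {}

    def save(self, name, user_blueprint_id=None):
        # Without an ID, create a new entry; with an ID, overwrite that entry in place
        if user_blueprint_id is None:
            user_blueprint_id = uuid.uuid4().hex
        self.blueprints[user_blueprint_id] = name
        return user_blueprint_id

    def delete(self, user_blueprint_id):
        self.blueprints.pop(user_blueprint_id, None)


repo = InMemoryBlueprintRepo()
bp_id = repo.save("A blueprint I made with the Python API")
updated_id = repo.save("A blueprint I made with the Python API (updated)", user_blueprint_id=bp_id)
print(updated_id == bp_id, len(repo.blueprints))  # True 1
```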

Retrieve a blueprint

You can retrieve a saved blueprint by its user_blueprint_id.

In [43]:

w.get(user_blueprint_id).show()

Retrieve blueprints from your personal blueprint repository

In [44]:

for bp in w.list(limit=3):
    bp.show()

Delete a blueprint from your personal repository

In [45]:

w.delete(user_blueprint_id)

Blueprints deleted.

Retrieve Leaderboard blueprints for use in the Workshop

In [46]:

project_id = '5eb9656901f6bb026828f14e'
project = dr.Project.get(project_id)
menu = project.get_blueprints()

In [47]:

for bp in menu[6:9]:
    Visualize.show_dr_blueprint(bp)

Clone a blueprint from the Leaderboard

In [48]:

ridge = menu[7]
blueprint_graph = w.clone(blueprint_id=ridge.id, project_id=project_id)
blueprint_graph.show()

In [49]:

ridge.id, project_id

Out[49]:

('1774086bd8bfd4e1f45c5ff503a99ee2', '5eb9656901f6bb026828f14e')

Any blueprint can serve as a tutorial

In [50]:

source_code = blueprint_graph.to_source_code(to_stdout=True)

w = Workshop(user_blueprint_id='61d4dda0addc0e8a29404b9b')

rst = w.Tasks.RST(w.TaskInputs.DATE)

pdm3 = w.Tasks.PDM3(w.TaskInputs.CAT)
pdm3.set_task_parameters(cm=500, sc=25)

gs = w.Tasks.GS(w.TaskInputs.NUM)

enetcd = w.Tasks.ENETCD(rst, pdm3, gs)
enetcd.set_task_parameters(a=0)

enetcd_blueprint = w.BlueprintGraph(enetcd, name='Ridge Regressor')

Execute the blueprint source code

In [51]:

# Run the generated source at top level; note that it rebinds w and the task variables
exec(compile(source_code, 'blueprint', 'exec'))
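The generated source starts with w = Workshop(user_blueprint_id=...), so executing it at notebook top level rebinds w and every other name it assigns; it also uses Workshop without importing it. A minimal sketch of running such code in a separate namespace instead, assuming the generated code needs only the names you pass in. run_isolated is a hypothetical helper built on Python's built-in exec and compile.

```python
def run_isolated(source, **names):
    """Execute generated source in a fresh namespace seeded with `names`; return the namespace."""
    namespace = dict(names)
    exec(compile(source, "blueprint", "exec"), namespace)
    return namespace


# Toy stand-in for to_source_code() output: it rebinds `w` and builds a result
toy_source = """
w = Factory('rebound workshop')
result = w.upper()
"""

ns = run_isolated(toy_source, Factory=str)
print(ns["w"])       # rebound workshop
print(ns["result"])  # REBOUND WORKSHOP
```

In this notebook the equivalent call would presumably be run_isolated(source_code, Workshop=Workshop), after which the blueprint is available as ns['enetcd_blueprint']; any other names the generated code references would also have to be passed in.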

In [52]:

enetcd_blueprint.show()

Delete the original blueprint directly

In [53]:

blueprint_graph.delete()

Blueprint deleted.

Modify the source code

In [54]:

#w = Workshop()

rst = w.Tasks.RST(w.TaskInputs.DATE)

# Use numeric data cleansing (NDC) instead of constant splines (GS)
ndc = w.Tasks.NDC(w.TaskInputs.NUM)

pdm3 = w.Tasks.PDM3(w.TaskInputs.CAT)
pdm3.set_task_parameters(cm=500, sc=25)

enetcd = w.Tasks.ENETCD(rst, ndc, pdm3)
enetcd.set_task_parameters(a=0.0)

enetcd_blueprint = w.BlueprintGraph(enetcd, name='Ridge Regressor')

In [55]:

enetcd_blueprint.show()

Add the blueprint to a project and train it

In [56]:

project_id = '5eb9656901f6bb026828f14e'

In [57]:

enetcd_blueprint.save()

Out[57]:

Name: 'Ridge Regressor'

Input Data: Date | Categorical | Numeric
Tasks: Standardize | One-Hot Encoding | Numeric Data Cleansing | Elastic-Net Regressor (L1 / Least-Squares Loss)

In [58]:

enetcd_blueprint.train(project_id=project_id)

Training requested! Blueprint Id: fa329535f1e5f5465e2c55024aacb910

Out[58]:

Name: 'Ridge Regressor'

Input Data: Date | Categorical | Numeric
Tasks: Standardize | One-Hot Encoding | Numeric Data Cleansing | Elastic-Net Regressor (L1 / Least-Squares Loss)

Custom Models

Search for tasks

In [59]:

w.search_tasks('awesome model')

Out[59]:

Awesome Model: [CUSTOMR_6019ae978cc598a46199cee1] 
  - This is the best model ever.

In [60]:

w.CustomTasks.CUSTOMR_6019ae978cc598a46199cee1

Out[60]:

Awesome Model: [CUSTOMR_6019ae978cc598a46199cee1] 
  - This is the best model ever.

In [61]:

w.CustomTasks.CUSTOMR_6019ae978cc598a46199cee1(w.TaskInputs.NUM)

Out[61]:

Awesome Model (CUSTOMR_6019ae978cc598a46199cee1)

Input Summary: Numeric Data
Output Method: TaskOutputMethod.PREDICT

Task Parameters:
  version_id (version_id) = latest_6019ae978cc598a46199cee1

In [62]:

w.CustomTask('CUSTOMR_6019ae978cc598a46199cee1')

Out[62]:

Awesome Model (CUSTOMR_6019ae978cc598a46199cee1)

Input Summary: (None)
Output Method: TaskOutputMethod.PREDICT

Task Parameters:
  version_id (version_id) = latest_6019ae978cc598a46199cee1

In [63]:

w.CustomTasks.CUSTOMR_6019ae978cc598a46199cee1.versions

Out[63]:

Latest (latest_6019ae978cc598a46199cee1): str

v3.0 (6019e2418311cc8207a5f8e1): str

v2.10 (6019dff0509159ede309f9c9): str

v2.9 (6019dc3b8311cc8207a5f7d9): str

v2.8 (6019dbcb4f6322a6283883d9): str

v2.7 (6019db4d041c71bd7ea1c670): str

v2.6 (6019da5d4f6322a628388364): str

v2.5 (6019d924be257008648e3c62): str

v2.4 (6019d7db3d7d080b078e3c39): str

v2.3 (6019d744356f3c430b38828d): str

v2.2 (6019d305be257008648e3c0c): str

v2.1 (6019d2e045e619fc03a2eead): str

v2.0 (6019d2bd3d7d080b078e3b66): str

v1.3 (6019cf0735270cbe238e3c76): str

v1.2 (6019b9fdbf5b0a42aba1c6e9): str

v1.1 (6019b81729ae9ab5ad8e3c26): str

v1.0 (6019afe4dcd97e1e5ebfee13): str

Build a blueprint

In [64]:

pni = w.Tasks.PNI2(w.TaskInputs.NUM)
rdt = w.Tasks.RDT5(pni)
binning = w.Tasks.BINNING(pni)
# Use the custom task as the final estimator, fed by both branches
customr = w.CustomTasks.CUSTOMR_6019ae978cc598a46199cee1(rdt, binning)
custom_bp = w.BlueprintGraph(customr, name='My Fun Custom Blueprint').save()

Update the task version

In [65]:

customr.version = w.CustomTasks.CUSTOMR_6019ae978cc598a46199cee1.versions.v2_7

In [66]:

customr

Out[66]:

Awesome Model (CUSTOMR_6019ae978cc598a46199cee1)

Input Summary: Smooth Ridit Transform (RDT5) | Binning of numerical variables (BINNING)
Output Method: TaskOutputMethod.PREDICT

Task Parameters:
  version_id (version_id) = 6019db4d041c71bd7ea1c670
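Version labels such as v2.7 are not valid Python identifiers, so In [65] reaches that version as versions.v2_7, and the resulting version_id (6019db4d041c71bd7ea1c670) matches the v2.7 entry in the listing above. A minimal sketch of that label-to-attribute mapping, assuming non-identifier characters are replaced with underscores. to_attr_name is hypothetical; the library's exact naming rule is not documented here.

```python
import re
from types import SimpleNamespace


def to_attr_name(label):
    """Map a version label such as 'v2.7' to an identifier such as 'v2_7'."""
    name = re.sub(r"\W", "_", label)  # replace every non-word character with '_'
    return name if name.isidentifier() else "_" + name  # e.g. a leading digit


# A few entries from the listing above: label -> version id
VERSION_LABELS = {
    "Latest": "latest_6019ae978cc598a46199cee1",
    "v3.0": "6019e2418311cc8207a5f8e1",
    "v2.7": "6019db4d041c71bd7ea1c670",
}

version_ids = SimpleNamespace(**{to_attr_name(k): v for k, v in VERSION_LABELS.items()})
print(version_ids.v2_7)  # 6019db4d041c71bd7ea1c670
```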

In [67]:

customr.version = w.CustomTasks.CUSTOMR_6019ae978cc598a46199cee1.versions.Latest

In [68]:

custom_bp.save()

Out[68]:

Name: 'My Fun Custom Blueprint'

Input Data: Numeric
Tasks: Missing Values Imputed (quick median) | Smooth Ridit Transform | Binning of numerical variables | Awesome Model

Search, visualize, and train

In [69]:

bps = w.list(limit=3)

In [70]:

list(bps)[0].show()

In [71]:

custom_bp.train(project_id=project_id)

Training requested! Blueprint Id: 3d753707758ad45b97684811a8756c20

Out[71]:

Name: 'My Fun Custom Blueprint'

Input Data: Numeric
Tasks: Missing Values Imputed (quick median) | Smooth Ridit Transform | Binning of numerical variables | Awesome Model

In [72]:

custom_bp.delete()

Blueprint deleted.

Updated April 2, 2025
