Create and deploy a custom model¶
This notebook outlines how to create, deploy, and monitor a custom inference model with DataRobot's Python client. You can use DataRobot’s Custom Model Workshop to upload a model artifact to create, test, and deploy custom inference models to DataRobot’s centralized deployment hub.
!git clone https://github.com/datarobot/datarobot-user-models.git
cd datarobot-user-models
pip install -r public_dropin_environments/python3_sklearn/requirements.txt
pip install datarobot-drum
pip install datarobot
Connect to DataRobot¶
Read more about different options for connecting to DataRobot from the client.
# If the config file is not in the default location described in the API Quickstart guide, '~/.config/datarobot/drconfig.yaml', then you will need to call
# dr.Client(config_path='path-to-drconfig.yaml')
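If your credentials are stored in the default configuration file, calling dr.Client() with no arguments is enough. The cell below is a minimal connection sketch; the endpoint shown assumes the US managed AI Platform, and the token is a placeholder to replace with your own API key.
import datarobot as dr

# Connect with an explicit endpoint and API token (placeholder values; endpoint assumes the US managed AI Platform)
dr.Client(endpoint='https://app.datarobot.com/api/v2', token='YOUR_API_TOKEN')

# Alternatively, rely on the default config file described above
# dr.Client()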
Train a custom XGBoost model¶
Provide training data to train a custom XGBoost model with a scikit-learn preprocessing pipeline for a binary classification use case.
Install the required Python modules (PyYAML==5.3.1 and xgboost==1.2.1) using the requirements file.
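Based on the modules listed above, the custom model's requirements file is assumed to contain at least the following pins:
PyYAML==5.3.1
xgboost==1.2.1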
!pip install -r ./custom_model_xgboost/requirements.txt -q
import pandas as pd
import joblib
import numpy as np
import json
from xgboost import XGBClassifier
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
Use the following cells to build a scikit-learn classification model (XGBoost) that predicts whether a loan will default.
# Load the training data and separate the features (X) from the binary target (y)
train = pd.read_csv('./data/custom_training_10K.csv')
X = train.drop('is_bad', axis=1)
y = train.pop('is_bad')
train.head(5)
Define and fit the preprocessing steps for each type of feature column.
# Preprocessing for numerical features
numeric_features = list(X.select_dtypes('int64').columns)
for c in numeric_features:
    X[c] = X[c].fillna(0)
numeric_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())])

# Preprocessing for categorical features
categorical_features = list(X.select_dtypes('object').columns)
for c in categorical_features:
    X[c] = X[c].fillna('missing')
categorical_transformer = Pipeline(steps=[
    ('OneHotEncoder', OneHotEncoder(handle_unknown='ignore'))])

# Preprocessor with all of the steps
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])
# Full preprocessing pipeline
pipeline = Pipeline(steps=[('preprocessor', preprocessor)])

# Fit the preprocessing pipeline
pipeline.fit(X, y)

# Preprocess X
preprocessed = pipeline.transform(X)

# You could also train the model with the sparse matrix.
# Convert it to a pandas DataFrame because the hook function in custom.py expects a pandas DataFrame for scoring.
preprocessed = pd.DataFrame.sparse.from_spmatrix(preprocessed)
Finally, train an XGBoost classifier and save both the custom model and the preprocessing pipeline as pickle files. Then use these pickle files to upload the artifacts to DataRobot.
model = XGBClassifier(colsample_bylevel=0.2,
                      max_depth=10,
                      learning_rate=0.02,
                      n_estimators=300,
                      eval_metric='logloss')
model.fit(preprocessed, y)
joblib.dump(pipeline, 'custom_model_xgboost/preprocessing.pkl')
joblib.dump(model, 'custom_model_xgboost/model.pkl')
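The code directory uploaded later in this notebook also contains a custom.py file that tells DRUM how to load these artifacts and score new data. That file is not reproduced here; the cell below is a minimal sketch of what it could look like, assuming DRUM's load_model and score hooks and the two pickle files saved above. The missing-value handling and class labels are illustrative, not the exact file shipped with the repository.
# custom.py (illustrative sketch)
import joblib
import pandas as pd
from pathlib import Path

def load_model(code_dir):
    # Load the fitted preprocessing pipeline and the trained XGBoost classifier
    preprocessing = joblib.load(Path(code_dir) / 'preprocessing.pkl')
    model = joblib.load(Path(code_dir) / 'model.pkl')
    return preprocessing, model

def score(data, model, **kwargs):
    preprocessing, classifier = model
    # Mirror the missing-value handling used at training time
    for c in data.select_dtypes('number').columns:
        data[c] = data[c].fillna(0)
    for c in data.select_dtypes('object').columns:
        data[c] = data[c].fillna('missing')
    # Apply the fitted pipeline and convert back to a (sparse) DataFrame for scoring
    preprocessed = pd.DataFrame.sparse.from_spmatrix(preprocessing.transform(data))
    # Return one probability column per class label, as DRUM expects for binary targets
    probabilities = classifier.predict_proba(preprocessed)
    positive_label = kwargs.get('positive_class_label', '1')
    negative_label = kwargs.get('negative_class_label', '0')
    return pd.DataFrame({positive_label: probabilities[:, 1],
                         negative_label: probabilities[:, 0]})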
Test custom models locally¶
Before uploading a custom model to DataRobot, it is recommended to test it locally using DRUM (the DataRobot custom model runner). DRUM verifies that a custom model can run and make predictions successfully. Note that this testing is only for development purposes; you should also test the custom inference model in the Custom Model Workshop after uploading it to the DataRobot platform.
Now, use DRUM to test how the model performs by computing latency times and memory usage for several different test case sizes. A report is generated after this process is completed. View a sample output below.
!drum perf-test --code-dir ./custom_model_xgboost --input ./data/custom_scoring_10K.csv --target-type binary --positive-class-label '1' --negative-class-label '0'
DRUM can also validate models to detect and address issues before uploading to DataRobot. Locally, DRUM runs the same tests DataRobot runs automatically before deploying models. Specifically, it tests the model's ability to impute null values by setting each feature in the dataset to "missing" and then sending the features to the model. The two code snippets below validate the model, write the results to validation.log, and then print the contents of the log.
!drum validation --code-dir ./custom_model_xgboost --input ./data/custom_scoring_10K.csv --target-type binary --positive-class-label '1' --negative-class-label '0' > validation.log
!cat validation.log
Finally, you can use the model to make predictions. To do this, leverage DRUM and its ability to natively handle scikit-learn models. You need to specify where the model resides and what data to score.
!drum score --code-dir ./custom_model_xgboost --input ./data/custom_scoring_10K.csv --target-type binary --positive-class-label '1' --negative-class-label '0' > predictions.csv
pd.read_csv("predictions.csv").head()
Upload custom model artifacts to DataRobot¶
After testing the custom model locally, upload the artifacts to DataRobot. To do so, import several additional packages.
import datarobot as dr
from datarobot import Project, Deployment
import datetime as dt
from datetime import datetime
import dateutil.parser
import os
import re
from importlib import reload
Select an environment¶
Select an environment to run the custom inference model. Instead of creating your own environment, select a pre-built environment provided by DataRobot and modify it to add the required packages and dependencies.
# List all existing base environments
execution_environments = dr.ExecutionEnvironment.list()
execution_environments

# Select the pre-built Python 3 scikit-learn drop-in environment
for execution_environment in execution_environments:
    if execution_environment.name == '[DataRobot] Python 3 Scikit-Learn Drop-In':
        break

BASE_ENVIRONMENT = execution_environment
environment_versions = dr.ExecutionEnvironmentVersion.list(execution_environment.id)
BASE_ENVIRONMENT_VERSION = environment_versions[0]

print(BASE_ENVIRONMENT)
print(BASE_ENVIRONMENT_VERSION)
print(BASE_ENVIRONMENT.id)
Create a model package¶
Next, create a custom model package. To do so, complete three tasks:
- Add a new, "empty" custom inference model package.
- Add artifacts to assemble the custom inference model.
- Update and modify the pre-built environment you previously selected.
The code for each of these tasks is outlined in the following cells:
# Create a new custom model
custom_model = dr.CustomInferenceModel.create(
    name='Loan Default Custom - 13-05-2022 - API Python',
    target_type=dr.TARGET_TYPE.BINARY,
    target_name="is_bad",
    positive_class_label="1",
    negative_class_label="0",
    description="XGBoost model. Preprocesses data using a scikit-learn pipeline. custom.py preprocesses and scores.",
    language="Python"
)
# Create a new custom model version in DataRobot
print("Upload new version of model to DataRobot")
model_version = dr.CustomModelVersion.create_clean(
    custom_model_id=custom_model.id,
    base_environment_id=BASE_ENVIRONMENT.id,
    files=['./custom_model_xgboost/custom.py',
           './custom_model_xgboost/model.pkl',
           './custom_model_xgboost/preprocessing.pkl',
           './custom_model_xgboost/requirements.txt'],
)
# Update dependencies
# Because the model version includes a requirements.txt file, build the environment with the listed dependencies.
# If your model version has no requirements.txt file, you can skip this cell.
build_info = dr.CustomModelVersionDependencyBuild.start_build(
    custom_model_id=custom_model.id,
    custom_model_version_id=model_version.id,
    max_wait=3600,  # 1 hour timeout
)
Test the custom inference model in DataRobot¶
Next, use the environment to run the model with prediction test data to verify that the custom model is functional before deployment. To do this, upload the inference dataset for testing predictions.
# Load the training and inference datasets
df = pd.read_csv('./data/custom_training_10K.csv')
df_inference = pd.read_csv('./data/custom_scoring_10K.csv')

# Upload both datasets to DataRobot
train_dataset = dr.Dataset.create_from_in_memory_data(df, categories=["TRAINING"])
pred_test_dataset = dr.Dataset.create_from_in_memory_data(df_inference)

# Assign the training data to the custom model
custom_model.assign_training_data(train_dataset.id)
After uploading the inference dataset, you can test the custom inference model. View a sample outcome of testing below.
# Test the new version in DataRobot
print("Run test of new version in DataRobot")
custom_model_test = dr.CustomModelTest.create(
    custom_model_id=custom_model.id,
    custom_model_version_id=model_version.id,
    dataset_id=pred_test_dataset.id,
    max_wait=3600,  # 1 hour timeout
)
custom_model_test.overall_status
# Print the detailed status of each test and a link to the model in the Custom Model Workshop
HOST = "https://app.datarobot.com"
for name, test in custom_model_test.detailed_status.items():
    print('Test: {}'.format(name))
    print('Status: {}'.format(test['status']))
    print('Message: {}'.format(test['message']))
print("Finished testing: " + HOST + "/model-registry/custom-models/" + custom_model.id + "/assemble")