

Create custom inference models

Custom inference models are user-created, pretrained models that you upload to DataRobot as a collection of files via the Custom Model Workshop. After uploading a model artifact, you can create, test, and deploy custom inference models to DataRobot's centralized deployment hub.

You can assemble custom inference models in either of the following ways:

  • Create a custom model and include web server Scoring Code and a start_server.sh file in the model's folder. This type of custom model can be paired with a custom or drop-in environment.

  • Create a custom model without providing web server Scoring Code and a start_server.sh file. This type of custom model must use a drop-in environment. Drop-in environments contain the web server Scoring Code and a start_server.sh file used by the model. They are provided by DataRobot in the Workshop. You can also create your own drop-in custom environment.
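For example, when a model is paired with a Python drop-in environment, the model folder typically needs only the serialized model artifact and, if custom logic is required, a custom.py file containing DRUM hooks. The following is a minimal sketch under those assumptions; the file names and hook behavior shown are illustrative, so consult the DRUM documentation for the exact hook contract:

```python
# custom.py -- minimal sketch of optional DRUM hooks for a Python drop-in environment.
# Assumes a scikit-learn regression model pickled as model.pkl (illustrative names).
import pickle
from pathlib import Path

import pandas as pd


def load_model(code_dir: str):
    """Load and return the trained model artifact from the model folder."""
    with open(Path(code_dir) / "model.pkl", "rb") as f:
        return pickle.load(f)


def score(data: pd.DataFrame, model, **kwargs) -> pd.DataFrame:
    """Return regression predictions as a single Predictions column."""
    return pd.DataFrame({"Predictions": model.predict(data)})
```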

Be sure to review the guidelines for preparing a custom model folder before proceeding. If any files overlap between the custom model and the environment folders, the model's files will take priority.

Note

Once a custom model's file contents are assembled, you can test the contents locally for development purposes before uploading it to DataRobot. After you create a custom model in the Workshop, you can run a testing suite from the Assemble tab.

Create a new custom model

  1. To create a custom model, navigate to Model Registry > Custom Model Workshop and select the Models tab. This tab lists the models you have created. Click Add new model.

  2. In the Add Custom Inference Model window, complete the fields described below:

    Model name: The name of the custom model.
    Target type / Target name: Select the target type (binary classification, regression, multiclass, anomaly detection, or unstructured) and enter the name of the target feature.
    Positive class label / Negative class label: These fields only display for binary classification models. Specify the value to use as the positive class label and the value to use as the negative class label.
  3. Click Show Optional Fields and, if necessary, enter a prediction threshold, the coding language used to build the model, and a description.

  4. After completing the fields, click Add Custom Model.

  5. In the Assemble tab, under Model Environment on the right, select a model environment from the Base Environment dropdown menu. The model environment is used for testing and deploying the custom model.

    Note

    The Base Environment dropdown menu includes drop-in model environments, if any exist, as well as custom environments that you can create.

  6. Under Model on the left, add content by dragging and dropping files or browsing. Alternatively, select a remote integrated repository.

    If you click Browse local file, you have the option of adding a Local Folder. The local folder is for dependent files and additional assets required by your model, not the model itself. Even if the model file is included in the folder, it will not be accessible to DataRobot unless it also exists at the root level. A file at the root level can then reference the dependencies in the folder (see the sketch following these steps).

    Note

    You must also upload web server Scoring Code and a start_server.sh file to your model's folder unless you are pairing the model with a drop-in environment.

  7. After adding your model content, you can assign training data and then test, register, or deploy the custom model.
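To illustrate the earlier point about root-level files, the sketch below assumes a hypothetical layout in which a root-level custom.py references a helper module and a fitted transformer stored in a local folder named preprocessing/; all names are illustrative:

```python
# custom.py (at the root of the model content) -- hypothetical sketch.
# The preprocessing/ folder and the file names below are illustrative only.
import pickle
from pathlib import Path

import pandas as pd

from preprocessing.cleaning import clean_features  # helper module in the local folder


def load_model(code_dir: str):
    """Load the model artifact and a fitted scaler stored in the local folder."""
    root = Path(code_dir)
    with open(root / "model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(root / "preprocessing" / "scaler.pkl", "rb") as f:
        scaler = pickle.load(f)
    return {"model": model, "scaler": scaler}


def score(data: pd.DataFrame, artifacts, **kwargs) -> pd.DataFrame:
    """Clean, scale, and score the incoming data."""
    features = artifacts["scaler"].transform(clean_features(data))
    return pd.DataFrame({"Predictions": artifacts["model"].predict(features)})
```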

Anomaly detection

You can create custom inference models that support anomaly detection problems. If you choose to build one, reference the DRUM template (log in to GitHub before accessing it); a sketch of such a model also follows the list below. When deploying custom inference anomaly detection models, note that the following functionality is not supported:

  • Data drift
  • Accuracy and association IDs
  • Challenger models
  • Humility rules
  • Prediction intervals
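As a rough illustration of such a model, here is a hedged sketch that assumes a scikit-learn IsolationForest pickled as model.pkl and a single-column prediction output; check the DRUM template for the exact format expected for anomaly detection:

```python
# custom.py -- illustrative sketch for an anomaly detection custom model.
# Assumes an IsolationForest artifact saved as model.pkl; all names are hypothetical.
import pickle
from pathlib import Path

import pandas as pd


def load_model(code_dir: str):
    """Load the trained anomaly detector from the model folder."""
    with open(Path(code_dir) / "model.pkl", "rb") as f:
        return pickle.load(f)


def score(data: pd.DataFrame, model, **kwargs) -> pd.DataFrame:
    # score_samples returns higher values for normal rows, so negate it to make
    # larger values indicate anomalies (one common convention for anomaly scores).
    return pd.DataFrame({"Predictions": -model.score_samples(data)})
```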

Manage dependencies

Custom models can contain various machine learning libraries in the model code, but not every drop-in environment provided by DataRobot natively supports all libraries. However, you can manage these dependencies from the Workshop and update the base drop-in environments to support your model code.

To manage model dependencies, upload a requirements.txt file as part of your custom model. The file must list the machine learning libraries used in the model code.

For example, consider a custom R model that uses the Caret and XGBoost libraries. If this model is added to the Workshop and the R drop-in environment is selected, the base environment will only support Caret, not XGBoost. To address this, edit requirements.txt to include the Caret and XGBoost dependencies. After editing and re-uploading the requirements file, DataRobot can rebuild the base environment with XGBoost installed, making the model usable within the environment.

List the following, depending on the model type, in requirements.txt:

  • For R models, list the machine learning library dependencies.

  • For Python models, list the dependencies and any version constraints for the libraries. Supported constraint types include <, <=, ==, >=, >, and multiple constraints can be issued in a single entry (for example, pandas >= 0.24, < 1.0).
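For example, a requirements.txt for a Python model might look like the following; the packages and version constraints are illustrative only:

```
# requirements.txt -- illustrative dependency list for a Python custom model
pandas >= 0.24, < 1.0
scikit-learn == 0.24.2
xgboost >= 1.3
```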

Once the requirements file is updated to include dependencies and constraints, navigate to your custom model's Assemble tab. Upload the file under the Model > Content header. The Model Dependencies field updates to display the dependencies and constraints listed in the file.

From the Assemble tab, select a base drop-in environment under the Model Environment header. DataRobot warns you that a new environment must be built to account for the model dependencies. Select Build environment, and DataRobot installs the required libraries and constraints to the base environment.

Once the base environment is updated, your custom model will be usable with the environment, allowing you to test, deploy, or register it.

Add new versions

If you want to update a model, for example to use new package versions, different preprocessing steps, or changed hyperparameters, you can update the file contents to create a new version of the model and/or environment.

To do so, select the model from the Workshop and navigate to the Assemble tab. Under the Model header, select Add Files. Upload the files or folders that you updated.

When you update the individual contents of a model, the minor version (1.1, 1.2, etc.) of the model automatically updates.

You can create a new major version of a model (1.0, 2.0, etc.) by selecting New Version. Choose to copy the contents of a previous version to the new version or create an empty version and add new files to use for the model.

To upload a new version of an environment, follow this workflow.

You can now use a new version of the model or environment in addition to its previous versions. Select the iteration of the model that you want to use from the Version dropdown.

Assign training data

To deploy a new custom model or custom model version, you must add training data:

  1. In Model Registry > Custom Model Workshop, in the Models list, select the model you want to add training data to.

  2. On the Assemble tab, next to Datasets, click Assign.

  3. In the Add Training Data dialog box, click and drag a training dataset file into the Training Data box, or click Choose file and do either of the following:

    • Click Local file, select a file from your local storage, and then click Open.

    • Click AI Catalog, select a training dataset you previously uploaded to DataRobot, and click Use this dataset.

  4. Optional. Specify the name of the column containing partitioning information for your data (based on training/validation/holdout partitioning). If you plan to deploy the custom model and monitor its data drift and accuracy, make sure the column identifies the holdout partition, which establishes an accuracy baseline (see the example after these steps).

    Important

    You can track data drift and accuracy without specifying a partition column; however, in that scenario, DataRobot won't have baseline values. The selected partition column should only include the values T, V, or H.

  5. When the upload is complete, click Add Training Data.
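As an example of preparing a partition column before upload, the following sketch uses pandas to label rows as T (training), V (validation), or H (holdout); the file and column names are hypothetical:

```python
# Add a T/V/H partition column to a training dataset before uploading it to DataRobot.
# File and column names below are illustrative.
import numpy as np
import pandas as pd

df = pd.read_csv("training_data.csv")

# Randomly assign roughly 70% / 15% / 15% of rows to training, validation, and holdout.
rng = np.random.default_rng(42)
df["partition"] = rng.choice(["T", "V", "H"], size=len(df), p=[0.70, 0.15, 0.15])

df.to_csv("training_data_with_partition.csv", index=False)
```

When assigning the training data, you would then enter partition as the partition column name.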

Note

For unstructured custom inference models, you must upload the training and holdout datasets separately. Additionally, these datasets cannot include a partition column.

Manage model resources

After creating a custom inference model, you can configure the resources the model consumes to facilitate smooth deployment and minimize potential environment errors in production.

You can monitor a custom model's resource allocation from the Assemble tab. The resource settings are listed below the deployment status.

To edit any resource settings, select the pencil icon. Note that users can determine the maximum memory allocated for a model, but only organization admins can configure additional resource settings.

Warning

DataRobot recommends configuring resource settings only when necessary. When you configure the Memory setting below, you set the Kubernetes memory "limit" (the maximum allowed memory allocation); however, you can't set the memory "request" (the minimum guaranteed memory allocation). For this reason, it is possible to set the "limit" value too far above the default "request" value. An imbalance between the memory "request" and the memory usage allowed by the increased "limit" can result in the custom model exceeding the memory consumption limit. As a result, you may experience unstable custom model execution due to frequent eviction and relaunching of the custom model. If you require an increased Memory setting, you can mitigate this issue by increasing the "request" at the Organization level; for more information, contact DataRobot Support.

Configure the resource allocations that appear in the modal.

Memory: Determines the maximum amount of memory that may be allocated for a custom inference model. If a model exceeds the allocated amount, it is evicted by the system. If this occurs during testing, the test is marked as a failure. If this occurs when the model is deployed, the model is automatically relaunched by Kubernetes.
Replicas: Sets the number of replicas executed in parallel to balance workloads when a custom model is running. Depending on the custom model's speed, increasing the number of replicas may not result in better performance.

Once you have fully configured the resource settings for a model, click Save. This creates a new version of the custom model with edited resource settings applied.


Updated November 22, 2022