Create custom inference models¶
Custom inference models are user-created, pretrained models that you upload to DataRobot (as a collection of files) via the Custom Model Workshop. After uploading a model artifact, you can create, test, and deploy custom inference models to DataRobot's centralized deployment hub.
You can assemble custom inference models in any of the following ways:
Create a custom model without providing web server Scoring Code and a `start_server.sh` file. This type of custom model must use a drop-in environment. Drop-in environments, provided by DataRobot in the Workshop, contain the web server Scoring Code and the `start_server.sh` file used by the model. You can also create your own drop-in custom environment.
Convert a legacy model.
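With the drop-in approach above, the model folder typically supplies its logic through hook functions in a `custom.py` file rather than web server code. The sketch below assumes the DRUM hook convention (`load_model`, `score`) and a hypothetical pickled artifact named `model.pkl`; it illustrates a regression model, where the scoring hook returns a single `Predictions` column.

```python
# custom.py -- minimal sketch of the hook file a drop-in environment expects.
# The artifact filename "model.pkl" is an illustrative assumption.
import os
import pickle

import pandas as pd


def load_model(code_dir):
    """Load the serialized model artifact from the model folder root."""
    with open(os.path.join(code_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


def score(data, model, **kwargs):
    """Score one prediction per input row.

    For regression, the hook returns a DataFrame with a single
    'Predictions' column.
    """
    return pd.DataFrame({"Predictions": model.predict(data)})
```

With this file and the model artifact at the folder root, the drop-in environment's own web server handles request routing, so no `start_server.sh` is needed.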
Be sure to review the guidelines for preparing a custom model folder before proceeding. If any files overlap between the custom model and the environment folders, the model's files will take priority.
Once a custom model's file contents are assembled, you can test the contents locally for development purposes before uploading it to DataRobot. After you create a custom model in the Workshop, you can run a testing suite from the Assemble tab.
Create a new custom model¶
To create a custom model, navigate to Model Registry > Custom Model Workshop and select the Models tab. This tab lists the models you have created. Click Add new model.
In the Add Custom Inference Model window, enter the fields described in the table below:
| Element | Description |
|---------|-------------|
| Model name | Name the custom model. |
| Target type / Target name | Select the target type (binary classification, regression, multiclass, anomaly detection, or unstructured) and enter the name of the target feature. |
| Positive class label / Negative class label | These fields display only for binary classification models. Specify the value to use as the positive class label and the value to use as the negative class label. |
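For binary classification, a common convention in DRUM-based drop-in environments is that the scoring hook returns one probability column per class, named exactly after the positive and negative class labels you enter here. A sketch, with `"True"` and `"False"` standing in for whatever labels you configure:

```python
# Sketch of a binary-classification score hook (DRUM convention).
# The column names "True"/"False" are placeholders for the positive
# and negative class labels configured in the Workshop.
import pandas as pd


def score(data, model, **kwargs):
    # predict_proba follows the scikit-learn convention; the column
    # order must match the configured class labels.
    proba = model.predict_proba(data)
    return pd.DataFrame(proba, columns=["True", "False"])
```

Each row's probabilities should sum to 1, and the threshold you set under the optional fields determines which class is reported.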
Click Show Optional Fields and, if necessary, enter a prediction threshold, the coding language used to build the model, and a description.
After completing the fields, click Add Custom Model.
In the Assemble tab, under Model Environment on the right, select a model environment from the Base Environment dropdown menu. The model environment is used for testing and deploying the custom model.
Under Model on the left, add content by dragging and dropping files or browsing. Alternatively, select a remote integrated repository.
If you click Browse local file, you have the option of adding a Local Folder. The local folder is for dependent files and additional assets required by your model, not the model itself. Even if the model file is included in the folder, it will not be accessible to DataRobot unless the file exists at the root level. The root file can then point to the dependencies in the folder.
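A hypothetical layout illustrating this rule: the entry-point file and model artifact sit at the root, while dependent assets live in the uploaded local folder (all names below are examples only).

```
custom_model/
├── custom.py            # entry point, at the root level
├── model.pkl            # model artifact, also at the root
└── helpers/             # added as a local folder
    ├── preprocessing.py
    └── lookup_tables.csv
```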
You must also upload web server Scoring Code and a `start_server.sh` file to your model's folder unless you are pairing the model with a drop-in environment.
After adding your model content, you can test, register, or deploy the model.
You can create custom inference models that support anomaly detection problems. If you choose to build one, reference the DRUM template. (Log in to GitHub before clicking this link.) When deploying custom inference anomaly detection models, note that the following functionality is not supported:
- Data drift
- Accuracy and association IDs
- Challenger models
- Humility rules
- Prediction intervals
Custom models can contain various machine learning libraries in the model code, but not every drop-in environment provided by DataRobot natively supports all libraries. However, you can manage these dependencies from the Workshop and update the base drop-in environments to support your model code.
To manage model dependencies, include a `requirements.txt` file as part of your custom model. The file must list the machine learning libraries used in the model code.
For example, consider a custom R model that uses the Caret and XGBoost libraries. If this model is added to the Workshop and the R drop-in environment is selected, the base environment only supports Caret, not XGBoost. To address this, edit `requirements.txt` to include the Caret and XGBoost dependencies. After editing and re-uploading the requirements file, the base environment can be rebuilt with XGBoost installed, making the model usable within the environment.
List the following in `requirements.txt`, depending on the model type:

- For R models, list the machine learning library dependencies.
- For Python models, list the dependencies and any version constraints for the libraries. Supported constraint types include `<`, `<=`, `==`, `>=`, and `>`, and multiple constraints can be combined in a single entry (for example, `pandas >= 0.24, < 1.0`).
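An illustrative `requirements.txt` for a Python model follows; the packages and version pins are examples, not recommendations.

```
pandas >= 0.24, < 1.0
scikit-learn >= 0.24
xgboost == 1.5.0
```

For the R example above, the file would simply list `caret` and `xgboost`, one per line, without version constraints.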
Once the requirements file is updated to include dependencies and constraints, navigate to your custom model's Assemble tab. Upload the file under the Model > Content header. The Model Dependencies field updates to display the dependencies and constraints listed in the file.
From the Assemble tab, select a base drop-in environment under the Model Environment header. DataRobot warns you that a new environment must be built to account for the model dependencies. Select Build environment, and DataRobot installs the required libraries and constraints to the base environment.
Once the base environment is updated, your custom model will be usable with the environment, allowing you to test, deploy, or register it.
Add new versions¶
If you want to update a model due to new package versions, different preprocessing steps, hyperparameters, and more, you can update the file contents to create a new version of the model and/or environment.
To do so, select the model from the Workshop and navigate to the Assemble tab. Under the Model header, select Add Files. Upload the files or folders that you updated.
When you update the individual contents of a model, the minor version (1.1, 1.2, etc.) of the model automatically updates.
You can create a new major version of a model (1.0, 2.0, etc.) by selecting New Version. Choose to copy the contents of a previous version to the new version or create an empty version and add new files to use for the model.
To upload a new version of an environment, follow this workflow.
You can now use a new version of the model or environment in addition to its previous versions. Select the iteration of the model that you want to use from the Version dropdown.
Assign training data¶
If you want to add training data to a custom inference model (which allows you to deploy it), you can do so by selecting a custom model and navigating to the Model Info tab.
The Model Info tab lists custom inference model attributes. Click Add Training Data.
A pop-up appears, prompting you to upload training data.
Click Browse to upload training data. Optionally, you can specify the column name containing the partitioning information for your data (based on training/validation/holdout partitioning). If you plan to deploy the custom model and monitor its accuracy, specify the holdout partition in the column to establish an accuracy baseline. You can still track accuracy without specifying a partition column; however, there will be no accuracy baseline. When the upload is complete, click Add Training Data.
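If your dataset lacks a partition column, you can tag one on before upload. In this sketch, the column name `partition` and the `TRAIN`/`HOLDOUT` labels are illustrative; use whatever naming scheme your dataset already follows.

```python
# Sketch: adding a partition column to training data before upload.
# The column name "partition" and its label values are assumptions
# for illustration, not DataRobot-mandated names.
import numpy as np
import pandas as pd


def add_partition_column(df, holdout_frac=0.2, seed=42):
    """Mark a random holdout_frac of rows as HOLDOUT, the rest as TRAIN."""
    rng = np.random.default_rng(seed)
    mask = rng.random(len(df)) < holdout_frac
    out = df.copy()
    out["partition"] = np.where(mask, "HOLDOUT", "TRAIN")
    return out
```

Reserving a holdout slice this way gives the deployment an accuracy baseline once you specify the partition column during upload.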
For unstructured custom inference models, you must upload the training and holdout datasets separately. Additionally, these datasets cannot include a partition column.
Manage model resources¶
After creating a custom inference model, you can configure the resources the model consumes to facilitate smooth deployment and minimize potential environment errors in production.
You can monitor a custom model's resource allocation from the Assemble tab. The resource settings are listed below the deployment status.
To edit any resource settings, select the pencil icon. Note that users can set the maximum memory allocated for a model, but only organization admins can configure additional resource settings.
DataRobot recommends configuring resource settings only when necessary. When you configure the Memory setting below, you set the Kubernetes memory "limit" (the maximum allowed memory allocation); however, you can't set the memory "request" (the minimum guaranteed memory allocation). For this reason, it is possible to set the "limit" value too far above the default "request" value. An imbalance between the memory "request" and the memory usage allowed by the increased "limit" can result in the custom model exceeding the memory consumption limit. As a result, you may experience unstable custom model execution due to frequent eviction and relaunching of the custom model. If you require an increased Memory setting, you can mitigate this issue by increasing the "request" at the Organization level; for more information, contact DataRobot Support.
Configure the resource allocations that appear in the modal.
| Resource | Description |
|----------|-------------|
| Memory | Determines the maximum amount of memory that may be allocated for a custom inference model. If a model exceeds the allocated amount, it is evicted by the system. If this occurs during testing, the test is marked as a failure. If this occurs when the model is deployed, Kubernetes automatically relaunches the model. |
| Replicas | Sets the number of replicas executed in parallel to balance workloads when a custom model is running. Depending on the custom model's speed, increasing the number of replicas may not result in better performance. |
Once you have fully configured the resource settings for a model, click Save. This creates a new version of the custom model with edited resource settings applied.
Convert legacy models¶
To convert legacy models:
Create a custom inference model. When DataRobot finishes creating and registering the model, the model's Assemble tab opens.
Select a base environment (right-hand pane) where the task will run. To convert a legacy model, select the [DataRobot] Legacy Code Environment from the Base Environment dropdown.
From the Model inventory (left-hand pane), add the legacy modeling data and programs as your modeling content. You can pull files from a repository or add local files to assemble your model package.
When the file upload completes, if the model is not open source, DataRobot prompts you to convert the file. Select the legacy content's main program file and click Convert model to convert the file to a Java artifact optimized for model hosting and serving.
Once converted, or if you encounter errors, click View Logs for a detailed log of the conversion. If errors occur, correct them and make the new files available to DataRobot; the Convert model button then becomes active again.
You can embed extra logging directly into the model code to assist in auditability for model risk or governance teams.
When the model conversion is successful, and before you can test the model, you must select the output dataset—the final dataset the legacy model used to make predictions. In the Model pane, use the dropdown to identify Output Dataset and Output Column (target feature). These selections will be used for output prediction verification during Custom Model Testing.