
On-premise users: click in-app to access the full platform documentation for your version of DataRobot.

MLOps and predictions (V9.1)

July 31, 2023

The DataRobot MLOps v9.1 release includes many new features and capabilities, described below. For additional details on Release 9.1, see the data and modeling and code-first release announcements.

New features and enhancements

Features grouped by capability

Name                                                              GA    Preview
Predictions and MLOps
Deployment creation workflow redesign                             ✔
Deployment settings redesign                                      ✔
GitHub Actions for custom models                                  ✔
Assign training data to a custom model version                    ✔
Prediction monitoring jobs                                        ✔
Apache Spark API for Scoring Code                                 ✔
DataRobot provider for Apache Airflow                             ✔
Monitoring jobs for custom metrics                                      ✔
Timeliness indicators for predictions and actuals                       ✔
Versioning support in the Model Registry                                ✔
Extend compliance documentation with key values                         ✔
MLflow integration for the Model Registry                               ✔
Automate deployment and replacement of Scoring Code in AzureML          ✔
MLOps reporting for unstructured models                                 ✔

New GA features

Deployment creation workflow redesign

Now generally available, the redesigned deployment creation workflow provides a better organized and more intuitive interface. Regardless of where you create a new deployment (the Leaderboard, the Model Registry, or the Deployments inventory), you are directed to this new workflow. The new design clearly outlines the capabilities of your current deployment based on the data provided, grouping the settings and capabilities logically and providing immediate confirmation when you enable a capability, or guidance when you’re missing required fields or settings. A new sidebar provides details about the model being used to make predictions for your deployment, in addition to information about the deployment review policy, deployment billing details (depending on your organization settings), and a link to the deployment information documentation.

For more information, see the Configure a deployment documentation.

Deployment settings redesign

The new deployment settings workflow enhances the deployment configuration experience by providing the required options for each MLOps feature directly on the deployment tab for that feature. This new organization also provides improved tooltips and additional links to documentation to help you enable the functionality your deployment requires.

The new workflow separates the categories of deployment configuration tasks into dedicated settings pages on the corresponding deployment tabs.

The Deployment > Settings tab is now deprecated. During the deprecation period, a warning appears on the Settings tab to provide links to the new settings pages.

In addition, on each deployment tab with a Settings page, you can click the settings icon to access the required configuration options.

For more information, see the Deployment settings documentation.

GitHub Actions for custom models

Now generally available, the custom models action manages custom inference models and their associated deployments in DataRobot via GitHub CI/CD workflows. These workflows allow you to create or delete models and deployments and modify settings. The action controls models and deployments through metadata defined in YAML files. Most YAML files for this action can reside in any folder within your custom model's repository. The action searches the repository for YAML files, collects them, and tests each one against a schema to determine whether it contains the entities used in these workflows. For more information, see the custom-models-action repository.
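As a rough illustration of the collect-and-test step, the sketch below sorts already-parsed YAML documents (plain dictionaries, as a YAML loader would return them) into model and deployment definitions based on the keys they contain. The key names `user_provided_model_id`, `user_provided_deployment_id`, and `target_type`, and the idea of matching on required keys alone, are assumptions made for this example; the authoritative schema is the one in the custom-models-action repository.

```python
from typing import Any, Dict, List, Optional

# Hypothetical required keys for each entity type. The real schema lives in
# the custom-models-action repository and is more detailed than this.
MODEL_KEYS = {"user_provided_model_id", "target_type"}
DEPLOYMENT_KEYS = {"user_provided_deployment_id", "user_provided_model_id"}


def classify_definition(doc: Any) -> Optional[str]:
    """Return "deployment", "model", or None for one parsed YAML document."""
    if not isinstance(doc, dict):
        return None  # scalars and lists cannot be definitions
    keys = set(doc)
    # Check deployments first, because they also reference a model ID.
    if DEPLOYMENT_KEYS <= keys:
        return "deployment"
    if MODEL_KEYS <= keys:
        return "model"
    return None  # unrelated YAML, ignored in this sketch


def collect_definitions(docs: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group file paths by the kind of definition each file contains."""
    found: Dict[str, List[str]] = {"model": [], "deployment": []}
    for path, doc in sorted(docs.items()):
        kind = classify_definition(doc)
        if kind is not None:
            found[kind].append(path)
    return found
```

Because files are classified by content rather than by location, this mirrors the point that most YAML files can sit in any folder of the repository.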

The quickstart example uses a Python Scikit-Learn model template from the datarobot-user-model repository. After you configure the workflow and create a model and a deployment in DataRobot, you can access the commit information from the model's version info and package info and the deployment's overview.

For more information, see GitHub Actions for custom models.

Assign training data to a custom model version

To enable feature drift tracking for a custom model deployment, you must add training data. Currently, when you add training data, you assign it directly to the custom model. As a result, every version of that model uses the same data. In this release, the assignment of training data directly to a custom model is deprecated and scheduled for removal, replaced by the assignment of training data to each custom model version. To support backward compatibility, the deprecated method of training data assignment remains the default during the deprecation period, even for newly created models.

To assign training data to a custom model's versions, you must convert the model. On the Assemble tab, locate the Training data for model versions alert and click Permanently convert.

Warning

Converting a model's training data assignment method is a one-way action. It cannot be reverted. After conversion, you can't assign training data at the model level. This change applies to the UI and the API. If your organization has any automation depending on "per model" training data assignment, before you convert a model, you should update any related automation to support the new workflow. As an alternative, you can create a new custom model to convert to the "per version" training data assignment method and maintain the deprecated "per model" method on the model required for the automation; however, you should update your automation before the deprecation process is complete to avoid gaps in functionality.

After you convert the model, you can assign training data to a custom model version:

  • If the model was already assigned training data, the Datasets section contains information about the existing training dataset. To replace existing training data, click the edit icon. In the Change Training Data dialog box, click the delete icon to remove the existing training data, then upload new training data.

  • If the model version doesn't have training data assigned, click Assign, then, in the Add Training Data dialog box, upload training data.

When you create a new custom model version, you can select Keep training data from previous version. This setting is enabled by default and carries the training data from the current version over to the new custom model version.

For more information, see Add training data to a custom model and Add custom model versions.
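The difference between the two assignment methods can be sketched as a small data model. This is an illustration of the behavior described in this section, not DataRobot's implementation: the class names, method names, and the choice to copy the model-level dataset onto existing versions at conversion time are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CustomModelVersion:
    label: str
    training_dataset: Optional[str] = None  # dataset name or ID


@dataclass
class CustomModel:
    name: str
    per_version_training_data: bool = False  # deprecated "per model" default
    model_level_dataset: Optional[str] = None
    versions: List[CustomModelVersion] = field(default_factory=list)

    def assign_to_model(self, dataset: str) -> None:
        """Deprecated method: one dataset shared by every version."""
        if self.per_version_training_data:
            raise ValueError("model-level assignment is unavailable after conversion")
        self.model_level_dataset = dataset

    def convert(self) -> None:
        """One-way switch to "per version" assignment; there is no revert."""
        if self.per_version_training_data:
            return
        # Assumption: existing versions keep the dataset they were using.
        for version in self.versions:
            version.training_dataset = self.model_level_dataset
        self.model_level_dataset = None
        self.per_version_training_data = True

    def add_version(self, label: str, keep_training_data: bool = True) -> CustomModelVersion:
        """Create a version, by default carrying over the latest version's data."""
        previous = self.versions[-1] if self.versions else None
        dataset = None
        if self.per_version_training_data and keep_training_data and previous:
            dataset = previous.training_dataset
        version = CustomModelVersion(label, dataset)
        self.versions.append(version)
        return version
```

After `convert()`, calling `assign_to_model()` raises an error, which is the situation that automation built on the "per model" method would run into.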

Prediction monitoring jobs

Now generally available, monitoring job definitions allow DataRobot to monitor deployments running and storing feature data, predictions, and actuals outside of DataRobot. For example, you can create a monitoring job to connect to Snowflake, fetch raw data from the relevant Snowflake tables, and send the data to DataRobot for monitoring purposes. The GA release of this feature provides a dedicated API for prediction monitoring jobs and the ability to use aggregation for external models with large-scale monitoring enabled.

For more information, see Prediction monitoring jobs.
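To make the parts of a monitoring job definition concrete, the sketch below assembles one as a plain dictionary: the deployment to monitor, the external table to read, the column mapping, and whether aggregation is used. It does not call DataRobot, and every field name in it (`deployment_id`, `intake`, `columns`, `aggregation`) is a placeholder chosen for this example rather than the documented request schema; see the Prediction monitoring jobs documentation for the actual API.

```python
import json
from typing import Optional


def build_monitoring_job(
    deployment_id: str,
    table: str,
    prediction_column: str,
    association_id_column: str,
    actuals_column: Optional[str] = None,
    aggregate: bool = False,
) -> dict:
    """Assemble a hypothetical monitoring job definition as a plain dict."""
    columns = {
        "predictions": prediction_column,
        "association_id": association_id_column,
    }
    if actuals_column:
        columns["actuals"] = actuals_column
    return {
        "deployment_id": deployment_id,
        # Where the externally stored data lives (placeholder structure).
        "intake": {"type": "snowflake", "table": table},
        # Which table columns map to which monitored values.
        "columns": columns,
        # Aggregation targets external models with large-scale monitoring.
        "aggregation": {"enabled": aggregate},
    }


if __name__ == "__main__":
    job = build_monitoring_job(
        deployment_id="<DEPLOYMENT_ID>",
        table="SCORED_LOANS",
        prediction_column="PREDICTION",
        association_id_column="LOAN_ID",
        actuals_column="DEFAULTED",
        aggregate=True,
    )
    print(json.dumps(job, indent=2))
```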

Apache Spark API for Scoring Code

The Spark API for Scoring Code library integrates DataRobot Scoring Code JARs into Spark clusters. This update makes it easy to use Scoring Code in PySpark and Spark Scala without writing boilerplate code or including additional dependencies in the classpath, while also improving the performance of scoring and data transfer through the API.

This library is available as a PySpark API and a Spark Scala API. In previous versions, the Spark API for Scoring Code consisted of multiple libraries, each supporting a specific Spark version. Now, one library includes all supported Spark versions:

  • The PySpark API for Scoring Code is included in the datarobot-predict Python package, released on PyPI. The PyPI project description contains documentation and usage examples.

  • The Spark Scala API for Scoring Code is published on Maven as scoring-code-spark-api and documented in the API reference.

For more information, see Apache Spark API for Scoring Code.

DataRobot provider for Apache Airflow

Now generally available, the DataRobot provider for Apache Airflow lets you combine the capabilities of DataRobot MLOps and Apache Airflow to implement a reliable solution for retraining and redeploying your models; for example, you can retrain and redeploy your models on a schedule, on model performance degradation, or using a sensor that triggers the pipeline in the presence of new data. The provider is a Python package built from source code available in a public GitHub repository and published on PyPI (the Python Package Index). It is also listed in the Astronomer Registry. The integration uses the DataRobot Python API Client, which communicates with DataRobot instances via REST API.

For more information, see the DataRobot provider for Apache Airflow quickstart guide.

New preview features

Monitoring jobs for custom metrics

Now available for preview, monitoring job definitions allow DataRobot to pull calculated custom metric values from outside of DataRobot into the custom metric defined on the Custom Metrics tab, supporting custom metrics with external data sources. For example, you can create a monitoring job to connect to Snowflake, fetch custom metric data from the relevant Snowflake table, and send the data to DataRobot.

Preview documentation.

Feature flag: Enable Custom Metrics Job Definitions

Timeliness indicators for predictions and actuals

Deployments have several statuses to define the general health of a deployment, including Service Health, Data Drift, and Accuracy. These statuses are calculated based on the most recent available data. For deployments relying on batch predictions made in intervals greater than 24 hours, this method can result in an unknown status value on the Prediction Health indicators in the deployment inventory. Now available for preview, those deployment health indicators can retain the most recently calculated health status, presented along with timeliness status indicators to reveal when they are based on old data. You can determine the appropriate timeliness intervals for your deployments on a case-by-case basis. Once you've enabled timeliness tracking on a deployment's Usage > Settings tab, you can view timeliness indicators on the Usage tab and in the Deployments inventory.

In the Deployments inventory, view the Predictions Timeliness and Actuals Timeliness columns.

On the Usage tab, view the Predictions Timeliness and Actuals Timeliness tiles.

Along with the status, you can view the Updated time for each timeliness tile.

Note

In addition to the indicators on the Usage tab and the Deployments inventory, when a timeliness status changes to Red / Failing, a notification is sent through email or the channel configured in your notification policies.
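The idea behind a timeliness indicator can be sketched as a comparison between the age of the most recent data and the interval you configured. This is a minimal illustration, not DataRobot's calculation: the function name is hypothetical, and the "Passing" and "Unknown" labels are assumed counterparts to the Failing status mentioned above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def timeliness_status(
    last_received: Optional[datetime],
    expected_interval: timedelta,
    now: datetime,
) -> str:
    """Classify how fresh the latest predictions (or actuals) are."""
    if last_received is None:
        return "Unknown"  # nothing has been received yet
    age = now - last_received
    # Data older than the configured interval counts as late.
    return "Passing" if age <= expected_interval else "Failing"


now = datetime(2023, 7, 31, 12, 0, tzinfo=timezone.utc)
weekly = timedelta(days=7)
print(timeliness_status(now - timedelta(days=3), weekly, now))   # Passing
print(timeliness_status(now - timedelta(days=10), weekly, now))  # Failing
print(timeliness_status(None, weekly, now))                      # Unknown
```

A deployment scored weekly would use an interval of about seven days, so a health status computed three days ago is still treated as current instead of falling back to unknown after 24 hours.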

For more information, see the documentation.

Feature flag: Enable Timeliness Stats Indicator for Deployments

Versioning support in the Model Registry

The Model Registry is an organizational hub for various models used in DataRobot, where you can access models as deployment-ready model packages. Now available as a preview feature, the Model Registry > Registered Models page provides an additional layer of organization to your models.

On this page, you can group model packages into registered models, allowing you to categorize them based on the business problem they solve. Registered models can contain:

  • DataRobot, custom, and external models

  • Challenger models (alongside the champion)

  • Automatically retrained models

Once you add registered models, you can search, filter, and sort them. You can also share your registered models (and the versions they contain) with other users.

Preview documentation.

Feature flag: Enable Versioning Support in the Model Registry

Extend compliance documentation with key values

Now available for preview, you can create key values to reference in compliance documentation templates. Adding a key value reference includes the associated data in the generated template, limiting the manual editing needed to complete the compliance documentation. Key values associated with a model in the Model Registry are key-value pairs containing information about the registered model package.

When you build custom compliance documentation templates, you can include string, numeric, boolean, image, and dataset key values.

Then, when you generate compliance documentation for a model package with a custom template referencing a supported key value, DataRobot inserts the matching values from the associated model package; for example, if the key value has an image attached, that image is inserted.
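The substitution step can be pictured as filling placeholders in a template from a dictionary of key values. The sketch below is only an analogy: the `{{ name }}` placeholder syntax and the `render_template` function are hypothetical, and it handles string, numeric, and boolean values only, leaving image and dataset key values aside.

```python
import re
from typing import Dict, Union

KeyValue = Union[str, int, float, bool]

# Hypothetical placeholder syntax for this sketch: {{ key_name }}
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_]+)\s*\}\}")


def render_template(template: str, key_values: Dict[str, KeyValue]) -> str:
    """Replace each placeholder with its key value; leave unknown keys as-is."""

    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        if name not in key_values:
            # Keep the placeholder visible so the gap can be filled in manually.
            return match.group(0)
        return str(key_values[name])

    return PLACEHOLDER.sub(substitute, template)


key_values = {"training_rows": 12000, "owner": "risk-team", "approved": True}
template = "Rows: {{ training_rows }}, owner: {{owner}}, reviewer: {{ reviewer }}"
print(render_template(template, key_values))
# Rows: 12000, owner: risk-team, reviewer: {{ reviewer }}
```

Every value that resolves from the model package is one less field to edit by hand in the generated document.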

Preview documentation.

Feature flag: Enable Extended Compliance Documentation

MLflow integration for the DataRobot Model Registry

The preview release of the MLflow integration for DataRobot allows you to export a model from MLflow and import it into the DataRobot Model Registry, creating key values from the training parameters, metrics, tags, and artifacts in the MLflow model. You can use the integration's command line interface to carry out the export and import processes:

Import from MLflow

# ID of the DataRobot model package that receives the key values
DR_MODEL_ID="<MODEL_PACKAGE_ID>"

# Sync version 2 of the MLflow model "cost-model", including artifacts
env PYTHONPATH=./ \
python datarobot_mlflow/drflow_cli.py \
  --mlflow-url http://localhost:8080 \
  --mlflow-model cost-model \
  --mlflow-model-version 2 \
  --dr-model "$DR_MODEL_ID" \
  --dr-url https://app.datarobot.com \
  --with-artifacts \
  --verbose \
  --action sync

Preview documentation.

Feature flag: Enable Extended Compliance Documentation

Automate deployment and replacement of Scoring Code in AzureML

Now available for preview, DataRobot-managed AzureML prediction environments allow you to deploy DataRobot Scoring Code in AzureML. With the Managed by DataRobot option enabled, the model deployed externally to AzureML has access to MLOps management, including automatic Scoring Code replacement.

Once you've created an AzureML prediction environment, you can deploy a Scoring Code-enabled model to that environment from the Model Registry.

Preview documentation.

Feature flag: Enable the Automated Deployment and Replacement of Scoring Code in AzureML

MLOps reporting for unstructured models

Now available for preview, you can report MLOps statistics for Python custom inference models created in the Custom Model Workshop with an Unstructured (Regression), Unstructured (Binary), or Unstructured (Multiclass) target type.

With this feature enabled, when you assemble an unstructured custom inference model, you can use new unstructured model reporting methods in your Python code to report deployment statistics and predictions data to MLOps. For an example of an unstructured Python custom model with MLOps reporting, see the DataRobot User Models repository.
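A minimal sketch of what such a model's scoring hook might look like follows. It assumes the reporting object is passed to `score_unstructured` as an `mlops` keyword argument and exposes `report_deployment_stats` and `report_predictions_data`; those names and their parameters are assumptions here and should be checked against the preview documentation and the DataRobot User Models example. The word-count "model" and the `RecordingMLOps` stand-in are hypothetical and exist only so the hook can be tried locally.

```python
import time


def score_unstructured(model, data, query, **kwargs):
    """Unstructured scoring hook that also reports statistics to MLOps."""
    # Assumption: the reporting object arrives as a keyword argument.
    mlops = kwargs.get("mlops")

    start = time.time()
    # Placeholder "model": count the words in the incoming text payload.
    text = data.decode("utf-8") if isinstance(data, bytes) else str(data)
    prediction = float(len(text.split()))
    elapsed_ms = (time.time() - start) * 1000

    if mlops is not None:
        # Method and parameter names are assumptions; verify before use.
        mlops.report_deployment_stats(num_predictions=1, execution_time=elapsed_ms)
        mlops.report_predictions_data(predictions=[prediction])

    return str(prediction)


class RecordingMLOps:
    """Local stand-in for the reporting object, for trying the hook offline."""

    def __init__(self):
        self.calls = []

    def report_deployment_stats(self, num_predictions, execution_time):
        self.calls.append(("stats", num_predictions))

    def report_predictions_data(self, predictions, **_):
        self.calls.append(("predictions", list(predictions)))


if __name__ == "__main__":
    recorder = RecordingMLOps()
    print(score_unstructured(None, b"three word text", {}, mlops=recorder))
    print(recorder.calls)
```

Guarding on `mlops is not None` keeps the same hook usable when reporting is not enabled.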

Preview documentation.

Feature flag: Enable MLOps Reporting from Unstructured Models



Updated May 29, 2024