

October 2023

October 25, 2023

With the latest deployment, DataRobot's AI Platform delivered the new GA and Preview features listed below.

October release

The following table lists each new feature:

Features grouped by capability

| Name | GA | Preview |
|------|----|---------|
| **Data** | | |
| Speed improvements to Relationship Quality Assessment | ✔ | |
| Snowflake key pair authentication | ✔ | |
| AWS S3 connection enhancements | | ✔ |
| Broader support for Azure Databricks added to Workbench | | ✔ |
| **Modeling** | | |
| Document AI brings PDF documents as a data source | ✔ | |
| Prediction Explanations for cluster models now GA | ✔ | |
| GPU improvements enhance training for deep learning models | | ✔ |
| SHAP Prediction Explanations now in Workbench | | ✔ |
| **Applications** | | |
| New app experience in Workbench | ✔ | |
| **Predictions and MLOps** | | |
| Model package artifact creation workflow | ✔ | |
| Versioning support in the new Model Registry | ✔ | |
| Extend compliance documentation with key values | ✔ | |
| Public network access for custom models | ✔ | |
| Predictions on training data in Workbench | ✔ | |
| Custom model deployment status information | ✔ | |
| Auto-sampling for client-side aggregation | ✔ | |
| New operators for Apache Airflow | ✔ | |
| Databricks JDBC write-back support for batch predictions | ✔ | |
| Batch monitoring for deployment predictions | | ✔ |
| Accuracy for monitoring jobs with aggregation enabled | | ✔ |
| **Notebooks** | | |
| Schedule notebook jobs | | ✔ |
| Custom environment images for DataRobot Notebooks | | ✔ |

GA

Document AI brings PDF documents as a data source

Available in DataRobot Classic, Document AI is now GA, providing a way to build models on raw PDF documents without additional, manually intensive data preparation steps. Addressing the issues of information spread out in a large corpus and other barriers to efficient use of documents as a data source, Document AI eases data prep and provides insights for PDF-based models.

Prediction Explanations for cluster models now GA

Prediction Explanations with clustering uncover which factors most contributed to any given row’s cluster assignment. Now generally available, this insight helps you to easily explain clustering model outcomes to stakeholders and identify high-impact factors to help focus business strategies.

Functioning very much like multiclass Prediction Explanations—but reporting on clusters instead of classes—cluster explanations are available from both the Leaderboard and deployments. They are available for all XEMP-based clustering projects and are not available with time series.
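
For programmatic access, a minimal sketch using the DataRobot Python client's XEMP Prediction Explanations workflow is shown below; the project ID, model ID, and file name are placeholders, and applying it to a clustering model follows the behavior described above.

```python
import datarobot as dr

# Placeholder credentials and IDs; replace with your own.
dr.Client(token="YOUR_API_TOKEN", endpoint="https://app.datarobot.com/api/v2")

project = dr.Project.get("PROJECT_ID")
model = dr.Model.get(project.id, "MODEL_ID")

# Initialize Prediction Explanations for the model (required once per model).
init_job = dr.PredictionExplanationsInitialization.create(project.id, model.id)
init_job.wait_for_completion()

# Upload the rows to explain and score them first.
dataset = project.upload_dataset("rows_to_explain.csv")
model.request_predictions(dataset.id).get_result_when_complete()

# Compute XEMP explanations; for a clustering model, each row's explanations
# report the factors behind its cluster assignment instead of a class.
pe_job = dr.PredictionExplanations.create(
    project.id, model.id, dataset.id, max_explanations=3
)
explanations = pe_job.get_result_when_complete()
print(explanations.get_all_as_dataframe().head())
```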

Model package artifact creation workflow

Now generally available, the improved model package artifact creation workflow provides a clearer and more consistent path to model deployment with visible connections between a model and its associated model packages in the Model Registry. Using this new approach, when you deploy a model, you begin by providing model details and registering the model. Then, after you create the model package and allow the build to complete, you can deploy the model by adding the deployment information.

  1. On the Leaderboard, select the model to use for generating predictions. DataRobot recommends a model with the Recommended for Deployment and Prepared for Deployment badges. Click Predict > Deploy. If the Leaderboard model you select doesn't have the Prepared for Deployment badge, DataRobot recommends clicking Prepare for Deployment to run the model preparation process for that model.

  2. On the Deploy model tab, provide the required model package information, and then click Register to deploy.

  3. Allow the model to build. The Building status can take a few minutes, depending on the size of the model. A model package must have a Status of Ready before you can deploy it.

  4. In the Model Packages list, locate the model package you want to deploy and click Deploy.

  5. Add deployment information and create the deployment.

For more information, see the documentation.
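
For scripted deployments, a minimal sketch using the DataRobot Python client is shown below; it assumes the client's single-call deployment method, which registers a model package as part of creating the deployment, and all IDs are placeholders.

```python
import datarobot as dr

dr.Client(token="YOUR_API_TOKEN", endpoint="https://app.datarobot.com/api/v2")

# Pick a prediction server to host the deployment.
prediction_server = dr.PredictionServer.list()[0]

# Deploy a Leaderboard model; DataRobot registers a model package
# as part of this call.
deployment = dr.Deployment.create_from_learning_model(
    model_id="LEADERBOARD_MODEL_ID",  # placeholder
    label="October release example",
    description="Deployed after model registration",
    default_prediction_server_id=prediction_server.id,
)
print(deployment.id, deployment.label)
```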

Versioning support in the new Model Registry

Now generally available for app.eu.datarobot.com users, the new Model Registry is an organizational hub for the variety of models used in DataRobot. Models are registered as deployment-ready model packages. These model packages are grouped into registered models containing registered model versions, allowing you to categorize them based on the business problem they solve. Registered models can contain DataRobot, custom, external, challenger, and automatically retrained models as versions.

During this update, packages from the Model Registry > Model Packages tab are converted to registered models and migrated to the new Registered Models tab. Each migrated registered model contains a registered model version, and the original packages can be identified in the new tab by the model package ID (registered model version ID) appended to the registered model name.

Once the migration is complete, in the updated Model Registry, you can track the evolution of your predictive and generative models with new versioning functionality and centralized management. In addition, you can access both the original model and any associated deployments and share your registered models (and the versions they contain) with other users.

This update builds on the previous model package workflow changes, requiring the registration of any model you intend to deploy. To register and deploy a model from the Leaderboard, you must first provide model registration details:

  1. On the Leaderboard, select the model to use for generating predictions. DataRobot recommends a model with the Recommended for Deployment and Prepared for Deployment badges. The model preparation process runs feature impact, retrains the model on a reduced feature list, and trains on a higher sample size, followed by the entire sample (latest data for date/time partitioned projects).

  2. Click Predict > Deploy. If the Leaderboard model doesn't have the Prepared for Deployment badge, DataRobot recommends you click Prepare for Deployment to run the model preparation process for that model.

    Tip

    If you've already added the model to the Model Registry, the registered model version appears in the Model Versions list. You can click Deploy next to the model and skip the rest of this process.

  3. Under Deploy model, click Register to deploy.

  4. In the Register new model dialog box, provide the required model information.

  5. Click Add to registry. The model opens on the Model Registry > Registered Models tab.

  6. While the registered model builds, click Deploy and then configure the deployment settings.

For more information, see the documentation.

Extend compliance documentation with key values

Now generally available, you can create key values to reference in compliance documentation templates. Adding a key value reference includes the associated data in the generated template, limiting the manual editing needed to complete the compliance documentation. Key values associated with a model in the Model Registry are key-value pairs containing information about the registered model package.

When you build custom compliance documentation templates, you can include string, numeric, boolean, image, and dataset key values. Then, when you generate compliance documentation for a model package with a custom template referencing a supported key value, DataRobot inserts the matching values from the associated model package; for example, if the key value has an image attached, that image is inserted.

For more information, see the documentation.

Public network access for custom models

Now generally available as a premium feature, you can enable full network access for any custom model. When you create a custom model, you can access any fully qualified domain name (FQDN) in a public network so that the model can leverage third-party services. Alternatively, you can disable public network access to isolate a model from the network and block outgoing traffic, enhancing the security of the model. To review this access setting for your custom models, on the Assemble tab, under Resource Settings, check the Network access setting.

For more information, see the documentation.
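
If you assemble custom models with the DataRobot Python client instead of the UI, recent client versions expose the same setting as a network egress policy; the sketch below is a hedged example for a binary classification model, with the model name and target name as placeholders.

```python
import datarobot as dr
from datarobot.enums import NETWORK_EGRESS_POLICY, TARGET_TYPE

dr.Client(token="YOUR_API_TOKEN", endpoint="https://app.datarobot.com/api/v2")

# Create a custom model with public network access so it can call
# third-party services; use NETWORK_EGRESS_POLICY.NONE to isolate it
# from the network and block outgoing traffic instead.
model = dr.CustomInferenceModel.create(
    name="model-with-public-egress",  # placeholder
    target_type=TARGET_TYPE.BINARY,
    target_name="is_fraud",  # placeholder
    network_egress_policy=NETWORK_EGRESS_POLICY.PUBLIC,
)
```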

Predictions on training data in Workbench

Now generally available in Workbench, after you create an experiment and train models, you can make predictions on training data from Model actions > Make predictions.

When you make predictions on training data, you can select one of the following options, depending on the project type:

| Project type | Options |
|--------------|---------|
| AutoML | Validation, Holdout, or All data |
| OTV/Time series | All backtests or Holdout |

In-sample prediction risk

Depending on the option you select and the sample size the model was trained on, predicting on training data can generate in-sample predictions, meaning that the model has seen the target value during training and its predictions do not necessarily generalize well. If DataRobot determines that one or more training rows are used for predictions, the Overfitting risk warning appears. These predictions should not be used to evaluate the model's accuracy.

For more information, see the documentation.
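
The same subsets are also available programmatically through the DataRobot Python client's training predictions API; a minimal sketch, with placeholder IDs:

```python
import datarobot as dr
from datarobot.enums import DATA_SUBSET

dr.Client(token="YOUR_API_TOKEN", endpoint="https://app.datarobot.com/api/v2")

project = dr.Project.get("PROJECT_ID")
model = dr.Model.get(project.id, "MODEL_ID")

# Request predictions on the holdout partition of the training data.
# DATA_SUBSET.ALL covers all training data; OTV/time series projects
# can use DATA_SUBSET.ALL_BACKTESTS.
job = model.request_training_predictions(DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()

print(training_predictions.get_all_as_dataframe().head())
```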

Custom model deployment status information

Now generally available, when you deploy a custom model in DataRobot, deployment status information is surfaced through new badges in the Deployments inventory, warnings in the deployment, and events in the MLOps Logs.

After you add deployment information and deploy a custom model, the Creating deployment modal appears, tracking the status of the deployment creation process, including the application of deployment settings and the calculation of the drift baseline. You can monitor the deployment progress from the modal and access the Check deployment's MLOps logs link if an error occurs.

In the Deployments inventory, you can see the following deployment status values in the Deployment Name column:

  • The custom model deployment process is still in progress. You can't currently make predictions through this deployment or access deployment tabs that require an active deployment.
  • The custom model deployment process completed with errors. You may be unable to make predictions through this deployment; however, if you deactivate this deployment, you can't reactivate it until you resolve the deployment errors. You should check the MLOps Logs to troubleshoot the custom model deployment.
  • The custom model deployment process failed, and the deployment is Inactive. You can't currently make predictions through this deployment or access deployment tabs that require an active deployment. You should check the MLOps Logs to troubleshoot the custom model deployment.

From a deployment with an Errored or Warning status, you can access the Service Health MLOps logs link from the warning on any tab. This link takes you directly to the Service Health tab.

On the Service Health tab, under Recent Activity, you can click the MLOps Logs tab to view the Event Details. In the Event Details, you can click View logs to access the custom model deployment logs and diagnose the cause of the error.

Auto-sampling for client-side aggregation

Now generally available, large-scale monitoring with the monitoring agent supports the automatic sampling of raw features, predictions, and actuals to support challengers and accuracy tracking. To enable this feature, when configuring large-scale monitoring, define the MLOPS_STATS_AGGREGATION_AUTO_SAMPLING_PERCENTAGE environment variable to determine the percentage of raw data to report to DataRobot using algorithmic sampling. In addition, you must define MLOPS_ASSOCIATION_ID_COLUMN_NAME to identify the column in the input data containing the data for sampling.

For more information, see the documentation.
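
A minimal configuration sketch is shown below, assuming the Python MLOps library with a filesystem spooler; the sampling percentage, column name, spool directory, and IDs are placeholders.

```python
import os

# Configure auto-sampling before the MLOps library is initialized.
os.environ["MLOPS_STATS_AGGREGATION_AUTO_SAMPLING_PERCENTAGE"] = "10"  # report ~10% of raw rows
os.environ["MLOPS_ASSOCIATION_ID_COLUMN_NAME"] = "transaction_id"  # column used for sampling

from datarobot_mlops.mlops import MLOps

# The monitoring agent picks up reported records from the spool directory.
mlops = (
    MLOps()
    .set_deployment_id("DEPLOYMENT_ID")  # placeholder
    .set_model_id("MODEL_ID")  # placeholder
    .set_filesystem_spooler("/tmp/mlops_spool")
    .init()
)
```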

New operators for Apache Airflow

You can combine the capabilities of DataRobot MLOps and Apache Airflow to implement a reliable solution for retraining and redeploying your models; for example, you can retrain and redeploy your models on a schedule, on model performance degradation, or using a sensor that triggers the pipeline in the presence of new data.

The DataRobot provider for Apache Airflow now includes new operators:

  • StartAutopilotOperator: Triggers DataRobot Autopilot to train a set of models.
  • CreateExecutionEnvironmentOperator: Creates an execution environment.
  • CreateCustomInferenceModelOperator: Creates a custom inference model.
  • GetDeploymentModelOperator: Retrieves information about the deployment's current model.

For more information about the new operators, reference the documentation.
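
A minimal DAG sketch using one of the new operators is shown below; the import path and operator parameters are assumptions based on the provider's conventions, so confirm them against your installed version of airflow-provider-datarobot.

```python
from datetime import datetime

from airflow import DAG

# Import path is an assumption; verify it against your provider version.
from datarobot_provider.operators.datarobot import GetDeploymentModelOperator

with DAG(
    dag_id="check_deployment_model",
    start_date=datetime(2023, 10, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    # Retrieve information about the model currently backing a deployment,
    # for example as the first step of a retrain-and-redeploy pipeline.
    get_model = GetDeploymentModelOperator(
        task_id="get_deployment_model",
        deployment_id="DEPLOYMENT_ID",  # placeholder
    )
```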

Databricks JDBC write-back support for batch predictions

With this release, Databricks is supported as a JDBC data source for batch predictions. For more information on supported data sources for batch predictions, see the documentation.
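
A hedged sketch using the DataRobot Python client's batch prediction API, writing scores back to a Databricks table over JDBC; the data store, credential, schema, and table identifiers are placeholders, and the Databricks connection must already be registered in DataRobot.

```python
import datarobot as dr

dr.Client(token="YOUR_API_TOKEN", endpoint="https://app.datarobot.com/api/v2")

# Score a local file and write the results back over JDBC.
job = dr.BatchPredictionJob.score(
    deployment="DEPLOYMENT_ID",  # placeholder
    intake_settings={
        "type": "localFile",
        "file": "to_score.csv",
    },
    output_settings={
        "type": "jdbc",
        "data_store_id": "DATABRICKS_DATA_STORE_ID",  # placeholder
        "credential_id": "CREDENTIAL_ID",  # placeholder
        "statement_type": "insert",
        "table": "scored_rows",
        "schema": "analytics",
    },
)
job.wait_for_completion()
```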

Speed improvements to Relationship Quality Assessment

Now generally available for SaaS users, DataRobot improves Relationship Quality Assessment run times by subsampling approximately 10% of the primary dataset, speeding up the computation without impacting the enrichment rate estimation accuracy or the results of the assessment. After the assessment completes, the sampling percentage is shown at the top of the report.

Snowflake key pair authentication

Now generally available, you can create a Snowflake data connection in DataRobot Classic and Workbench using the key pair authentication method (a Snowflake username and private key) as an alternative to basic and OAuth authentication. This also allows you to share secure configurations for key pair authentication.

New app experience in Workbench

Now generally available, DataRobot introduces a new, streamlined application experience in Workbench that makes it easy to view, explore, and create shareable snapshots of model insights. This release introduces the following improvements:

  • Applications have a new, simplified interface and creation workflow to make the experience more intuitive.
  • Application creation automatically generates insights, like Feature Impact and ROC Curve, based on the model powering your application.
  • Applications created from an experiment in Workbench no longer open outside of Workbench in the application builder.

Preview

GPU improvements enhance training for deep learning models

This deployment brings several enhancements to the preview GPU feature, including:

  • Additional blueprints are now available for GPU training, including MiniLM, RoBERTa, and TinyBERT featurizers.

  • Depending on the project:

    • Keras Text Convolutional Neural Network blueprints may train during Quick Autopilot.
    • Image Finetuner blueprints may train during full Autopilot.
  • GPU and CPU variants are now available in the repository, allowing a choice of which worker type to train on.

  • GPU variant blueprints are optimized to train faster on GPU workers.

Preview documentation.

Feature flag OFF by default: Enable GPU Workers

SHAP Prediction Explanations now in Workbench

SHAP Prediction Explanations estimate how much each feature contributes to a given prediction, reported as its difference from the average. They are intuitive, unbounded (computed for all features), fast, and, due to the open source nature of SHAP, transparent. With this deployment, SHAP explanations are supported in Workbench for all non-time series experiments. Accessed from the Model overview tab, SHAP explanations provide a preview for a general "intuition" of model performance with an option to view explanations for the entire dataset.

Preview documentation.

Feature flag ON by default: SHAP in Workbench

Broader support for Azure Databricks added to Workbench

Now available for preview, the following support for Azure Databricks has been added to Workbench:

  • Data added via a connection is registered as a dynamic dataset.
  • You can view data in a live preview sampled directly from the source data in Azure Databricks.
  • You can wrangle Azure Databricks datasets.
  • You can materialize published wrangling recipes in the Data Registry as well as in Azure Databricks.

Preview documentation.

Feature flags:

  • Enable Databricks Driver
  • Enable Databricks Wrangling
  • Enable Databricks In-Source Materialization in Workbench
  • Enable Dynamic Datasets in Workbench

AWS S3 connection enhancements

A new AWS S3 connector is now available for preview, providing several performance enhancements as well as support for temporary credentials and Parquet file ingest.

Preview documentation.

Feature flag: Enable S3 Connector

Batch monitoring for deployment predictions

Now available for preview, you can view monitoring statistics organized by batch, instead of by time. With batch-enabled deployments, you can access the Predictions > Batch Management tab, where you can create and manage batches. You can then add predictions to those batches and view service health, data drift, accuracy, and custom metric statistics by batch in your deployment. To create batches and assign predictions to a batch, you can use the UI or the API. In addition, each time a batch prediction or scheduled batch prediction job runs, a batch is created automatically, and every prediction from the job is added to that batch.

Feature flags OFF by default: Enable Deployment Batch Monitoring, Enable Batch Custom Metrics for Deployments

Preview documentation.

Accuracy for monitoring jobs with aggregation enabled

Now available for preview, monitoring jobs for external models with aggregation enabled can support accuracy tracking. Enable Use aggregation and configure the retention settings to indicate that data is aggregated by the MLOps library and to define how much raw data is retained for challengers and accuracy analysis. Then, to report the Actuals value column for accuracy monitoring, define the Predictions column and the Association ID column.

Feature flag OFF by default: Enable Accuracy Aggregation

For more information, see the documentation.

Schedule notebook jobs

Now available for preview, you can automate your code-based workflows by scheduling notebooks to run in non-interactive mode. Notebook scheduling is managed through notebook jobs that you can create directly from the DataRobot Notebooks interface. Additionally, you can parameterize a notebook job to enhance the automation that scheduling enables. By defining certain values in a notebook as parameters, you can provide inputs for those parameters when a notebook job runs instead of modifying the notebook itself before each run.

Preview documentation.

Feature flag OFF by default: Enable Notebooks Scheduling

Custom environment images for DataRobot Notebooks

Now available for preview, you can integrate DataRobot Notebooks with DataRobot custom environments, which define reusable, custom Docker images used to run notebook sessions. Create a custom environment if you want full control over the notebook environment or need reproducible dependencies beyond those available in the built-in images. Compatible custom environments are selectable directly from the notebook interface. DataRobot Notebooks support Python and R custom environments.

Preview documentation.

Feature flag OFF by default: Enable Notebooks Custom Environments

All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.

