Data, modeling, and apps (V9.2)¶
November 22, 2023
The DataRobot v9.2.0 release includes many new data, modeling, apps, and admin features and enhancements, described below. See additional details of Release 9.2 in the MLOps and code-first release announcements.
In the spotlight¶
Compare models across experiments from a single view¶
Solving a business problem with machine learning is an iterative process that involves running many experiments to test ideas and confirm assumptions. To simplify iteration, Workbench introduces model Comparison, a tool that lets you compare up to three models, side-by-side, from any number of experiments within a single Use Case. Instead of looking at each experiment individually and recording metrics for later comparison, you can now compare models across experiments in a single view.
The comparison Leaderboard is accessible from any project in Workbench. You can filter it to more easily locate and select models, compare the selected models across different insights, and view and compare their metadata. The Comparison tab is a preview feature, on by default.
The video below provides a very quick overview of the comparison functionality.
Feature flag ON by default: Enable Use Case Leaderboard Compare
Preview documentation.
9.2 release¶
Release v9.2 provides updated UI string translations for the following languages:
- Japanese
- French
- Spanish
- Korean
Features grouped by capability
See these important deprecation announcements for information about changes to DataRobot's support for older, expiring functionality. This document also describes DataRobot's fixed issues.
Data enhancements¶
GA¶
Data connection browsing improvements¶
This release introduces improvements to the data connection browsing experience in Workbench:
If a Snowflake, BigQuery, or Databricks data source is not specified during configuration, you can browse and select a dataset after saving your configuration. Otherwise, you are brought directly to the schema list view.
Snowflake key pair authentication¶
Now generally available, key pair authentication lets you create a Snowflake data connection in DataRobot Classic and Workbench using a Snowflake username and private key, as an alternative to basic and OAuth authentication. This also allows you to share secure configurations for key pair authentication.
Disable Elasticsearch in the AI Catalog¶
If you are experiencing performance issues or unexpected behavior when searching for assets in the AI Catalog, try disabling Elasticsearch.
Feature flag: Disable ElasticSearch For AI Catalog Search
Preview¶
Materialize Workbench datasets in Google BigQuery¶
Now available for preview, wrangled datasets can be materialized in BigQuery as well as in the Data Registry. To use this option, wrangle a BigQuery dataset in Workbench, click Publish, and select Publish to BigQuery in the Publishing Settings modal.
Note that you must establish a new connection to BigQuery to use this feature.
Preview documentation.
Additional support and connection enhancements for Databricks¶
Now available for preview, the following support for Databricks has been added to DataRobot:
- Create and configure data connections.
- Data added via a connection is added as a dynamic dataset.
- View data in a live preview sampled directly from the source data in Databricks.
- Perform wrangling on Databricks datasets.
- Materialize published wrangling recipes in the Data Registry as well as Databricks.
Preview documentation.
Feature flags:
- Enable Databricks Driver
- Enable Databricks Wrangling
- Enable Databricks In-Source Materialization in Workbench
- Enable Dynamic Datasets in Workbench
AWS S3 connection enhancements¶
A new AWS S3 connector is now available for preview, providing several performance enhancements as well as support for temporary credentials and Parquet file ingest.
Preview documentation.
Feature flag: Enable S3 Connector
Modeling features¶
GA¶
Document AI brings PDF documents as a data source¶
Available in DataRobot Classic, Document AI is now GA, providing a way to build models on raw PDF documents without additional, manually intensive data preparation steps. Addressing the issues of information spread out in a large corpus and other barriers to efficient use of documents as a data source, Document AI eases data prep and provides insights for PDF-based models.
Period Accuracy now available in Workbench and DataRobot Classic¶
Period Accuracy is an insight that lets you define periods within your dataset and then compare their metric scores against the metric score of the model as a whole. It is now generally available for all time series projects. In DataRobot Classic, the feature can be found in the Evaluate > Period Accuracy tab. In Workbench, find the insight under Experiment information. The insight is also available for time-aware experiments.
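To make the idea concrete, the following is a minimal sketch of a period-versus-overall comparison, not DataRobot's implementation. The period labels, the sample rows, the choice of RMSE as the metric, and the helper names `rmse` and `period_accuracy` are all assumptions made for illustration.

```python
from math import sqrt


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


# Hypothetical rows: (period label, actual value, predicted value).
rows = [
    ("weekday", 10.0, 11.0),
    ("weekday", 12.0, 12.0),
    ("weekend", 20.0, 16.0),
    ("weekend", 18.0, 20.0),
]


def period_accuracy(rows):
    """Return the overall score and a score for each user-defined period."""
    overall = rmse([(a, p) for _, a, p in rows])
    by_period = {}
    for period, actual, predicted in rows:
        by_period.setdefault(period, []).append((actual, predicted))
    return overall, {name: rmse(pairs) for name, pairs in by_period.items()}


overall, per_period = period_accuracy(rows)
for name, score in per_period.items():
    # A positive difference means the model is less accurate in that period.
    print(f"{name}: RMSE={score:.3f}, overall={overall:.3f}, diff={score - overall:+.3f}")
```

In this toy data the weekend period scores worse than the model as a whole, which is the kind of gap the insight is meant to surface.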
Prediction Explanations for cluster models now GA¶
Prediction Explanations with clustering uncover which factors most contributed to any given row’s cluster assignment. Now generally available, this insight helps you to easily explain clustering model outcomes to stakeholders and identify high-impact factors to help focus business strategies.
Functioning very much like multiclass Prediction Explanations—but reporting on clusters instead of classes—cluster explanations are available from both the Leaderboard and deployments. They are available for all XEMP-based clustering projects and are not available with time series.
Blueprint repository in Workbench now GA¶
With this release, the blueprint repository—a library of modeling blueprints—is now generally available in Workbench for prediction and time-aware projects. After running Quick Autopilot, you can visit the repository to select and run blueprints that DataRobot did not run by default. They will be added to the Leaderboard and your experiment.
Additionally, the Blueprint visualization is GA in Workbench, providing a graphical representation of the preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model.
Blueprint JSON endpoints allow mapping to open source¶
With this release, the JSON representation of a model blueprint can be retrieved through the UI as well as through the API and client packages, for improved transparency. You can now access the JSON for DataRobot tasks and map the components to open-source code, creating an open-source equivalent to the DataRobot blueprint. Code-first users can retrieve the information programmatically and incorporate it into notebooks. Alternatively, it can be copied from the Describe > Blueprint JSON tab in the UI. The code can then be edited to suit your pipeline needs.
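As a rough illustration of the mapping step, the sketch below walks a blueprint-like JSON document and looks up an open-source counterpart for each task. Everything in it is hypothetical: the JSON shape (`steps`, `id`, `task`, `inputs`), the task names, and the hand-picked scikit-learn counterparts are assumptions for illustration, and the JSON that DataRobot actually returns may be structured differently.

```python
import json

# Hypothetical blueprint JSON; the real schema returned by DataRobot may differ.
BLUEPRINT_JSON = """
{
  "steps": [
    {"id": "1", "task": "Missing Values Imputed", "inputs": ["NUM"]},
    {"id": "2", "task": "Standardize", "inputs": ["1"]},
    {"id": "3", "task": "Elastic-Net Classifier", "inputs": ["2"]}
  ]
}
"""

# Illustrative, hand-picked counterparts (an assumption, not an official mapping).
OPEN_SOURCE_EQUIVALENTS = {
    "Missing Values Imputed": "sklearn.impute.SimpleImputer",
    "Standardize": "sklearn.preprocessing.StandardScaler",
    "Elastic-Net Classifier": "sklearn.linear_model.LogisticRegression",
}


def map_blueprint(blueprint_text):
    """Return (task, open-source counterpart) pairs in blueprint order."""
    steps = json.loads(blueprint_text)["steps"]
    return [
        (step["task"], OPEN_SOURCE_EQUIVALENTS.get(step["task"], "unmapped"))
        for step in steps
    ]


for task, counterpart in map_blueprint(BLUEPRINT_JSON):
    print(f"{task} -> {counterpart}")
```

Tasks without an entry in the lookup table are reported as `unmapped`, so gaps in the translation stay visible instead of being silently dropped.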
More granular model logging info now available in DataRobot Classic¶
With this release, additional detail has been added to the Model Info and Log tabs, both available under Describe in DataRobot Classic. The Log tab, which displays the status of successful and errored operations, now displays start and end times for each task within a larger job. Model Info has added Max RAM and Cache Time Savings—a measure of how much time was saved due to reusing blueprint vertices.
Support for Google Kubernetes Engine in DataRobot¶
With DataRobot release version 9.2, DataRobot now supports Google Kubernetes Engine (GKE) (https://cloud.google.com/kubernetes-engine?hl=en) clusters.
Library upgrades¶
DataRobot release version 9.2 introduces the following library upgrades:
- TensorFlow 2.7.4 → 2.11.1
- Python 3.7 → 3.10
- Joblib 0.17.0 → 1.3.2
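If you maintain local code that should line up with these versions, a quick comparison can be scripted. The sketch below is one possible approach using only the Python standard library. It assumes the packages are published under the distribution names `tensorflow` and `joblib`, and the helpers `parse_version` and `check_environment` are hypothetical names introduced here.

```python
import sys
from importlib import metadata

# Target versions taken from the upgrade list above.
REQUIRED_PYTHON = (3, 10)
REQUIRED = {"tensorflow": (2, 11, 1), "joblib": (1, 3, 2)}


def parse_version(text):
    """Turn '2.11.1' into (2, 11, 1), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment():
    """Map each component to True/False, or None if it is not installed."""
    report = {"python": sys.version_info[:2] >= REQUIRED_PYTHON}
    for name, minimum in REQUIRED.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed locally
            continue
        report[name] = installed >= minimum
    return report


for component, ok in check_environment().items():
    print(component, "not installed" if ok is None else ("ok" if ok else "older than target"))
```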
Extended Kubernetes support¶
DataRobot release version 9.2 supports Kubernetes 1.26 and 1.27 across Amazon EKS, Azure AKS, and Google GKE. It also supports OpenShift 4.13 (Kubernetes 1.26).
Preview¶
Workbench time-aware capabilities expanded to include time series modeling¶
With this deployment, DataRobot users can now use date/time partitioning to build time series-based experiments. Support for time series setup, modeling, and insights extends date/time partitioning, bringing forecasting capabilities to Workbench. With a significantly more streamlined workflow, including a simple window settings modal with graphic visualization, Workbench users can easily set up time series experiments.
After modeling, all time series insights will be available, as well as experiment summary data that provides a backtest summary and partitioning log. Additionally:
- With feature lists and dataset views, you can see the results of feature extraction and reduction.
- Because Quick mode trains only the most crucial blueprints, you can build more niche or long-running time series models, manually, from the blueprint repository.
See the preview documentation to learn how to create, evaluate, and train new models.
Feature flags ON by default:
- Enable Date/Time Partitioning (OTV) in Workbench
- Enable Workbench for Time Series Projects
Leaderboard Data and Feature List tabs added to Workbench¶
This deployment brings the addition of two new tabs to the experiment info displayed on the Leaderboard:
- The Data tab provides summary analytics of the data used in the project.
- The Feature lists tab lists feature lists built for the experiment and available for model training.
Feature flag ON by default: UXR Leaderboard Data and Feature Lists
Preview documentation.
SHAP Prediction Explanations now in Workbench¶
SHAP Prediction Explanations estimate how much each feature contributes to a given prediction, reported as its difference from the average. They are intuitive, unbounded (computed for all features), fast, and, due to the open-source nature of SHAP, transparent. With this deployment, SHAP explanations are supported in Workbench for all non-time series experiments. Accessed from the Model overview tab, SHAP explanations provide a preview for a general "intuition" of model performance with an option to view explanations for the entire dataset.
Preview documentation.
Feature flag ON by default: SHAP in Workbench
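The "difference from the average" framing of SHAP can be illustrated with a small worked example. The sketch below is not how Workbench computes explanations. It assumes a hypothetical linear model with independent features, for which the SHAP value of a feature has the closed form weight × (value − mean value). The weights, bias, background rows, and helper names are all made up for illustration.

```python
# Hypothetical linear model: prediction = bias + sum(weight * value).
weights = {"age": 0.5, "income": 2.0}
bias = 1.0

# Hypothetical background data that defines the "average" prediction.
background = [
    {"age": 30.0, "income": 4.0},
    {"age": 50.0, "income": 6.0},
]


def predict(row):
    """Score one row with the linear model."""
    return bias + sum(weights[f] * row[f] for f in weights)


def linear_shap(row, background):
    """Per-feature contribution: weight * (value - background mean)."""
    means = {f: sum(r[f] for r in background) / len(background) for f in weights}
    return {f: weights[f] * (row[f] - means[f]) for f in weights}


row = {"age": 44.0, "income": 3.0}
contributions = linear_shap(row, background)
average_prediction = sum(predict(r) for r in background) / len(background)

# The contributions add up to the gap between this prediction and the average.
print(contributions)
print(predict(row) - average_prediction, sum(contributions.values()))
```

Here `age` pushes the prediction up by 2.0 and `income` pulls it down by 4.0, which together account for the prediction being 2.0 below the average of 31.0.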
GPU improvements enhance training for deep learning models¶
This deployment brings several enhancements to the preview GPU feature, including:
- Additional blueprints are now available for GPU training, including MiniLM, RoBERTa, and TinyBERT featurizers.
- Depending on the project:
  - Keras Text Convolutional Neural Network blueprints may train during Quick Autopilot.
  - Image Finetuner blueprints may train during full Autopilot.
- GPU and CPU variants are now available in the repository, allowing a choice of which worker type to train on.
- GPU variant blueprints are optimized to train faster on GPU workers.
Preview documentation.
Feature flag OFF by default: Enable GPU Workers
Apps¶
GA¶
New app experience in Workbench¶
Now generally available, DataRobot introduces a new, streamlined application experience in Workbench that provides you with the unique ability to easily view, explore, and create valuable snapshots of information. This release introduces the following improvements:
- Applications have a new, simplified interface and creation workflow to make the experience more intuitive.
- Application creation automatically generates insights, like Feature Impact and ROC Curve, based on the model powering your application.
- Applications created from an experiment in Workbench no longer open outside of Workbench in the application builder.
All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.