On-premise users: click in-app to access the full platform documentation for your version of DataRobot.

Data, modeling, and apps (V9.1)

July 31, 2023

The DataRobot v9.1.0 release includes many new data, modeling, apps, and admin features and enhancements, described below. See additional details of Release 9.1 in the MLOps and code-first release announcements.

In the spotlight

Foundational Models for Text AI

With this deployment, DataRobot brings foundational models for Text AI to general availability. Foundational models—large AI models trained on a vast quantity of unlabeled data at scale—provide extra accuracy and diversity and allow you to leverage large pre-trained deep learning methods for Text AI.

While DataRobot has already implemented some foundational models, such as TinyBERT, those models operate at the word level, which requires additional computation: converting a row of text means computing an embedding for each token and then averaging the vectors. The new models, Sentence RoBERTa for English and MiniLM for multilingual use cases, can be adapted to a wide range of downstream tasks. Both are available in pre-built blueprints in the repository, or you can add them to any blueprint through blueprint customization (via embeddings) to improve accuracy.
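The word-level cost described above can be illustrated with a toy sketch: every token in a row is looked up in an embedding table, and the vectors are averaged into one row vector. The embedding table, its values, and the `mean_pool` helper are hypothetical and are not taken from TinyBERT or DataRobot.

```python
from typing import Dict, List

# Hypothetical 3-dimensional toy embeddings; real models use learned vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "fast": [1.0, 0.0, 2.0],
    "model": [3.0, 2.0, 0.0],
}
UNKNOWN: List[float] = [0.0, 0.0, 0.0]  # fallback for tokens not in the table


def mean_pool(text: str) -> List[float]:
    """Embed each whitespace-separated token, then average element-wise."""
    tokens = text.lower().split()
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOY_EMBEDDINGS.get(token, UNKNOWN) for token in tokens]
    dim = len(UNKNOWN)
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


row_vector = mean_pool("Fast model")
```

The work grows with the number of tokens in each row, which is the extra computation the paragraph refers to.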

The new blueprints are available in the Repository.

Workbench now generally available

With this month’s deployment, Workbench, the DataRobot experimentation platform, moves from preview to general availability. Workbench provides an intuitive, guided machine learning workflow that helps you experiment and iterate, as well as a frictionless collaboration environment. In addition to Workbench becoming GA, other Workbench preview features are introduced this month; they are described in the sections below.

See the capability matrix for an evolving comparison of capabilities available in Workbench and DataRobot Classic.

Release v9.1 provides updated UI string translations for the following languages:

  • Japanese
  • French
  • Spanish
  • Korean

Features grouped by capability

Data
  • Share secure configurations (GA)
  • Fast Registration in the AI Catalog (GA)
  • Workbench adds new operations to data wrangling capabilities (Preview)
  • Improvements to wrangling preview (Preview)
  • Data connection browsing improvements (Preview)
  • Publish recipes with smart downsampling (Preview)
  • Materialize wrangled datasets in Snowflake (Preview)
  • BigQuery support added to Workbench (Preview)
  • BigQuery connection enhancements (Preview)
  • Improvements to data preparation in Workbench (Preview)
  • Snowflake key pair authentication (Preview)
  • Perform joins and aggregations on your data in Workbench (Preview)

Modeling
  • Foundational Models for Text AI Sentence Featurizers (GA)
  • Reduced feature lists restored in Quick Autopilot mode (GA)
  • Backend date/time functionality simplification (GA)
  • Sklearn library upgrades (GA)
  • Workbench expands validation/partitioning settings in experiment set up (Preview)
  • Slices in Workbench (Preview)
  • Slices for time-aware projects (Classic) (Preview)
  • Document AI brings PDF documents as a data source (Preview)
  • Blueprint repository and Blueprint visualization (Preview)
  • GPU support for deep learning (Preview)

Apps
  • Details page added to time series Predictor applications (GA)
  • Build Streamlit applications for DataRobot models (Preview)
  • New app experience in Workbench (Preview)
  • Prefilled application templates (Preview)
  • Improvements to the new app experience in Workbench (Preview)

Admin
  • Custom role-based access control (RBAC) (GA)
  • Improved organization and account resource hierarchy (GA)

See these important deprecation announcements for information about changes to DataRobot's support for older, expiring functionality. This document also describes DataRobot's fixed issues.

Data enhancements

GA

Share secure configurations

IT admins can now configure OAuth-based authentication parameters for a data connection, and then securely share them with other users without exposing sensitive fields. This allows users to easily connect to their data warehouse without needing to reach out to IT for data connection parameters.

For more information, see the full documentation.

Fast Registration in the AI Catalog

Now generally available, Fast Registration lets you quickly register large datasets in the AI Catalog by specifying the first N rows to be used for registration instead of the full dataset, giving you faster access to data to use for testing and Feature Discovery.

In the AI Catalog, click Add to catalog and select your data source. Fast registration is only available when adding a dataset from a new data connection, an existing data connection, or a URL.
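As a rough illustration of registering only the first N rows, the sketch below reads at most N data rows from a CSV source and stops without consuming the rest. The `read_first_n_rows` helper and the in-memory sample are hypothetical; this is not DataRobot's implementation or API.

```python
import csv
import io
from itertools import islice
from typing import Dict, List, TextIO


def read_first_n_rows(source: TextIO, n: int) -> List[Dict[str, str]]:
    """Return at most n data rows without reading the rest of the source."""
    reader = csv.DictReader(source)
    return list(islice(reader, n))


# In-memory stand-in for a large remote dataset.
sample_csv = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n")
preview = read_first_n_rows(sample_csv, 2)
```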

For more information, see Configure Fast Registration.

Preview

Workbench adds new operations to data wrangling capabilities

With this release, three new operations have been added to DataRobot’s wrangling capabilities in Workbench:

  1. De-duplicate rows: Automatically remove all duplicate rows from your dataset.

  2. Rename features: Quickly change the name of one or more features in your dataset.

  3. Remove features: Remove one or more features from your dataset.

To access new and existing operations, register data from Snowflake to a Workbench Use Case and then click Wrangle. When you publish the recipe, the operations are then applied to the source data in Snowflake to materialize an output dataset.
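The three operations above can be sketched on plain Python rows to show what each one does to a dataset. The function names and sample data are hypothetical; in Workbench the operations are applied in Snowflake when the recipe is published, not in Python.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def deduplicate_rows(rows: Iterable[Row]) -> List[Row]:
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent row fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def rename_features(rows: Iterable[Row], mapping: Dict[str, str]) -> List[Row]:
    """Rename features listed in mapping; leave the others unchanged."""
    return [{mapping.get(name, name): value for name, value in row.items()}
            for row in rows]


def remove_features(rows: Iterable[Row], names: Iterable[str]) -> List[Row]:
    """Drop the named features from every row."""
    drop = set(names)
    return [{name: value for name, value in row.items() if name not in drop}
            for row in rows]


rows = [
    {"cust_id": 1, "amt": 10, "tmp": "x"},
    {"cust_id": 1, "amt": 10, "tmp": "x"},  # exact duplicate of the first row
    {"cust_id": 2, "amt": 5, "tmp": "y"},
]
recipe_output = remove_features(
    rename_features(deduplicate_rows(rows), {"amt": "amount"}), ["tmp"]
)
```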

Required feature flag: No flag required

See the Workbench preview documentation.

Data connection browsing improvements

This release introduces improvements to the data connection browsing experience in Workbench:

  • If a Snowflake database is not specified during configuration, you can browse and select a database after saving your configuration. Otherwise, you are brought directly to the schema list view.

  • DataRobot has reduced the time it takes to display results when browsing for databases, schemas, and tables in Snowflake.

Improvements to wrangling preview

This release includes several improvements for data wrangling in Workbench:

  • You can now reorder operations in your wrangling recipe.

  • If the addition of an operation results in an error, use the new Undo button to revert your changes.

  • The live preview now features infinite scroll for seamless browsing for up to 1000 columns.

Publish recipes with smart downsampling

When publishing a wrangling recipe in Workbench, use smart downsampling to reduce the size of your output dataset and optimize model training. Smart downsampling is a data science technique that reduces the time it takes to fit a model without sacrificing accuracy. It accounts for class imbalance by stratifying the sample by class: in most cases, the entire minority class is preserved and sampling applies only to the majority class. This is particularly useful for imbalanced data. Because accuracy is typically more important on the minority class, keeping that class intact while sampling the majority class greatly reduces the size of the training dataset, and with it modeling time and cost, while preserving model accuracy.
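A minimal sketch of the idea, assuming a single categorical target: every minority-class row is kept and only the most frequent class is randomly sampled. The `smart_downsample` function, its parameters, and the 20% sampling rate are hypothetical, and the sketch does not model any row weighting a production implementation might apply.

```python
import random
from typing import Dict, Hashable, List

Row = Dict[str, Hashable]


def smart_downsample(rows: List[Row], target: str,
                     majority_fraction: float, seed: int = 0) -> List[Row]:
    """Keep all minority-class rows; randomly sample the majority class."""
    counts: Dict[Hashable, int] = {}
    for row in rows:
        counts[row[target]] = counts.get(row[target], 0) + 1
    majority_label = max(counts, key=counts.get)  # most frequent class

    majority_rows = [r for r in rows if r[target] == majority_label]
    minority_rows = [r for r in rows if r[target] != majority_label]

    keep = round(len(majority_rows) * majority_fraction)
    sampled = random.Random(seed).sample(majority_rows, keep)
    return minority_rows + sampled


# 90 majority rows (label 0) and 10 minority rows (label 1).
dataset = [{"id": i, "label": 0} for i in range(90)]
dataset += [{"id": 90 + i, "label": 1} for i in range(10)]
downsampled = smart_downsample(dataset, "label", majority_fraction=0.2)
```

In this toy case the dataset shrinks from 100 rows to 28 while all 10 minority rows survive.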

Feature flag: Enable Smart Downsampling in Wrangle Publishing Settings

Materialize wrangled datasets in Snowflake

You can now publish wrangling recipes to materialize data in DataRobot’s Data Registry or Snowflake. When you publish a wrangling recipe, operations are pushed down into the Snowflake virtual warehouse, allowing you to leverage the security, compliance, and financial controls of Snowflake. By default, the output dataset is materialized in DataRobot's Data Registry. Now you can also materialize the wrangled dataset in Snowflake databases and schemas for which you have write access.

Preview documentation.

Feature flags: Enable Snowflake In-Source Materialization in Workbench, Enable Dynamic Datasets in Workbench

BigQuery support added to Workbench

Support for Google BigQuery has been added to Workbench, allowing you to:

Feature flag: Enable Native BigQuery Driver

BigQuery connection enhancements

A new BigQuery connector is now available for preview, providing several performance and compatibility enhancements, as well as support for authentication using Service Account credentials.

Preview documentation.

Feature flag: Enable Native BigQuery Driver

Improvements to data preparation in Workbench

This release introduces several improvements to the data preparation experience in Workbench.

Workbench now supports dynamic datasets.

  • Datasets added via a data connection will be registered as dynamic datasets in the Data Registry and Use Case.
  • Dynamic datasets added via a connection will be available for selection in the Data Registry.
  • DataRobot will pull a new live sample when viewing Exploratory Data Insights for dynamic datasets.

Feature flag: Enable Dynamic Datasets in Workbench

Snowflake key pair authentication

Now available for preview, key pair authentication lets you create a Snowflake data connection in DataRobot Classic and Workbench using a Snowflake username and private key as an alternative to basic authentication.

Required feature flag: Enable Snowflake Key-pair Authentication

Perform joins and aggregations on your data in Workbench

You can now add Join and Aggregation operations to your wrangling recipe in Workbench. Use the Join operation to combine datasets that are accessible via the same connection instance, and the Aggregation operation to apply aggregation functions like sum, average, counting, minimum/maximum values, standard deviation, and estimation, as well as some non-mathematical operations to features in your dataset.
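To make the two operations concrete, the sketch below joins two small datasets on a shared key and then aggregates one feature per group. The dataset contents, function names, and the choice of an inner join are assumptions for illustration; Workbench applies these operations in the data source rather than in Python.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

Row = Dict[str, object]

orders: List[Row] = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]
customers: List[Row] = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Combine rows from both datasets that share the same key value."""
    index = defaultdict(list)
    for right_row in right:
        index[right_row[key]].append(right_row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in index.get(left_row[key], [])]


def aggregate(rows: List[Row], group_by: str, feature: str) -> Dict[object, Dict[str, float]]:
    """Compute sum, average, count, minimum, and maximum per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[feature])
    return {
        group: {
            "sum": sum(values),
            "avg": mean(values),
            "count": len(values),
            "min": min(values),
            "max": max(values),
        }
        for group, values in groups.items()
    }


joined = inner_join(orders, customers, "customer_id")
summary = aggregate(joined, "region", "amount")
```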

Preview documentation.

Feature flag: Enables Additional Wrangler Operations

Modeling features

GA

Foundational Models for Text AI

With this deployment, DataRobot brings foundational models for Text AI to general availability. Foundational models—large AI models trained on a vast quantity of unlabeled data at scale—provide extra accuracy and diversity and allow you to leverage large pre-trained deep learning methods for Text AI.

While DataRobot has already implemented some foundational models, such as TinyBERT, those models operate at the word level, which requires additional computation: converting a row of text means computing an embedding for each token and then averaging the vectors. The new models, Sentence RoBERTa for English and MiniLM for multilingual use cases, can be adapted to a wide range of downstream tasks. Both are available in pre-built blueprints in the repository, or you can add them to any blueprint through blueprint customization (via embeddings) to improve accuracy.

The new blueprints are available in the Repository.

Reduced feature lists restored in Quick Autopilot mode

With this release, Quick mode once again creates a reduced feature list when preparing a model for deployment. In January, DataRobot made Quick mode enhancements for AutoML; in February, the improvement was made available for time series projects. At that time, DataRobot stopped automatically generating and fitting the DR Reduced Features list, as fitting required retraining models. Now, based on user requests, DataRobot creates the reduced feature list again when recommending and preparing a model for deployment. The process, however, does not include model fitting. To apply the list to the recommended model, or to any Leaderboard model, you can manually retrain it.

Backend date/time functionality simplification

With this release, the mechanisms that support date/time partitioning have been simplified to provide greater flexibility by relaxing certain guardrails and streamlining the backend logic. While there are no specific user-facing changes, you may notice:

  • When the default partitioning does not have enough rows, DataRobot automatically expands the validation duration (the portion of data leading up to the beginning of the training partition that is reserved for feature derivation).

  • DataRobot automatically disables holdout when there are insufficient rows to cover both validation and holdout.

  • DataRobot includes the forecast window when reserving data for feature derivation before the start of the training partition in all cases. Previously this was only applied to multiseries or wide forecast windows.

Sklearn library upgrades

In this release, the sklearn library was upgraded from 0.15.1 to 0.24.2. The impacts are summarized as follows:

  • Feature association insights: Updated the spectral clustering logic. This only affects the cluster ID (a numeric identifier for each cluster, e.g., 0, 1, 2, 3). The values of feature association insights are not affected.

  • AUC/ROC insights: Due to the improvement in sklearn ROC curve calculation, the precision of AUC/ROC values is slightly affected.

Preview

Workbench expands validation/partitioning settings in experiment set up

Workbench now supports the ability to set and define the validation type when setting up an experiment. With the addition of training-validation-holdout (TVH), users can experiment with building models on more data without impacting run time to maximize accuracy.
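A TVH partition can be sketched as a one-time random assignment of rows to three groups. Unlike k-fold cross-validation, which fits a model once per fold, each model is then scored on a single validation partition. The `tvh_split` function and the 16%/20% sizes are hypothetical and do not reflect DataRobot defaults.

```python
import random
from typing import Dict, List


def tvh_split(n_rows: int, validation_pct: int, holdout_pct: int,
              seed: int = 0) -> Dict[str, List[int]]:
    """Randomly assign row indices to training, validation, and holdout."""
    if validation_pct + holdout_pct >= 100:
        raise ValueError("validation and holdout must leave rows for training")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    n_holdout = n_rows * holdout_pct // 100
    n_validation = n_rows * validation_pct // 100
    return {
        "holdout": indices[:n_holdout],
        "validation": indices[n_holdout:n_holdout + n_validation],
        "training": indices[n_holdout + n_validation:],
    }


partitions = tvh_split(1000, validation_pct=16, holdout_pct=20)
```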

Required feature flag: No flag required

Slices in Workbench

Data slices, the capability that allows you to configure filters that create subpopulations of project data, is now available in select Workbench insights. From the Data slice dropdown, you can select a slice or access the modal for creating new filters.
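Conceptually, a slice is a set of filter conditions that selects a subpopulation of rows. The sketch below assumes a hypothetical condition format of (feature, operator, value) tuples that must all hold; it is not the DataRobot slice API.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Condition = Tuple[str, str, Any]  # (feature, operator, value)

OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    "in": lambda value, options: value in options,
}


def apply_slice(rows: List[Dict[str, Any]],
                conditions: List[Condition]) -> List[Dict[str, Any]]:
    """Return the rows that satisfy every condition in the slice."""
    return [
        row for row in rows
        if all(OPERATORS[op](row[feature], value)
               for feature, op, value in conditions)
    ]


project_rows = [
    {"age": 25, "state": "CA", "churned": 0},
    {"age": 41, "state": "NY", "churned": 1},
    {"age": 58, "state": "TX", "churned": 1},
    {"age": 35, "state": "CA", "churned": 0},
]
older_coastal = apply_slice(
    project_rows, [("age", ">", 30), ("state", "in", {"CA", "NY"})]
)
```

An empty condition list returns every row, which corresponds to viewing all data without a slice.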

Required feature flag: Slices in Workbench

Slices for time-aware projects (Classic)

DataRobot now brings the creation and application of data slices, available for preview, to time-aware (OTV and time series) projects in DataRobot Classic. Sliced insights provide the option to view a subpopulation of a model's derived data based on feature values. Viewing and comparing insights based on segments of a project’s data helps to understand how models perform on different subpopulations. Use the segment-based accuracy information gleaned from sliced insights, or compare the segments to the "global" slice (all data), to improve training data, create individual models per segment, or augment predictions post-deployment.

Required feature flag: Sliced Insights for Time Aware Projects

Document AI brings PDF documents as a data source

Document AI provides a way to build models on raw PDF documents without additional, manually intensive data preparation steps. Before Document AI, data preparation requirements were a significant barrier to using documents efficiently as a data source, and sometimes made them inaccessible altogether: the information is spread across a large corpus and arrives in a variety of inconsistent formats. Document AI not only eases the data prep aspect of working with documents; DataRobot also brings its automation to projects that rely on documents as the data source, including comparing models on the Leaderboard, model explainability, and access to a full repository of blueprints.

With two new user-selectable tasks added to the model blueprint, DataRobot can now extract embedded text (with the Document Text Extractor task) or text of scans (with the Tesseract OCR task) and then use PDF text for model building. DataRobot automatically chooses a task type based on the project but allows you the flexibility to modify that task if desired. Document AI works with many project types, including regression, binary and multiclass classification, multilabel, clustering, and anomaly detection, but also provides multimodal support for text, images, numerical, categorical, etc., within a single blueprint.

To help you see and understand the unique nature of a document's text elements, DataRobot introduces the Document Insights visualization. It is useful for double-checking which information DataRobot extracted from the document and whether you selected the correct task.

Support of document types has been added to several other data and model visualizations as well.

Required feature flags: Enable Document Ingest, Enable OCR for Document Ingest

Blueprint repository and Blueprint visualization

With this deployment, Workbench introduces the blueprint repository—a library of modeling blueprints. After running Quick Autopilot, you can visit the repository to select blueprints that DataRobot did not run by default. After choosing a feature list and sample size (or training period for time-aware), DataRobot will then build the blueprints and add the resulting model(s) to the Leaderboard and your experiment.

Additionally, the Blueprint visualization is now available. The Blueprint tab provides a graphical representation of the preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model.

GPU support for deep learning

Support for deep learning models, Large Language Models for example, is increasingly important in an expanding number of business use cases. While some of these models can be run on CPUs, others require GPUs to achieve reasonable training time. To efficiently train, host, and predict using these "heavier" deep learning models, DataRobot leverages NVIDIA GPUs within the application. When GPU support is enabled, DataRobot detects blueprints that contain certain tasks and potentially uses GPU workers to train them. That is, if the sample size minimum is not met, the blueprint is routed to the CPU queue. Additionally, a heuristic determines which blueprints will train with low runtime on CPU workers.
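The routing rules in the paragraph above can be read as a short decision sequence. The sketch below is only an interpretation: the task names, the sample size minimum, and the CPU runtime cutoff are hypothetical placeholders, since the release notes do not state the actual values or how the heuristic works.

```python
from dataclasses import dataclass
from typing import FrozenSet

# All names and thresholds below are hypothetical placeholders.
GPU_CAPABLE_TASKS: FrozenSet[str] = frozenset({"text_fine_tuning", "image_fine_tuning"})
MIN_GPU_SAMPLE_SIZE = 5_000
LOW_CPU_RUNTIME_MINUTES = 10.0


@dataclass(frozen=True)
class Blueprint:
    tasks: FrozenSet[str]
    sample_size: int
    estimated_cpu_minutes: float


def choose_queue(blueprint: Blueprint, gpu_enabled: bool) -> str:
    """Return "gpu" or "cpu" for a blueprint under the assumed rules."""
    if not gpu_enabled:
        return "cpu"
    if not (blueprint.tasks & GPU_CAPABLE_TASKS):
        return "cpu"  # no GPU-capable task detected in the blueprint
    if blueprint.sample_size < MIN_GPU_SAMPLE_SIZE:
        return "cpu"  # sample size minimum not met
    if blueprint.estimated_cpu_minutes <= LOW_CPU_RUNTIME_MINUTES:
        return "cpu"  # heuristic: expected to train quickly on CPU
    return "gpu"


heavy = Blueprint(frozenset({"text_fine_tuning"}), 50_000, 120.0)
small = Blueprint(frozenset({"text_fine_tuning"}), 1_000, 120.0)
```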

Required feature flag: Enable GPU Workers

Apps

GA

Details page added to time series Predictor applications

In the Time Series Forecasting widget, you can now view prediction information for specific predictions or dates, allowing you to not only see the prediction values, but also compare them to other predictions that were made for the same date.

To drill down into the prediction details, click on a prediction in either the Predictions vs Actuals or Prediction Explanations chart. This opens the Forecast details page, which displays the following information:

  1. The average prediction value in the forecast window.
  2. Up to 10 Prediction Explanations for each prediction.
  3. Segmented analysis for each forecast distance within the forecast window.
  4. Prediction Explanations for each forecast distance included in the segmented analysis.

For more information, see the documentation.

Preview

Build Streamlit applications for DataRobot models

You can now build Streamlit applications using DataRobot models, allowing you to easily incorporate DataRobot insights into your Streamlit dashboard.

For information on what’s included and setup, see the dr-streamlit GitHub repository.

Improvements to the new app experience in Workbench

This release introduces the following improvements to the new application experience (available for preview) in Workbench:

  • The Overview folder now displays the blueprint of the model used to create the application.
  • Alpine Light was added to the available app themes.

Preview documentation.

Feature flag: Enable New No-Code AI Apps Edit Mode

Prefilled application templates

Previously, when you created a new application, the application opened to a blank template with limited guidance on how to begin building and generating predictions. Now, applications are populated after creation using training data to help highlight, showcase, and collaborate on the output of your models immediately.

Required feature flag: Enable Prefill NCA Templates with Training Data

Preview documentation.

New app experience in Workbench

Now available for preview, DataRobot introduces a new, streamlined application experience in Workbench that provides leadership teams, COE teams, business users, data scientists, and more with the unique ability to easily view, explore, and create valuable snapshots of information. This release introduces the following improvements:

  • Applications have a new, simplified interface to make the experience more intuitive.
  • You can access model insights, including Feature Impact and Feature Effects, from all new Workbench apps.
  • Applications created from an experiment in Workbench no longer open outside of Workbench in the application builder.

Required feature flag: Enable New No-Code AI Apps Edit Mode

Recommended feature flag: Enable Prefill NCA Templates with Training Data

Preview documentation.

Admin enhancements

Custom role-based access control (RBAC)

Now generally available, custom RBAC is a solution for organizations with use cases that are not addressed by default roles in DataRobot. Administrators can create roles and define access at a more granular level, and assign them to users and groups.

You can access custom RBAC from User Settings > User Roles, which lists each available role an admin can assign to a user in their organization, including DataRobot default roles.

For more information, see the full documentation.

Improved organization and account resource hierarchy

For enterprise users, a number of improvements to account organization have been introduced in version 9.1:

Existing users without an organization have been automatically moved to the Default Organization.

User groups that are not part of an organization have been moved to the Default Organization.

For clusters that have configured SAML or LDAP identity providers, users are now created within the configured Default Organization if no organization mapping is defined for these users (via SAML or LDAP configuration).

When a system admin creates users, the Default Organization is now populated by default in the dropdown on the Create User page.

For clusters with “multi-tenancy privacy” enabled, users with the Project Admin role who do not belong to an organization may lose access to some projects if those projects are owned by an organization other than the Default Organization. When such a user is moved to the Default Organization, they can only access projects within this organization.

Deprecation notices

Feature Fit removed from the API

Feature Fit has been removed from DataRobot's API. DataRobot recommends using Feature Effects instead, as it provides the same output.

Customer-reported fixed issues

The following issues have been fixed since release 9.0.4.


Updated August 26, 2024