Data, modeling, and apps (V9.1)¶
July 31, 2023
The DataRobot v9.1.0 release includes many new data, modeling, apps, and admin features and enhancements, described below. See additional details of Release 9.1 in the MLOps and code-first release announcements.
In the spotlight¶
Foundational Models for Text AI¶
With this deployment, DataRobot brings foundational models for Text AI to general availability. Foundational models, which are large AI models trained on vast quantities of unlabeled data at scale, provide additional accuracy and diversity and let you leverage large pre-trained deep learning methods for Text AI.
While DataRobot has already implemented some foundational models, such as TinyBERT, those models operate at the word level, which requires extra computation (converting rows of text means computing an embedding for each token and then averaging the vectors). The new models, Sentence RoBERTa for English and MiniLM for multilingual use cases, can be adapted to a wide range of downstream tasks. Both are available in pre-built blueprints in the repository, or they can be added to any blueprint via blueprint customization (as embeddings) to improve accuracy.
The new blueprints are available in the Repository.
Workbench now generally available¶
With this month’s deployment, Workbench, the DataRobot experimentation platform, moves from preview to general availability. Workbench provides an intuitive, guided machine learning workflow that helps you experiment and iterate, as well as a frictionless collaboration environment. In addition to the GA release, several related preview features are introduced this month and described below.
See the capability matrix for an evolving comparison of capabilities available in Workbench and DataRobot Classic.
Release v9.1 provides updated UI string translations for the following languages:
- Japanese
- French
- Spanish
- Korean
Features grouped by capability¶
See these important deprecation announcements for information about changes to DataRobot's support for older, expiring functionality. This document also describes DataRobot's fixed issues.
Data enhancements¶
GA¶
Share secure configurations¶
IT admins can now configure OAuth-based authentication parameters for a data connection, and then securely share them with other users without exposing sensitive fields. This allows users to easily connect to their data warehouse without needing to reach out to IT for data connection parameters.
For more information, see the full documentation.
Fast Registration in the AI Catalog¶
Now generally available, you can quickly register large datasets in the AI Catalog by specifying the first N rows to be used for registration instead of the full dataset—giving you faster access to data to use for testing and Feature Discovery.
In the AI Catalog, click Add to catalog and select your data source. Fast registration is only available when adding a dataset from a new data connection, an existing data connection, or a URL.
For more information, see Configure Fast Registration.
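Fast registration itself is configured in the AI Catalog UI, but the underlying idea (register only the first N rows of a large file rather than the full dataset) can be illustrated with a client-side stdlib sketch; the CSV content and the `head_rows` helper here are hypothetical, not part of the DataRobot API:

```python
import csv
import io
from itertools import islice

def head_rows(csv_text, n):
    """Return the first n data rows of a CSV as dicts, mimicking
    fast registration's "first N rows" sampling."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(islice(reader, n))

# Hypothetical large dataset; only the first 2 rows are kept.
data = "id,amount\n1,10\n2,20\n3,30\n4,40\n"
sample = head_rows(data, 2)
# sample contains only rows with id 1 and 2
```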
Preview¶
Workbench adds new operations to data wrangling capabilities¶
With this release, three new operations have been added to DataRobot’s wrangling capabilities in Workbench:
- De-duplicate rows: Automatically remove all duplicate rows from your dataset.
- Rename features: Quickly change the name of one or more features in your dataset.
- Remove features: Remove one or more features from your dataset.
To access new and existing operations, register data from Snowflake to a Workbench Use Case and then click Wrangle. When you publish the recipe, the operations are then applied to the source data in Snowflake to materialize an output dataset.
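The three operations map onto familiar dataframe/SQL transformations. Purely to illustrate their semantics, here is a stdlib sketch over rows represented as dicts; the recipe itself is pushed down to Snowflake as SQL, not executed in Python:

```python
def dedupe_rows(rows):
    """De-duplicate rows: keep the first occurrence of each distinct row."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def rename_features(rows, mapping):
    """Rename features: change column names according to mapping."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

def remove_features(rows, names):
    """Remove features: drop the listed columns from every row."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]

rows = [{"id": 1, "amt": 10}, {"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
deduped = dedupe_rows(rows)                          # duplicate row removed
renamed = rename_features(deduped, {"amt": "amount"})
final = remove_features(renamed, {"id"})             # only "amount" remains
```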
Required feature flag: No flag required
See the Workbench preview documentation.
Data connection browsing improvements¶
This release introduces improvements to the data connection browsing experience in Workbench:
- If a Snowflake database is not specified during configuration, you can browse and select a database after saving your configuration. Otherwise, you are brought directly to the schema list view.
- DataRobot has reduced the time it takes to display results when browsing for databases, schemas, and tables in Snowflake.
Improvements to wrangling preview¶
This release includes several improvements for data wrangling in Workbench:
- You can now reorder operations in your wrangling recipe.
- If adding an operation results in an error, use the new Undo button to revert the change.
- The live preview now features infinite scroll for seamless browsing of up to 1000 columns.
Publish recipes with smart downsampling¶
When publishing a wrangling recipe in Workbench, use smart downsampling to reduce the size of your output dataset and optimize model training. Smart downsampling is a data science technique that reduces the time it takes to fit a model without sacrificing accuracy. It accounts for class imbalance by stratifying the sample by class: in most cases, the entire minority class is preserved and sampling applies only to the majority class. Because accuracy typically matters most on the minority class, this technique greatly reduces the size of the training dataset, and therefore modeling time and cost, while preserving model accuracy.
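The stratified technique described above, keeping the minority class whole and sampling only the majority class, can be sketched in plain Python. The 50% majority rate used below is an arbitrary illustration, not DataRobot's actual sampling logic:

```python
import random

def smart_downsample(rows, label_key, majority_rate, seed=0):
    """Stratified downsampling: preserve the minority class entirely
    and randomly sample the majority class at majority_rate."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    minority_label = min(by_class, key=lambda c: len(by_class[c]))
    rng = random.Random(seed)
    out = []
    for label, members in by_class.items():
        if label == minority_label:
            out.extend(members)                      # keep all minority rows
        else:
            k = max(1, int(len(members) * majority_rate))
            out.extend(rng.sample(members, k))       # sample the majority
    return out

rows = [{"y": 0}] * 90 + [{"y": 1}] * 10             # 9:1 class imbalance
sample = smart_downsample(rows, "y", majority_rate=0.5)
# All 10 minority rows are kept; the majority class shrinks from 90 to 45.
```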
Feature flag: Enable Smart Downsampling in Wrangle Publishing Settings
Materialize wrangled datasets in Snowflake¶
You can now publish wrangling recipes to materialize data in DataRobot's Data Registry or in Snowflake. When you publish a wrangling recipe, operations are pushed down into a Snowflake virtual warehouse, allowing you to leverage the security, compliance, and financial controls of Snowflake. By default, the output dataset is materialized in DataRobot's Data Registry; you can now instead materialize the wrangled dataset in any Snowflake database and schema for which you have write access.
Preview documentation.
Feature flags: Enable Snowflake In-Source Materialization in Workbench, Enable Dynamic Datasets in Workbench
BigQuery support added to Workbench¶
Support for Google BigQuery has been added to Workbench, allowing you to:
- Create and configure data connections.
- Add BigQuery datasets to a Use Case.
- Wrangle BigQuery datasets, and then publish recipes to BigQuery to materialize the output in the Data Registry.
Feature flag: Enable Native BigQuery Driver
BigQuery connection enhancements¶
A new BigQuery connector is now available for preview, providing several performance and compatibility enhancements, as well as support for authentication using Service Account credentials.
Preview documentation.
Feature flag: Enable Native BigQuery Driver
Improvements to data preparation in Workbench¶
This release introduces several improvements to the data preparation experience in Workbench.
Workbench now supports dynamic datasets.
- Datasets added via a data connection will be registered as dynamic datasets in the Data Registry and Use Case.
- Dynamic datasets added via a connection will be available for selection in the Data Registry.
- DataRobot will pull a new live sample when viewing Exploratory Data Insights for dynamic datasets.
Feature flag: Enable Dynamic Datasets in Workbench
Snowflake key pair authentication¶
Now available for preview, you can create a Snowflake data connection in DataRobot Classic and Workbench using the key pair authentication method—a Snowflake username and private key—as an alternative to basic authentication.
Required feature flag: Enable Snowflake Key-pair Authentication
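Client libraries such as Snowflake's Python connector typically accept the private key as raw DER bytes rather than PEM text. PEM is base64-encoded DER wrapped in header/footer lines, so the conversion can be sketched with only the stdlib. This assumes an unencrypted key and a hypothetical `pem_to_der` helper; for encrypted keys, use a crypto library such as `cryptography`:

```python
import base64

def pem_to_der(pem_text):
    """Strip PEM header/footer lines and base64-decode the body to DER bytes."""
    body = "".join(
        line for line in pem_text.splitlines()
        if line and not line.startswith("-----")
    )
    return base64.b64decode(body)

# Dummy PEM wrapping known bytes, for illustration only (not a real key).
payload = b"not-a-real-key"
pem = (
    "-----BEGIN PRIVATE KEY-----\n"
    + base64.b64encode(payload).decode() + "\n"
    "-----END PRIVATE KEY-----\n"
)
assert pem_to_der(pem) == payload
```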
Perform joins and aggregations on your data in Workbench¶
You can now add Join and Aggregation operations to your wrangling recipe in Workbench. Use the Join operation to combine datasets that are accessible via the same connection instance, and the Aggregation operation to apply aggregation functions (such as sum, average, count, minimum/maximum, standard deviation, and estimation), as well as some non-mathematical operations, to features in your dataset.
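To illustrate what the Join and Aggregation operations do to the data (the recipe executes in the warehouse, not in Python), here is a stdlib sketch of an inner join followed by per-group sum and average aggregations; the column names are hypothetical:

```python
from collections import defaultdict
from statistics import mean

def inner_join(left, right, key):
    """Join two row lists on a shared key column."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index[l[key]]]

def aggregate(rows, group_key, value_key):
    """Group rows and apply sum/average aggregation functions."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {g: {"sum": sum(v), "avg": mean(v)} for g, v in groups.items()}

orders = [{"cust": "a", "amount": 10}, {"cust": "a", "amount": 30},
          {"cust": "b", "amount": 5}]
customers = [{"cust": "a", "region": "east"}, {"cust": "b", "region": "west"}]
joined = inner_join(orders, customers, "cust")
result = aggregate(joined, "region", "amount")
# east: sum 40, avg 20; west: sum 5, avg 5
```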
Preview documentation.
Feature flag: Enable Additional Wrangler Operations
Modeling features¶
GA¶
Foundational Models for Text AI¶
As described in the spotlight above, foundational models for Text AI are now generally available. Sentence RoBERTa (for English) and MiniLM (for multilingual use cases) are available in pre-built blueprints in the Repository, or can be added to any blueprint via blueprint customization.
Reduced feature lists restored in Quick Autopilot mode¶
With this release, Quick mode now reintroduces creating a reduced feature list when preparing a model for deployment. In January, DataRobot made Quick mode enhancements for AutoML; in February, the improvement was made available for time series projects. At that time, DataRobot stopped automatically generating and fitting the DR Reduced Features list, as fitting required retraining models. Now, based on user requests, when recommending and preparing a model for deployment, DataRobot once again creates the reduced feature list. The process, however, does not include model fitting. To apply the list to the recommended model—or any Leaderboard model—you can manually retrain it.
Backend date/time functionality simplification¶
With this release, the mechanisms that support date/time partitioning have been simplified to provide greater flexibility by relaxing certain guardrails and streamlining the backend logic. While there are no specific user-facing changes, you may notice:
- When the default partitioning does not have enough rows, DataRobot automatically expands the validation duration (the portion of data leading up to the beginning of the training partition that is reserved for feature derivation).
- DataRobot automatically disables holdout when there are insufficient rows to cover both validation and holdout.
- DataRobot includes the forecast window when reserving data for feature derivation before the start of the training partition in all cases. Previously, this was only applied to multiseries or wide forecast windows.
Sklearn library upgrades¶
In this release, the sklearn library was upgraded from 0.15.1 to 0.24.2. The impacts are summarized as follows:
- Feature association insights: Updated the spectral clustering logic. This only affects the cluster ID (a numeric identifier for each cluster, e.g., 0, 1, 2, 3); the values of feature association insights are not affected.
- AUC/ROC insights: Due to improvements in the sklearn ROC curve calculation, the precision of AUC/ROC values is slightly affected.
Preview¶
Workbench expands validation/partitioning settings in experiment setup¶
Workbench now supports setting and defining the validation type during experiment setup. With the addition of training-validation-holdout (TVH), you can experiment with building models on more data to maximize accuracy without impacting runtime.
Required feature flag: No flag required
Slices in Workbench¶
Data slices, the capability that allows you to configure filters that create subpopulations of project data, is now available in select Workbench insights. From the Data slice dropdown, you can select a slice or access the modal for creating new filters.
Required feature flag: Slices in Workbench
Slices for time-aware projects (Classic)¶
Now available for preview, DataRobot brings the creation and application of data slices to time aware (OTV and time series) projects in DataRobot Classic. Sliced insights provide the option to view a subpopulation of a model's derived data based on feature values. Viewing and comparing insights based on segments of a project’s data helps to understand how models perform on different subpopulations. Use the segment-based accuracy information gleaned from sliced insights, or compare the segments to the "global" slice (all data), to improve training data, create individual models per segment, or augment predictions post-deployment.
Required feature flag: Sliced Insights for Time Aware Projects
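Conceptually, a sliced insight is the same accuracy metric computed on a filtered subpopulation and compared against the "global" slice (all data). A minimal stdlib sketch, with hypothetical rows and a hypothetical `region` filter:

```python
def accuracy(rows):
    """Fraction of rows whose prediction matches the actual value."""
    return sum(r["pred"] == r["actual"] for r in rows) / len(rows)

rows = [
    {"region": "east", "pred": 1, "actual": 1},
    {"region": "east", "pred": 0, "actual": 1},
    {"region": "west", "pred": 1, "actual": 1},
    {"region": "west", "pred": 0, "actual": 0},
]
global_acc = accuracy(rows)                                      # 0.75
east_acc = accuracy([r for r in rows if r["region"] == "east"])  # 0.5
# Comparing slice accuracy against the global slice highlights
# subpopulations where the model underperforms.
```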
Document AI brings PDF documents as a data source¶
Document AI provides a way to build models on raw PDF documents without additional, manually intensive data preparation steps. Before Document AI, data preparation requirements presented a challenging barrier to the efficient use of documents as a data source: information spread across a large corpus, in a variety of formats, with many inconsistencies, could make documents effectively inaccessible. Document AI not only eases the data preparation aspect of working with documents, but also brings DataRobot automation to projects that rely on documents as the data source, including comparing models on the Leaderboard, model explainability, and access to a full repository of blueprints.
With two new user-selectable tasks added to the model blueprint, DataRobot can now extract embedded text (with the Document Text Extractor task) or text of scans (with the Tesseract OCR task) and then use PDF text for model building. DataRobot automatically chooses a task type based on the project but allows you the flexibility to modify that task if desired. Document AI works with many project types, including regression, binary and multiclass classification, multilabel, clustering, and anomaly detection, but also provides multimodal support for text, images, numerical, categorical, etc., within a single blueprint.
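The automatic choice between the Document Text Extractor task and the Tesseract OCR task depends on whether a PDF carries embedded text. DataRobot's actual selection logic is not documented here; the routing rule below, including the `min_chars` threshold, is purely a hypothetical illustration:

```python
def choose_text_task(embedded_text, min_chars=20):
    """Hypothetical routing: prefer embedded-text extraction when a PDF
    contains enough embedded text; otherwise fall back to OCR on scans.
    The min_chars threshold is an illustrative assumption."""
    if embedded_text and len(embedded_text.strip()) >= min_chars:
        return "Document Text Extractor"
    return "Tesseract OCR"

assert choose_text_task("This PDF has an embedded text layer to extract.") == "Document Text Extractor"
assert choose_text_task("") == "Tesseract OCR"  # scanned image, no text layer
```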
To help you see and understand the unique nature of a document's text elements, DataRobot introduces the Document Insights visualization. It is useful for double-checking which information DataRobot extracted from the document and whether you selected the correct task. Support for document types has been added to several other data and model visualizations as well.
Required feature flags: Enable Document Ingest, Enable OCR for Document Ingest
Blueprint repository and Blueprint visualization¶
With this deployment, Workbench introduces the blueprint repository—a library of modeling blueprints. After running Quick Autopilot, you can visit the repository to select blueprints that DataRobot did not run by default. After choosing a feature list and sample size (or training period for time-aware), DataRobot will then build the blueprints and add the resulting model(s) to the Leaderboard and your experiment.
Additionally, the Blueprint visualization is now available. The Blueprint tab provides a graphical representation of the preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model.
GPU support for deep learning¶
Support for deep learning models, such as Large Language Models, is increasingly important in an expanding number of business use cases. While some of these models can run on CPUs, others require GPUs to achieve reasonable training times. To efficiently train, host, and predict using these "heavier" deep learning models, DataRobot leverages NVIDIA GPUs within the application. When GPU support is enabled, DataRobot detects blueprints that contain certain tasks and potentially uses GPU workers to train them; if the sample size minimum is not met, however, the blueprint is routed to the CPU queue. Additionally, a heuristic determines which blueprints will train with low runtime on CPU workers.
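The routing just described (GPU-eligible tasks go to GPU workers unless the sample size minimum is unmet) can be sketched as a decision function. The task names and threshold below are illustrative assumptions, not DataRobot's actual values:

```python
GPU_TASKS = {"Sentence RoBERTa", "MiniLM", "Tesseract OCR"}  # illustrative set

def pick_worker_queue(blueprint_tasks, sample_size, min_gpu_sample=1000):
    """Route a blueprint to the GPU queue only if it contains a
    GPU-eligible task and meets the minimum sample size."""
    if GPU_TASKS.intersection(blueprint_tasks) and sample_size >= min_gpu_sample:
        return "gpu"
    return "cpu"

assert pick_worker_queue({"MiniLM"}, 50_000) == "gpu"
assert pick_worker_queue({"MiniLM"}, 200) == "cpu"            # below minimum
assert pick_worker_queue({"Linear Model"}, 50_000) == "cpu"   # no GPU task
```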
Required feature flag: Enable GPU Workers
Apps¶
GA¶
Details page added to time series Predictor applications¶
In the Time Series Forecasting widget, you can now view prediction information for specific predictions or dates, allowing you to not only see the prediction values, but also compare them to other predictions that were made for the same date.
To drill down into the prediction details, click on a prediction in either the Predictions vs Actuals or Prediction Explanations chart. This opens the Forecast details page, which displays the following information:
|   | Description |
|---|---|
| 1 | The average prediction value in the forecast window. |
| 2 | Up to 10 Prediction Explanations for each prediction. |
| 3 | Segmented analysis for each forecast distance within the forecast window. |
| 4 | Prediction Explanations for each forecast distance included in the segmented analysis. |
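Row 1 of the table, the average prediction value in the forecast window, is simply the mean of the predictions across forecast distances. A stdlib sketch with hypothetical prediction values:

```python
from statistics import mean

# Hypothetical predictions keyed by forecast distance (e.g., days ahead).
window = {1: 102.0, 2: 98.0, 3: 105.0, 4: 95.0}

# Average prediction value in the forecast window (row 1 of the table).
avg_prediction = mean(window.values())
assert avg_prediction == 100.0
```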
Preview¶
Build Streamlit applications for DataRobot models¶
You can now build Streamlit applications using DataRobot models, allowing you to easily incorporate DataRobot insights into your Streamlit dashboard.
For information on what’s included and for setup instructions, see the dr-streamlit GitHub repository.
Improvements to the new app experience in Workbench¶
This release introduces the following improvements to the new application experience (available for preview) in Workbench:
- The Overview folder now displays the blueprint of the model used to create the application.
- Alpine Light was added to the available app themes.
Preview documentation.
Feature flag: Enable New No-Code AI Apps Edit Mode
Prefilled application templates¶
Previously, when you created a new application, it opened to a blank template with limited guidance on how to begin building and generating predictions. Now, applications are populated with training data after creation, helping you highlight, showcase, and collaborate on the output of your models immediately.
Required feature flag: Enable Prefill NCA Templates with Training Data
Preview documentation.
New app experience in Workbench¶
Now available for preview, DataRobot introduces a new, streamlined application experience in Workbench that provides leadership teams, COE teams, business users, data scientists, and more with the unique ability to easily view, explore, and create valuable snapshots of information. This release introduces the following improvements:
- Applications have a new, simplified interface to make the experience more intuitive.
- You can access model insights, including Feature Impact and Feature Effects, from all new Workbench apps.
- Applications created from an experiment in Workbench no longer open outside of Workbench in the application builder.
Required feature flag: Enable New No-Code AI Apps Edit Mode
Recommended feature flag: Enable Prefill NCA Templates with Training Data
Preview documentation.
Admin enhancements¶
Custom role-based access control (RBAC)¶
Now generally available, custom RBAC is a solution for organizations with use cases that are not addressed by DataRobot's default roles. Administrators can create roles, define access at a more granular level, and assign the roles to users and groups.
You can access custom RBAC from User Settings > User Roles, which lists each available role an admin can assign to a user in their organization, including DataRobot default roles.
For more information, see the full documentation.
Improved organization and account resource hierarchy¶
For enterprise users, a number of improvements to account organization have been introduced in version 9.1:
- Existing users without an organization have been automatically moved to the Default Organization.
- User groups that are not part of an organization have been moved to the Default Organization.
- For clusters that have configured SAML or LDAP identity providers, users are now created within the configured Default Organization if no organization mapping is defined for these users (via SAML or LDAP configuration).
- As a system admin, when creating users, the Default Organization is now populated by default in the dropdown on the Create User page.
- For clusters with “multi-tenancy privacy” enabled, users with the Project Admin role who do not belong to an organization may lose access to projects owned by organizations other than the Default Organization. When such a user is moved to the Default Organization, they can only access projects within that org.
Deprecation notices¶
Feature Fit removed from the API¶
Feature Fit has been removed from DataRobot's API. DataRobot recommends using Feature Effects instead, as it provides the same output.
Customer-reported fixed issues¶
The following issues have been fixed since release 9.0.4.
All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.