

June 2023

June 28, 2023

With the latest deployment, DataRobot's AI Platform delivered the new GA and preview features listed below. From the release center you can also access:

In the spotlight

Foundational Models for Text AI

With this deployment, DataRobot brings foundational models for Text AI to general availability. Foundational models—large AI models trained on a vast quantity of unlabeled data at scale—provide extra accuracy and diversity and allow you to leverage large pre-trained deep learning methods for Text AI.

While DataRobot has already implemented some foundational models, such as TinyBERT, those models operate at the word level, which requires additional computation: converting a row of text means computing the embedding for each token and then averaging the vectors. The new models, Sentence Roberta for English and MiniLM for multilingual use cases, can be adapted to a wide range of downstream tasks. Both are available in pre-built blueprints in the repository, or can be added to any blueprint via blueprint customization (as embeddings), to leverage these foundational models and improve accuracy.
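To make the word-level cost concrete, the following is a minimal sketch of token-level mean pooling, the "embed each token, then average" step described above. It is not DataRobot's implementation: the toy embedding table, its values, and the function name `mean_pooled_embedding` are assumptions made for illustration only.

```python
from typing import Dict, List

# Toy 3-dimensional token embeddings; the values are made up for illustration.
TOKEN_EMBEDDINGS: Dict[str, List[float]] = {
    "good": [1.0, 0.0, 2.0],
    "model": [3.0, 2.0, 0.0],
}


def mean_pooled_embedding(text: str, table: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all known tokens in `text` (word-level approach)."""
    # One lookup per token: this per-token work is the extra computation.
    vectors = [table[tok] for tok in text.lower().split() if tok in table]
    if not vectors:
        raise ValueError("no known tokens in text")
    dim = len(vectors[0])
    # Component-wise mean across all token vectors.
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


row_vector = mean_pooled_embedding("Good model", TOKEN_EMBEDDINGS)
print(row_vector)  # [2.0, 1.0, 1.0]
```

Every row of text pays for one lookup per token plus the averaging pass, which is the overhead attributed to word-level models here.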

The new blueprints are available in the Repository.

Workbench now generally available

With this month’s deployment, Workbench, the DataRobot experimentation platform, moves from preview to general availability. Workbench provides an intuitive, guided machine learning workflow that helps you experiment and iterate, as well as a frictionless collaboration environment. In addition to Workbench becoming GA, other preview features are introduced this month; they are described in the Preview section below.

See the capability matrix for an evolving comparison of capabilities available in Workbench and DataRobot Classic.

June release

The following list groups each new feature by capability and shows whether it is GA or preview:

Features grouped by capability

Admin
  • Custom role-based access control (RBAC): GA

Applications
  • New app experience in Workbench: Preview
  • Prefilled application templates: Preview
  • Build Streamlit applications for DataRobot models: Preview

Data
  • Share secure configurations: GA
  • New driver versions: GA

Modeling
  • Foundational Models for Text AI Sentence Featurizers: GA
  • Tune hyperparameters for custom tasks: Preview
  • Expanded data slice support and new features in GA release: GA
  • Improvements to XEMP Prediction Explanation calculations: GA
  • Document AI brings PDF documents as a data source: Preview
  • GPU support for deep learning: Preview
  • Blueprint repository and Blueprint visualization: Preview
  • Slices in Workbench: Preview
  • Slices for time-aware projects (Classic): Preview

Notebooks
  • DataRobot Notebooks: GA

Predictions and MLOps
  • GitHub Actions for custom models: GA
  • Prediction monitoring jobs: GA
  • Spark API for Scoring Code: GA
  • Extend compliance documentation with key values: Preview

API
  • DataRobotX: Preview

GA

Share secure configurations

IT admins can now configure OAuth-based authentication parameters for a data connection, and then securely share them with other users without exposing sensitive fields. This allows users to easily connect to their data warehouse without needing to reach out to IT for data connection parameters.

For more information, see the full documentation.

Custom role-based access control (RBAC)

Now generally available, custom RBAC is a solution for organizations with use cases that are not addressed by default roles in DataRobot. Administrators can create roles and define access at a more granular level, and assign them to users and groups.

You can access custom RBAC from User Settings > User Roles, which lists each available role an admin can assign to a user in their organization, including DataRobot default roles.

For more information, see the full documentation.

New driver versions

With this release, the following driver versions have been updated:

  • MySQL==8.0.32
  • Microsoft SQL Server==12.2.0
  • Snowflake==3.13.29

See the complete list of supported driver versions in DataRobot.

GitHub Actions for custom models

Now generally available, the custom models action manages custom inference models and their associated deployments in DataRobot via GitHub CI/CD workflows. These workflows allow you to create or delete models and deployments and modify settings. Metadata defined in YAML files enables the custom model action's control over models and deployments. Most YAML files for this action can reside in any folder within your custom model's repository. The YAML is searched, collected, and tested against a schema to determine if it contains the entities used in these workflows. For more information, see the custom-models-action repository.

The quickstart example uses a Python Scikit-Learn model template from the datarobot-user-model repository. After you configure the workflow and create a model and a deployment in DataRobot, you can access the commit information from the model's version info and package info and the deployment's overview.

For more information, see GitHub Actions for custom models.

Prediction monitoring jobs

Now generally available, monitoring job definitions allow DataRobot to monitor deployments that run and store feature data, predictions, and actuals outside of DataRobot. For example, you can create a monitoring job to connect to Snowflake, fetch raw data from the relevant Snowflake tables, and send the data to DataRobot for monitoring purposes. The GA release of this feature provides a dedicated API for prediction monitoring jobs and the ability to use aggregation for external models with large-scale monitoring enabled.

For more information, see Prediction monitoring jobs.

Spark API for Scoring Code

The Spark API for Scoring Code library integrates DataRobot Scoring Code JARs into Spark clusters. This update makes it easy to use Scoring Code in PySpark and Spark Scala without writing boilerplate code or including additional dependencies in the classpath, while also improving the performance of scoring and data transfer through the API.

This library is available as a PySpark API and a Spark Scala API. In previous versions, the Spark API for Scoring Code consisted of multiple libraries, each supporting a specific Spark version. Now, one library includes all supported Spark versions:

  • The PySpark API for Scoring Code is included in the datarobot-predict Python package, released on PyPI. The PyPI project description contains documentation and usage examples.

  • The Spark Scala API for Scoring Code is published on Maven as scoring-code-spark-api and documented in the API reference.

For more information, see Apache Spark API for Scoring Code.

DataRobot Notebooks

Now generally available, DataRobot includes an in-browser editor to create and execute notebooks for data science analysis and modeling. Notebooks display computation results in various formats, including text, images, graphs, plots, tables, and more. You can customize the output display by using open-source plugins. Cells can also contain Markdown rich text for commentary and explanation of the coding workflow. As you develop and edit a notebook, DataRobot stores a history of revisions that you can return to at any time.

DataRobot Notebooks offer a dashboard that hosts notebook creation, upload, and management. Individual notebooks have containerized, built-in environments with commonly used machine learning libraries that you can easily set up in a few clicks. Notebook environments seamlessly integrate with DataRobot's API, allowing a robust coding experience supported by keyboard shortcuts for cell functions, in-line documentation, and saved environment variables for secrets management and automatic authentication.

Expanded data slice support and new features in GA release

Data slices allow you to define filters for categorical features, numeric features, or both. To create a slice, you configure filters that each choose a feature and set an operator and values to narrow the data returned. Viewing and comparing insights based on segments of a project’s data helps you understand how models perform on different subpopulations. As part of the general availability release, several improvements were made:

  • Feature Effects now supports slices.
  • A quick-compute option replaces the sample size modal for setting sample size in Feature Impact.
  • Manual initiation of slice calculation starts with slice validation and prevents accidental launching of computations.
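As a rough illustration of the slice concept (a filter is a feature, an operator, and a value), the sketch below applies a list of filters to in-memory rows. It is a conceptual example only: the operator names, the `apply_slice` helper, and the sample data are assumptions, not DataRobot's API or its actual operator set.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical operator names; DataRobot's own operator set may differ.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    "in": lambda value, options: value in options,
}

# A filter is (feature, operator, value).
Filter = Tuple[str, str, Any]


def apply_slice(rows: List[Dict[str, Any]], filters: List[Filter]) -> List[Dict[str, Any]]:
    """Keep only the rows that satisfy every filter in the slice."""
    return [
        row
        for row in rows
        if all(OPERATORS[op](row[feature], value) for feature, op, value in filters)
    ]


rows = [
    {"region": "EU", "age": 34},
    {"region": "US", "age": 51},
    {"region": "EU", "age": 62},
]

# One categorical filter and one numeric filter combined in a single slice.
slice_rows = apply_slice(rows, [("region", "in", {"EU"}), ("age", ">", 40)])
print(slice_rows)  # [{'region': 'EU', 'age': 62}]
```

Comparing a metric computed on `slice_rows` with the same metric on all `rows` mirrors the idea of comparing a slice against the full dataset.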

Improvements to XEMP Prediction Explanation calculations

An additional benefit of the Pandas library upgrade from version 0.23.4 to 1.3.5 in May is an improvement to the way DataRobot calculates XEMP Prediction Explanations. Accuracy improvements in the newer version of Pandas change some calculations, which in turn improves the accuracy of the insight.

Preview

Document AI brings PDF documents as a data source

Document AI provides a way to build models on raw PDF documents without additional, manually intensive data preparation steps. Before Document AI, data preparation requirements were a significant barrier to using documents efficiently as a data source, and sometimes made them inaccessible altogether: information is spread across a large corpus, in a variety of formats, with inconsistencies. Document AI not only eases the data prep aspect of working with documents, it also brings DataRobot's automation to projects that rely on documents as the data source, including comparing models on the Leaderboard, model explainability, and access to a full repository of blueprints.

With two new user-selectable tasks added to the model blueprint, DataRobot can now extract embedded text (with the Document Text Extractor task) or text of scans (with the Tesseract OCR task) and then use PDF text for model building. DataRobot automatically chooses a task type based on the project but allows you the flexibility to modify that task if desired. Document AI works with many project types, including regression, binary and multiclass classification, multilabel, clustering, and anomaly detection, and also provides multimodal support for text, images, numeric, categorical, and other feature types within a single blueprint.

To help you see and understand the unique nature of a document's text elements, DataRobot introduces the Document Insights visualization. It is useful for double-checking which information DataRobot extracted from the document and whether you selected the correct task.

Support of document types has been added to several other data and model visualizations as well.

Required feature flags: Enable Document Ingest, Enable OCR for Document Ingest

GPU support for deep learning

Support for deep learning models, such as Large Language Models, is increasingly important in an expanding number of business use cases. While some of these models can run on CPUs, others require GPUs to achieve reasonable training time. To efficiently train, host, and predict using these "heavier" deep learning models, DataRobot leverages Nvidia GPUs within the application. When GPU support is enabled, DataRobot detects blueprints that contain certain tasks and potentially uses GPU workers to train them. GPU use is not guaranteed: if the sample size minimum is not met, the blueprint is routed to the CPU queue. Additionally, a heuristic determines which blueprints will train with low runtime on CPU workers.
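The routing behavior described above can be pictured with a small decision function. This is only a sketch under stated assumptions: the `Blueprint` fields, the threshold values, and the reading that blueprints expected to train quickly on CPU stay on the CPU queue are all hypothetical, and DataRobot's actual heuristic is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Blueprint:
    name: str
    has_gpu_task: bool        # contains a task that can use a GPU
    sample_size: int          # number of training rows
    est_cpu_runtime_s: float  # hypothetical estimate of CPU training time


def choose_queue(
    bp: Blueprint,
    min_sample_size: int = 10_000,       # hypothetical sample size minimum
    fast_cpu_runtime_s: float = 300.0,   # hypothetical "low runtime" cutoff
) -> str:
    """Return "gpu" or "cpu" for a blueprint, following the rules sketched above."""
    if not bp.has_gpu_task:
        return "cpu"  # nothing in the blueprint would use a GPU
    if bp.sample_size < min_sample_size:
        return "cpu"  # sample size minimum not met
    if bp.est_cpu_runtime_s <= fast_cpu_runtime_s:
        return "cpu"  # assumed: fast-on-CPU blueprints stay on CPU workers
    return "gpu"


small = Blueprint("text-model-small", True, 2_000, 5_000.0)
large = Blueprint("text-model-large", True, 500_000, 5_000.0)
print(choose_queue(small), choose_queue(large))  # cpu gpu
```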

Required feature flag: Enable GPU Workers

Preview documentation.

Blueprint repository and Blueprint visualization

With this deployment, Workbench introduces the blueprint repository, a library of modeling blueprints. After running Quick Autopilot, you can visit the repository to select blueprints that DataRobot did not run by default. After you choose a feature list and sample size (or training period for time-aware experiments), DataRobot builds the blueprints and adds the resulting model(s) to the Leaderboard and your experiment.

Additionally, the Blueprint visualization is now available. The Blueprint tab provides a graphical representation of the preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model.

Slices in Workbench

Data slices, which let you configure filters that create subpopulations of project data, are now available in select Workbench insights. From the Data slice dropdown, you can select a slice or access the modal for creating new filters.

Required feature flag: Slices in Workbench

Prefilled application templates

Previously, when you created a new application, the application opened to a blank template with limited guidance on how to begin building and generating predictions. Now, applications are populated after creation using training data to help highlight, showcase, and collaborate on the output of your models immediately.

Required feature flag: Enable Prefill NCA Templates with Training Data

Preview documentation.

New app experience in Workbench

Now available for preview, DataRobot introduces a new, streamlined application experience in Workbench that provides leadership teams, COE teams, business users, data scientists, and more with the unique ability to easily view, explore, and create valuable snapshots of information. This release introduces the following improvements:

  • Applications have a new, simplified interface to make the experience more intuitive.
  • You can access model insights, including Feature Impact and Feature Effects, from all new Workbench apps.
  • Applications created from an experiment in Workbench no longer open outside of Workbench in the application builder.

Required feature flag: Enable New No-Code AI Apps Edit Mode

Recommended feature flag: Enable Prefill NCA Templates with Training Data

Preview documentation.

Slices for time-aware projects (Classic)

Now available for preview, DataRobot brings the creation and application of data slices to time-aware (OTV and time series) projects in DataRobot Classic. Sliced insights provide the option to view a subpopulation of a model's derived data based on feature values. Viewing and comparing insights based on segments of a project’s data helps you understand how models perform on different subpopulations. Use the segment-based accuracy information gleaned from sliced insights, or compare the segments to the "global" slice (all data), to improve training data, create individual models per segment, or augment predictions post-deployment.

Required feature flag: Sliced Insights for Time Aware Projects

Extend compliance documentation with key values

Now available for preview, you can create key values to reference in compliance documentation templates. Adding a key value reference includes the associated data in the generated template, limiting the manual editing needed to complete the compliance documentation. Key values associated with a model in the Model Registry are key-value pairs containing information about the registered model package.

When you build custom compliance documentation templates, you can include string, numeric, boolean, image, and dataset key values.

Then, when you generate compliance documentation for a model package with a custom template referencing a supported key value, DataRobot inserts the matching values from the associated model package; for example, if the key value has an image attached, that image is inserted.
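The substitution idea can be illustrated with a short sketch that fills template references from a dictionary of key values. This is not how DataRobot templates are written or rendered: the `{{ name }}` placeholder syntax, the `render_template` helper, and the sample keys are all assumptions, and only string, numeric, and boolean values are covered.

```python
import re
from typing import Dict, Union

KeyValue = Union[str, int, float, bool]


def render_template(template: str, key_values: Dict[str, KeyValue]) -> str:
    """Replace each {{ name }} reference with the matching key value.

    References without a matching key are left untouched so they are easy to spot.
    """

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        return str(key_values[name]) if name in key_values else match.group(0)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical key values attached to a registered model package.
key_values = {"model_owner": "Risk team", "auc": 0.91, "approved": True}

text = render_template(
    "Owner: {{ model_owner }}, AUC: {{ auc }}, "
    "Approved: {{ approved }}, Reviewer: {{ reviewer }}",
    key_values,
)
print(text)
# Owner: Risk team, AUC: 0.91, Approved: True, Reviewer: {{ reviewer }}
```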

Required feature flag: Enable Extended Compliance Documentation

For more information, see the full documentation.

Tune hyperparameters for custom tasks

You can now tune hyperparameters for custom tasks. You can provide two values for each hyperparameter: the name and type. The type can be one of int, float, string, select, or multi, and all types support a default value. See Model metadata and validation schema for more details and example configuration of hyperparameters.
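As a small illustration of the name and type rule above, the sketch below checks a list of hyperparameter definitions against the five listed types. The dictionary layout and the `validate_hyperparameters` helper are assumptions for illustration; the authoritative format is the model metadata and validation schema.

```python
from typing import Any, Dict, List

# The five types named in the text above.
ALLOWED_TYPES = {"int", "float", "string", "select", "multi"}


def validate_hyperparameters(params: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means every entry passed."""
    problems = []
    for index, param in enumerate(params):
        if not param.get("name"):
            problems.append(f"entry {index}: missing name")
        if param.get("type") not in ALLOWED_TYPES:
            problems.append(f"entry {index}: unsupported type {param.get('type')!r}")
    return problems


# Hypothetical definitions; "default" is optional for every type.
params = [
    {"name": "seed", "type": "int", "default": 64},
    {"name": "kernel", "type": "select", "default": "rbf"},
    {"name": "depth", "type": "integer"},  # not one of the allowed types
]

print(validate_hyperparameters(params))
# ["entry 2: unsupported type 'integer'"]
```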

Preview documentation.

Build Streamlit applications for DataRobot models

You can now build Streamlit applications using DataRobot models, allowing you to easily incorporate DataRobot insights into your Streamlit dashboard.

For information on what’s included and how to set it up, see the dr-streamlit GitHub repository.

API

DataRobotX

Now available for preview, DataRobotX, or DRX, is a collection of DataRobot extensions designed to enhance your data science experience. DRX provides a streamlined experience for common workflows but also offers new, experimental high-level abstractions.

DRX offers unique experimental workflows, including the following:

  • Smart downsampling with PySpark
  • Enrich datasets using LLMs
  • Feature importance rank ensembling (FIRE)
  • Deploy custom models
  • Track experiments in MLflow

Preview documentation.



Updated May 29, 2024