On-premise users: click in-app to access the full platform documentation for your version of DataRobot.

AutoML (V7.2)

September 13, 2021

The DataRobot v7.2.0 release includes many new AutoML features and enhancements described in this section. See also the new features described in the time series (AutoTS) and MLOps release notes.

See these important deprecation announcements for information about changes to DataRobot's support for older, expiring functionality. This document also describes DataRobot's fixed issues.

In the spotlight...

The following features are some of the highlights of Release 7.2:

User interface enhancements

New login experience

This release introduces a new login experience for DataRobot platform application users. The page has been redesigned to reflect the innovation behind the company and product, without affecting the existing login workflow.

ROC Curve redesign

The ROC Curve tab has been redesigned to streamline the model evaluation strategies you can perform. Along with the Prediction Distribution graph, ROC curve, confusion matrix, and a summary of metrics, you can now generate profit curves, precision-recall curves, and custom charts in the ROC Curve tab.

For details, see ROC Curve.
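
As a rough illustration of what a profit curve summarizes, the sketch below sweeps a classification threshold and totals a payoff for each confusion-matrix outcome. The function name, payoff values, and sample data are hypothetical and are not taken from DataRobot; the in-app calculation may differ.

```python
from typing import Dict, List, Sequence, Tuple


def profit_curve(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    payoff: Dict[str, float],
    thresholds: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (threshold, total profit) pairs for a binary classifier."""
    curve = []
    for threshold in thresholds:
        total = 0.0
        for actual, prob in zip(y_true, y_prob):
            predicted = 1 if prob >= threshold else 0
            # Map each row to its confusion-matrix cell.
            if predicted == 1:
                cell = "tp" if actual == 1 else "fp"
            else:
                cell = "fn" if actual == 1 else "tn"
            total += payoff[cell]
        curve.append((threshold, total))
    return curve


# Hypothetical payoffs: a true positive earns 100, a false positive costs 20.
payoff = {"tp": 100.0, "fp": -20.0, "tn": 0.0, "fn": 0.0}
y_true = [1, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.2]

curve = profit_curve(y_true, y_prob, payoff, [0.25, 0.5, 0.75])
best_threshold, best_profit = max(curve, key=lambda point: point[1])
```

With this sample data, the middle threshold yields the highest total, which is the kind of trade-off a profit curve is meant to expose.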

New location for tools to share and edit project names

To improve navigation, this release brings a new home for the project sharing and project name editing tools. While still available from the project control center (Manage Projects), you can now more quickly access the tools directly from the project dropdown.

Data enhancements

New Spark version for improved performance

Release 7.2 upgrades the Spark version used for Feature Discovery and Spark SQL to Spark 3.0. In addition to Spark performance improvements, the upgrade brings improved JDBC compatibility with the AI Catalog (which uses Java 11) and a smaller shippable codebase. DataRobot now supports all drivers that are compatible with Java 8 or later.

Connect to Snowflake and Google BigQuery using OAuth

Snowflake and Google BigQuery users can now set up a data connection using OAuth single sign-on. Once configured, you can read data from production databases to use for model building and predictions. For details, see Data connection with OAuth.

Feature Discovery features

Feature Discovery Relationship Editor setup guide

With Feature Discovery, DataRobot generates new features from multiple datasets so that you don’t need to perform feature engineering manually to consolidate multiple datasets. Use the Relationship Editor to join the datasets to prepare for Feature Discovery.

The Relationship Editor setup guide is a new intermediate screen that displays when you click the Add datasets button on the EDA (Data) page. It walks you through the process of specifying prediction points for time-aware features and adding the datasets to be joined for Feature Discovery. For details, see Create a Feature Discovery project.

Feature Discovery engineering controls

Feature Discovery engineering controls, now publicly available, let you influence how DataRobot conducts feature engineering.

You can enable specific controls to use your domain knowledge to guide feature engineering or to improve accuracy. You might want to exclude specific transformations that slow down processing or are difficult to explain to stakeholders. For details, see Set feature engineering controls.

Feature Discovery settings enhanced

The Feature Discovery tab on the Data page provides dataset relationship details, a feature derivation summary, and a feature derivation log. You can now see the number of secondary datasets, explored features, and derived features that resulted from Feature Discovery. Click Show more to see which feature engineering controls were used during Feature Discovery and to learn about each.

For details, see Define relationships.

Categorical Statistics feature type

Categorical Statistics let you explore numeric statistics like sum, max, and average for each category of a categorical feature. In the following example, during Feature Discovery, DataRobot explores Spending numeric statistics for each category of the Product_Type feature:

  • Spending(30 days min)
  • Spending(30 days min by Product_Type = A)
  • Spending(30 days min by Product_Type = B)
  • Spending(30 days min by Product_Type = C) ..

Categorical Statistics aggregation is turned off by default. You can enable it on the Feature Engineering tab of the Feature Discovery Settings page. For details, see Categorical Statistics.
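
To make the aggregation concrete, here is a minimal sketch of how per-category minimums like those listed above could be computed by hand. The row layout, function name, and prediction point are assumptions for illustration only; this is not how DataRobot derives the features internally.

```python
from datetime import date, timedelta
from typing import Dict, List


def categorical_min(
    rows: List[dict], prediction_point: date, window_days: int = 30
) -> Dict[str, float]:
    """Minimum spending overall and per Product_Type within the window."""
    window_start = prediction_point - timedelta(days=window_days)
    stats: Dict[str, float] = {}
    for row in rows:
        # Only rows inside the look-back window contribute.
        if not (window_start <= row["date"] < prediction_point):
            continue
        amount = row["spending"]
        overall = f"Spending({window_days} days min)"
        per_category = (
            f"Spending({window_days} days min by "
            f"Product_Type = {row['product_type']})"
        )
        for name in (overall, per_category):
            stats[name] = min(stats.get(name, amount), amount)
    return stats


# Hypothetical secondary-dataset rows.
rows = [
    {"date": date(2021, 9, 1), "product_type": "A", "spending": 40.0},
    {"date": date(2021, 9, 5), "product_type": "B", "spending": 15.0},
    {"date": date(2021, 9, 10), "product_type": "A", "spending": 25.0},
    {"date": date(2021, 7, 1), "product_type": "C", "spending": 5.0},  # outside window
]

features = categorical_min(rows, prediction_point=date(2021, 9, 13))
```

In this sketch, a category with no rows in the window (C here) simply produces no feature value.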

Modeling features

Purpose-built AI applications with the AI App Builder

The AI App Builder, available from the Applications tab, provides a no-code platform to enable core DataRobot services (making predictions, optimizing outcomes, simulating scenarios, and more) without having to build models and evaluate their performance in DataRobot.

Each application starts with a template and data source—either a deployment or dataset in the AI Catalog. However, the App Builder lets you configure additional widgets, custom features, and pages to tailor the application to a specific use case.

Once deployed, applications can be shared easily, and users do not need a full DataRobot license to use them, which broadens your organization’s access to DataRobot’s functionality.

Widgets

Applications are composed of widgets that create visual, interactive, and purpose-driven end-user applications. There are two types of widgets—chart widgets and header widgets. Chart widgets add visualizations to an application and can be configured to surface important insights in your data and prediction results. Header widgets provide additional filtering options for your application.

What-if and Optimizer widget

The What-if and Optimizer widget provides two tools for interacting with prediction results:

  • What-if: A decision-support tool that allows you to create and compare multiple prediction simulations to identify the option that provides the best outcome. You can also make predictions, then change one or more inputs to create a new simulation, and see how those changes affect the target feature.
  • Optimizer: Identifies the maximum or minimum predicted value for a target by varying the values of a selection of flexible features in the model.
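
The two tools can be sketched conceptually as follows: a what-if simulation changes one or more inputs and re-scores the row, while an optimizer searches the flexible features for the best predicted value. The `predict` function is a hypothetical stand-in for a deployed model, and the exhaustive grid search is an assumption made for illustration; the widget's actual search strategy is not described here.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def predict(row: dict) -> float:
    """Hypothetical stand-in for a deployment's prediction."""
    return 50.0 + 3.0 * row["discount"] - 0.5 * row["discount"] ** 2 + 2.0 * row["ads"]


def optimize(
    base_row: dict,
    flexible: Dict[str, Iterable],
    predict_fn: Callable[[dict], float],
    maximize: bool = True,
) -> Tuple[dict, float]:
    """Try every combination of flexible feature values; keep the best one."""
    names = list(flexible)
    best_row, best_value = None, None
    for combo in product(*(flexible[name] for name in names)):
        candidate = {**base_row, **dict(zip(names, combo))}
        value = predict_fn(candidate)
        better = best_value is None or (
            value > best_value if maximize else value < best_value
        )
        if better:
            best_row, best_value = candidate, value
    return best_row, best_value


base = {"discount": 0, "ads": 1, "region": "west"}

# What-if: change one input and compare against the baseline prediction.
baseline = predict(base)
simulation = predict({**base, "discount": 5})

# Optimizer: vary only the flexible features; "region" stays fixed.
search_space = {"discount": range(0, 11), "ads": [0, 1, 2]}
best_row, best_value = optimize(base, search_space, predict)
worst_row, worst_value = optimize(base, search_space, predict, maximize=False)
```

Features not listed in the search space (such as `region`) keep their baseline values, mirroring the idea that only the selected flexible features are varied.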

Word Cloud blueprints for multiclass projects

All Stochastic Gradient Descent (SGD) blueprints now create a Word Cloud when a multiclass project contains at least one text feature. Previously, there was a specialized SGD blueprint, available from the Repository, that had to be run manually. Access the new visualizations from either the model’s Describe > Word Cloud tab or the Insights > Word Cloud tab.

New Keras DeepCTR models available in the Repository

To support data scientists working with click-through rate (CTR) data (categoricals with high cardinality), DataRobot introduces three DeepCTR models, available from the Repository. These models (neural factorization machine, AutoInt, and deep cross network) can be particularly useful when building click-through rate or recommendation models.

Bias and Fairness improvements

With this release, DataRobot has upgraded the user experience for calculating Bias and Fairness for your models. The first improvement allows you to enable Bias and Fairness insights after modeling has already started. Select a model and navigate to Bias and Fairness > Settings. Once configured, Bias and Fairness insights are enabled for every model on the Leaderboard.

The second improvement is the ability to view multiple fairness metrics in the Per-Class Bias page. This functionality allows you to view fairness scores for all five fairness metrics using a dropdown menu.

For details, see the Bias and Fairness documentation.

TLS options for Portable Prediction Server

By default, the Portable Prediction Server (PPS) serves predictions over an insecure listener on port 8080 (clear-text HTTP over TCP). You can now also serve predictions over a secure listener on port 8443 (HTTP over TLS/SSL, that is, HTTPS). When the secure listener is enabled, the insecure listener becomes unavailable. You configure TLS using environment variables, which are described in the documentation along with accompanying examples.
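
Because the two listeners are mutually exclusive, client code needs to target the right scheme and port. The helper below is a minimal sketch of that choice; the function name and the `tls_enabled` flag are hypothetical, and the actual PPS environment variable names are listed in the PPS documentation rather than reproduced here.

```python
def pps_base_url(host: str, tls_enabled: bool) -> str:
    """Pick the PPS listener to call.

    With TLS enabled, only the secure listener (:8443) is available;
    otherwise the default insecure listener (:8080) serves predictions.
    """
    scheme, port = ("https", 8443) if tls_enabled else ("http", 8080)
    return f"{scheme}://{host}:{port}"


# Hypothetical host name used only for illustration.
default_url = pps_base_url("pps.example.com", tls_enabled=False)
secure_url = pps_base_url("pps.example.com", tls_enabled=True)
```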

Preview features

ROC Curve redesign

The ROC Curve tab has been redesigned to streamline the model evaluation strategies you can perform. Along with the Prediction Distribution graph, ROC curve, confusion matrix, and a summary of metrics, you can now generate profit curves, precision-recall curves, and custom charts in the ROC Curve tab.

Create feature lists in the Relationship Editor

The ability to create feature lists in the Feature Discovery Relationship Editor is now available as a preview feature.

Once you create your feature list, you can transform the features directly in the Relationship Editor.

Preview documentation

Further enhancements to multilabel modeling

In response to user feedback, the multilabel modeling preview feature introduces several usability improvements, including:

  • Addition of Feature Effects visualization for multilabel projects
  • Increased speed of per-label metric calculation
  • New per-label Word Clouds
  • Ability to easily pin labels
  • Model packages and access to the Portable Prediction Server for MLOps

GA documentation (as of 7.3)

Insights available for external models

Through the External Predictions advanced option tab, you can bring external models into the DataRobot AutoML environment, view them on the Leaderboard, and run a subset of DataRobot's evaluative insights for comparison against DataRobot models. Add the external model's predictions as a new column in your training dataset, identify the prediction and partition columns, and press Start. The external model then appears on the Leaderboard, where you can compare it against DataRobot models, investigate it further using select DataRobot visualizations, and (for binary classification projects) explore bias testing.
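
The data-preparation step, adding external predictions as a new column, might look like the following sketch. The column names, the sample rows, and the helper function are hypothetical; the sketch only prepares a CSV and does not call any DataRobot API.

```python
import csv
import io
from typing import Sequence


def add_external_predictions(
    training_csv: str,
    predictions: Sequence[float],
    column: str = "external_prediction",
) -> str:
    """Append one prediction per row as a new column in the training data."""
    reader = csv.DictReader(io.StringIO(training_csv))
    rows = list(reader)
    if len(rows) != len(predictions):
        raise ValueError("Expected exactly one prediction per training row")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(reader.fieldnames) + [column], lineterminator="\n"
    )
    writer.writeheader()
    for row, prediction in zip(rows, predictions):
        row[column] = prediction
        writer.writerow(row)
    return out.getvalue()


# Hypothetical training data that already includes a partition column.
training = (
    "id,feature,target,partition\n"
    "1,0.2,0,train\n"
    "2,0.7,1,validation\n"
)
merged = add_external_predictions(training, [0.1, 0.8])
```

The resulting file contains both the prediction column and the partition column that the project setup asks you to identify.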

GA documentation (as of 7.3)

Deprecation notices

Note the following to better plan for later migration to new releases.

Hadoop deployment and scoring deprecated

Hadoop deployment and scoring, including the Standalone Scoring Engine (SSE), will be unavailable and fully deprecated (end-of-life) starting with release v7.3 (December 13th, 2021 for Cloud users). Post-deprecation, Hadoop should not be used to generate predictions.

Enterprise database integrations deprecated

Enterprise database integrations will be unavailable and fully deprecated (end-of-life) starting with release v7.3 (December 13th, 2021 for Cloud users). Post-deprecation, integrations should not be used to generate predictions with deployments.

Open source models deprecated

Open source models have been deprecated.

Customer-reported fixed issues

The following issues have been fixed since release 7.1.3.

Platform

  • EP-1535: Fixes an issue with the map tile management workflow when Minio is used as storage for Docker-based installs.
  • EP-1495: Sets PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in DataRobot scoring to point to DataRobot's Python.
  • UIUX-2520: Fixes an issue with the Insights view when the page is refreshed.
  • UIUX-2518: Fixes blueprint task descriptions.
  • UIUX-2510: Fixes the business mode model info view.
  • UIUX-1950: Fixes an issue related to the add/delete column in beginner mode.
  • UIUX-2146: Hides the resource usage summary under each model's Model Info tab by default. Enable the user-level flag to display this information.
  • UIUX-3207: The confusion matrix now displays an error message if an issue occurred while loading matrix data.
  • UIUX-5113: Disables the confusion matrix for multiclass projects that are run with slim-run (no stacked predictions) when the model was trained into validation.

Time Series

  • TIME-8176: Fixes an issue where Prediction Explanations failed to compute with new series modelers.
  • TIME-8425: Anomaly assessment records are now filtered properly when backtest 0 is specified as the filtering condition.
  • TIME-8992: Fixes an issue with custom feature lists for KIA new series modelers.
  • TIME-9074: Fixes an issue that caused an error when computing a valid forecast point range because the minimum row count required for the validation was incorrect.

All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.


Updated May 23, 2024