September 13, 2021
In the spotlight...¶
The following features are some of the highlights of Release 7.2:
- Purpose-built AI applications with the AI App Builder
- Public preview: DataRobot Pipelines
- Public preview: External prediction insights
- Public preview: Bias and Fairness monitoring for deployments
User interface enhancements¶
New login experience¶
This release introduces a new login experience for DataRobot platform users. The page has been redesigned to better reflect the innovation of the company and product without changing the existing login workflow.
ROC Curve redesign¶
The ROC Curve tab has been redesigned to streamline the model evaluation strategies you can perform. Along with the Prediction Distribution graph, ROC curve, confusion matrix, and a summary of metrics, you can now generate profit curves, precision-recall curves, and custom charts in the ROC Curve tab.
For details, see ROC Curve.
New location for tools to share and edit project names¶
To improve navigation, this release brings a new home for the project sharing and project name editing tools. While still available from the project control center (Manage Projects), you can now more quickly access the tools directly from the project dropdown.
New Spark version for improved performance¶
Release 7.2 upgrades the Spark version used for Feature Discovery and Spark SQL to Spark 3.0. In addition to Spark performance improvements, the upgrade brings improved JDBC compatibility with the AI Catalog (which uses Java 11) and a smaller shippable codebase. DataRobot now supports all drivers that are compatible with any Java version 8 or later.
Connect to Snowflake and Google BigQuery using OAuth¶
Snowflake and Google BigQuery users can now set up a data connection using OAuth single sign-on. Once configured, you can read data from production databases to use for model building and predictions. For details, see Data connection with OAuth.
Feature Discovery features¶
Feature Discovery Relationship Editor setup guide¶
With Feature Discovery, DataRobot generates new features from multiple datasets so that you don’t need to perform feature engineering manually to consolidate multiple datasets. Use the Relationship Editor to join the datasets to prepare for Feature Discovery.
The Relationship Editor setup guide is a new intermediate screen that displays when you click the Add datasets button on the EDA (Data) page. It walks you through the process of specifying prediction points for time-aware features and adding the datasets to be joined for Feature Discovery. For details, see Create a Feature Discovery project.
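Conceptually, each Feature Discovery derivation step resembles aggregating a secondary dataset per join key and joining the result back to the primary dataset. A minimal pandas sketch, using hypothetical customer and transaction data (not DataRobot's actual engine):

```python
import pandas as pd

# Hypothetical primary (one row per customer) and secondary
# (one row per transaction) datasets.
primary = pd.DataFrame({"customer_id": [1, 2], "target": [0, 1]})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 20.0, 5.0, 15.0, 30.0],
})

# Aggregate the secondary dataset per join key, then join to the
# primary -- analogous to one Feature Discovery derivation step.
agg = transactions.groupby("customer_id")["amount"].agg(["sum", "mean", "max"])
agg.columns = [f"amount_{c}" for c in agg.columns]
features = primary.merge(agg.reset_index(), on="customer_id", how="left")
print(features)
```

In a real project, the Relationship Editor defines these join keys and DataRobot explores many such aggregations automatically.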
Feature Discovery engineering controls¶
Feature Discovery engineering controls, now publicly available, let you influence how DataRobot conducts feature engineering.
You can enable specific controls to use your domain knowledge to guide feature engineering or to improve accuracy. You might want to exclude specific transformations that slow down processing or are difficult to explain to stakeholders. For details, see Set feature engineering controls.
Feature Discovery settings enhanced¶
The Feature Discovery tab on the Data page provides dataset relationship details, a feature derivation summary, and a feature derivation log. You can now see the number of secondary datasets, explored features, and derived features that resulted from Feature Discovery. Click Show more to see which feature engineering controls were used during Feature Discovery and to learn about each.
For details, see Define relationships.
Categorical Statistics feature type¶
Categorical Statistics let you explore numeric statistics like sum, max, and average for each category of a categorical feature. In the following example, during Feature Discovery, DataRobot explores Spending numeric statistics for each category of the Product_Type feature:
- Spending(30 days min)
- Spending(30 days min by Product_Type = A)
- Spending(30 days min by Product_Type = B)
- Spending(30 days min by Product_Type = C)
Categorical Statistics aggregation is turned off by default. You can enable it on the Feature Engineering tab of the Feature Discovery Settings page. For details, see Categorical Statistics.
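The per-category statistics above can be illustrated with a small pandas sketch, using hypothetical Spending and Product_Type data (a min over all rows, then a min within each category):

```python
import pandas as pd

# Hypothetical transaction data mirroring the Spending / Product_Type example.
df = pd.DataFrame({
    "Product_Type": ["A", "A", "B", "C", "C"],
    "Spending":     [12.0, 8.0, 30.0, 7.0, 4.0],
})

# Overall statistic, e.g. Spending(min).
overall_min = df["Spending"].min()

# Per-category statistics, e.g. Spending(min by Product_Type = A/B/C).
per_category_min = df.groupby("Product_Type")["Spending"].min()
print(overall_min)       # 4.0
print(per_category_min)  # A: 8.0, B: 30.0, C: 4.0
```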
Purpose-built AI applications with the AI App Builder¶
The AI App Builder, available from the Applications tab, provides a no-code platform to enable core DataRobot services (making predictions, optimizing outcomes, simulating scenarios, and more) without having to build models and evaluate their performance in DataRobot.
Each application starts with a template and data source—either a deployment or dataset in the AI Catalog. However, the App Builder lets you configure additional widgets, custom features, and pages to tailor the application to a specific use case.
Once deployed, applications can be easily shared and do not require users to hold full DataRobot licenses, offering a simple way to broaden your organization's use of DataRobot functionality.
Applications are composed of widgets that create visual, interactive, and purpose-driven end-user applications. There are two types of widgets—chart widgets and header widgets. Chart widgets add visualizations to an application and can be configured to surface important insights in your data and prediction results. Header widgets provide additional filtering options for your application.
What-if and Optimizer widget¶
The What-if and Optimizer widget provides two tools for interacting with prediction results:
- What-if: A decision-support tool that allows you to create and compare multiple prediction simulations to identify the option that provides the best outcome. You can also make predictions, then change one or more inputs to create a new simulation, and see how those changes affect the target feature.
- Optimizer: Identifies the maximum or minimum predicted value for a target by varying the values of a selection of flexible features in the model.
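The what-if idea above can be sketched with a simple scikit-learn model standing in for a deployed DataRobot model (all data and the model here are hypothetical): make a baseline prediction, change a single input, and compare the outcomes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for a deployment: a linear model on two features.
X = np.array([[1.0, 10.0], [2.0, 5.0], [3.0, 8.0], [4.0, 2.0]])
y = np.array([15.0, 15.0, 23.0, 22.0])  # generated as 5*x1 + x2
model = LinearRegression().fit(X, y)

# Baseline prediction for one scenario.
baseline = np.array([[2.0, 5.0]])
baseline_pred = model.predict(baseline)[0]

# "What-if": vary one input and see how the predicted target changes.
scenario = baseline.copy()
scenario[0, 0] = 3.0  # change only the first feature
scenario_pred = model.predict(scenario)[0]
delta = scenario_pred - baseline_pred
print(baseline_pred, scenario_pred, delta)
```

The Optimizer tool generalizes this idea by searching over the flexible features for the input combination that maximizes or minimizes the prediction.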
Word Cloud blueprints for multiclass projects¶
An improvement has been made so that all Stochastic Gradient Descent (SGD) blueprints create a Word Cloud if at least one text feature is present in a multiclass project. Previously, there was a specialized SGD blueprint, available from the Repository, that had to be run manually. Access the new visualizations from either the model's Describe > Word Cloud or Insights > Word Cloud tabs.
New Keras DeepCTR models available in the Repository¶
To support data scientists with CTR data (categoricals with high cardinality), DataRobot introduces three DeepCTR models, available from the Repository. These models—neural factorization machine, autoint, and deep cross network—can be particularly useful when building clickthrough rate or recommendation models.
Bias and Fairness improvements¶
With this release, DataRobot has upgraded the user experience for calculating Bias and Fairness for your models. The first improvement allows you to enable Bias and Fairness insights after modeling has already started. Select a model and navigate to Bias and Fairness > Settings. Once configured, Bias and Fairness insights are enabled for every model on the Leaderboard.
The second improvement is the ability to view multiple fairness metrics in the Per-Class Bias page. This functionality allows you to view fairness scores for all five fairness metrics using a dropdown menu.
For details, see the Bias and Fairness documentation.
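As a rough illustration of what a fairness metric measures, the sketch below computes a proportional-parity-style score: the favorable-outcome rate per protected class, scaled by the most favored class (1.0 = parity). The data is hypothetical, and this is not necessarily how DataRobot computes its five metrics:

```python
import pandas as pd

# Hypothetical binary predictions with a protected attribute.
df = pd.DataFrame({
    "group":     ["A", "A", "A", "B", "B"],
    "predicted": [1, 1, 0, 1, 0],
})

# Favorable-outcome rate per class, scaled so the most favored
# class scores 1.0 (values near 1.0 indicate parity).
rates = df.groupby("group")["predicted"].mean()
fairness = rates / rates.max()
print(fairness)
```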
TLS options for Portable Prediction Server¶
By default, the Portable Prediction Server (PPS) serves predictions over an insecure listener on port 8080 (clear-text HTTP over TCP). You can now also serve predictions over a secure listener on port 8443 (HTTP over TLS/SSL, or simply HTTPS). When the secure listener is enabled, the insecure listener becomes unavailable. The configuration is accomplished using environment variables, which are described in the documentation along with accompanying examples.
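A configuration along these lines enables the secure listener. The environment-variable and image names below are hypothetical placeholders; consult the PPS documentation for the exact names supported by your release:

```shell
# Hypothetical variable and image names -- check the PPS documentation
# for the exact environment variables your release supports.
docker run \
  -e PREDICTION_API_TLS_ENABLED=true \
  -e PREDICTION_API_TLS_CERTIFICATE_FILE=/certs/server.crt \
  -e PREDICTION_API_TLS_PRIVATE_KEY_FILE=/certs/server.key \
  -v /local/certs:/certs \
  -p 8443:8443 \
  datarobot/portable-prediction-server
```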
Public preview features¶
DataRobot Pipelines¶
DataRobot Pipelines enable data science and engineering teams to manage machine learning data throughout the stages of model development and deployment. To make this process easier, teams build data pipelines, sets of connected data processing steps, so that they can train models with new data as needed.
A data flow pipeline is a collection of connected modules. It contains the module specifications, connections, and configurations needed to implement your machine learning data flows. You build and execute pipelines in workspaces that you create by selecting the Workspaces tab in the AI Catalog.
Pipelines contain input (CSV Reader), transformation (Spark SQL), and output (AI Catalog Export and CSV Writer) modules. Once built, you can run your pipelines interactively or you can schedule batch runs.
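The input -> transformation -> output flow can be sketched with in-memory stand-ins for the CSV Reader, Spark SQL, and CSV Writer modules (a conceptual pandas sketch, not the Pipelines runtime):

```python
import io
import pandas as pd

# Input module stand-in: read a CSV source.
source = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
df = pd.read_csv(source)

# Transformation module stand-in: a SQL-like filter and derived column.
df = df[df["value"] >= 20].assign(value_doubled=lambda d: d["value"] * 2)

# Output module stand-in: write the result back out as CSV.
out = io.StringIO()
df.to_csv(out, index=False)
result_csv = out.getvalue()
print(result_csv)
```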
Create feature lists in the Relationship Editor¶
The ability to create feature lists in the Feature Discovery Relationship Editor is now available as a public preview feature.
Once you create your feature list, you can transform the features directly in the Relationship Editor.
Further enhancements to multilabel modeling¶
In response to user feedback, the multilabel modeling public preview feature introduces several usability improvements, including:
- Addition of Feature Effects visualization for multilabel projects
- Faster computation of per-label metrics
- New per-label Word Clouds
- Ability to easily pin labels
- Model packages and access to the Portable Prediction Server for MLOps
Insights available for external models¶
Through the External Predictions advanced option tab, you can bring external models into the DataRobot AutoML environment, view them on the Leaderboard, and run a subset of DataRobot's evaluative insights for comparison against DataRobot models. Simply add the external model's predictions as a new column in your training dataset, identify the prediction and partition columns, and press Start. The external model then becomes available on the Leaderboard, where you can compare it against DataRobot models, investigate it further using select DataRobot visualizations, and (for binary classification projects) explore bias testing.
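Preparing the dataset amounts to appending the external model's predictions and a partition label as extra columns before upload. A minimal pandas sketch with hypothetical column names and data:

```python
import pandas as pd

# Hypothetical training data.
train = pd.DataFrame({
    "feature_1": [0.2, 0.7, 0.5, 0.9],
    "target":    [0, 1, 0, 1],
})

# Predictions produced by the external model, one per training row.
external_preds = [0.1, 0.8, 0.4, 0.95]

# Append the prediction and partition columns, then upload the
# result as the project's training dataset.
train["external_prediction"] = external_preds
train["partition"] = ["train", "train", "holdout", "holdout"]
print(train)
```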
Deprecation announcements¶
Note the following deprecations to better plan for later migration to new releases.
Hadoop deployment and scoring deprecated¶
Hadoop deployment and scoring, including the Standalone Scoring Engine (SSE), will reach end-of-life and become unavailable starting with release v7.3 (December 13, 2021 for Cloud users). Post-deprecation, Hadoop should not be used to generate predictions.
Enterprise database integrations deprecated¶
Enterprise database integrations will reach end-of-life and become unavailable starting with release v7.3 (December 13, 2021 for Cloud users). Post-deprecation, these integrations should not be used to generate predictions with deployments.
Open source models deprecated¶
Open source models have been deprecated.
Customer-reported fixed issues¶
The following issues have been fixed since release 7.1.3.
- EP-1535: Fixes an issue with the map tile management workflow when MinIO is used as storage for Docker-based installs.
- EP-1495: Sets PYSPARK_DRIVER_PYTHON in DataRobot scoring to point to DataRobot's Python.
- UIUX-2520: Fixes an issue with the Insights view when the page is refreshed.
- UIUX-2518: Fixes blueprint task descriptions.
- UIUX-2510: Fixes the business mode model info view.
- UIUX-1950: Fixes an issue related to the add/delete column in beginner mode.
- UIUX-2146: Hides the resource usage summary under each model's Model Info tab by default. Enable the user-level flag to display this information.
- UIUX-3207: The confusion matrix now displays an error message if an issue occurred while loading matrix data.
- UIUX-5113: Disables the confusion matrix for multiclass projects that are run with slim-run (no stacked predictions) when the model was trained into validation.
- TIME-8176: Fixes an issue when Prediction Explanations failed to compute with new series modelers.
- TIME-8425: Anomaly assessment records are now filtered properly when backtest 0 is specified as the filtering condition.
- TIME-8992: Fixes an issue with custom feature lists for KIA new series modelers.
- TIME-9074: Fixes an issue that caused an error when computing the valid forecast point range, due to an incorrect minimum row count required to perform the validation.