AutoML (V7.0)

March 15, 2021

The DataRobot v7.0.0 release includes many new UI and API capabilities, described below. See also the time series new features for more details.

See these important deprecation announcements for information about changes to DataRobot's support for older, expiring functionality.

Release v7.0.0 provides updated UI string translations for the following languages:

  • Japanese
  • French
  • Spanish

In the spotlight...

The following features are some of the highlights of Release 7.0:

Now GA: Bias detection and analysis tools

Bias and Fairness testing, now publicly available, provides methods to calculate fairness for a binary classification model and to identify any biases in the model’s predictive behavior.

Before model building, use Advanced Options > Bias and Fairness to define protected features and choose the appropriate fairness metric for your use case. A Help me choose questionnaire prompts DataRobot to recommend a metric. Once models are built, Bias and Fairness insights help identify bias in a model and visualize results of root-cause analysis into why the model is learning bias from the training data and from where.

  • Per-Class Bias uses the fairness threshold and fairness score of each class to determine if certain classes are experiencing bias in the model’s predictive behavior.

  • Cross-Class Data Disparity performs root-cause analysis of the model’s bias for the selected classes. The Data Disparity vs Feature Importance chart identifies which features impact bias most; the Feature details chart reports where bias exists within the feature.

  • Cross-Class Accuracy helps to understand how the model is performing and its behavior on a given protected feature/class segment.
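
As a rough illustration of the Per-Class Bias idea described above, the following sketch computes a fairness score for each class of a protected feature and flags any class whose score falls below a fairness threshold. The metric used here (each class's favorable-prediction rate relative to the best-scoring class), the 0.8 threshold, and all function and variable names are assumptions made for illustration; this is not DataRobot's implementation.

```python
from collections import defaultdict


def per_class_fairness(rows, threshold=0.8):
    """Return {class: (score, is_below_threshold)} for one protected feature.

    rows: iterable of (protected_class, predicted_favorable) pairs.
    The score is each class's favorable-prediction rate divided by the
    highest rate among all classes (an assumed, illustrative metric).
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for protected_class, predicted_favorable in rows:
        totals[protected_class] += 1
        favorable[protected_class] += int(predicted_favorable)

    rates = {c: favorable[c] / totals[c] for c in totals}
    best = max(rates.values())
    if best == 0:
        # No class received a favorable prediction; treat all classes as equal.
        return {c: (1.0, False) for c in rates}
    return {c: (rates[c] / best, rates[c] / best < threshold) for c in rates}


# Hypothetical predictions: (class of the protected feature, favorable outcome?)
predictions = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
]
result = per_class_fairness(predictions)
for cls, (score, flagged) in sorted(result.items()):
    print(cls, round(score, 2), "below threshold" if flagged else "ok")
```

In this toy data, class B receives favorable predictions far less often than class A, so its score falls below the assumed threshold and it is flagged.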

Now GA: Accuracy-boosting train-time image augmentation

Train-time image augmentation, a feature available for Visual AI projects, boosts accuracy on image datasets, especially those with few rows. More data usually means better accuracy and better generalization, but often you don’t have the resources (time, money, image availability, labeling expertise, etc.) to easily obtain it. With image augmentation you can create new image data from existing images by applying transformations.

You can create image transformations prior to model building via Advanced options. Or, after model building completes, you can continue to tune the image dataset from the Leaderboard's Evaluate > Advanced Tuning tab. A new "Image Augmentation" task will appear in image blueprints. Improvements to augmentation, based on Beta feedback, include support for multimodal projects, an increase in the size of augmentation that DataRobot can perform, and an improved UI for previewing augmentation strategies. Also, post-modeling tuning and new augmentation list creation have moved to Advanced Tuning.
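
To make the idea of creating new image data by applying transformations concrete, here is a minimal sketch that treats an image as a small grid of pixel values and produces flipped and rotated copies. The specific transformations and all function names are assumptions chosen for illustration; they do not describe the transformations DataRobot applies.

```python
def horizontal_flip(image):
    """Mirror each row of pixels left to right."""
    return [list(reversed(row)) for row in image]


def vertical_flip(image):
    """Reverse the order of the rows."""
    return [list(row) for row in reversed(image)]


def rotate_90_clockwise(image):
    """Rotate the pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*reversed(image))]


def augment(image):
    """Return the original image plus three transformed copies."""
    return [
        image,
        horizontal_flip(image),
        vertical_flip(image),
        rotate_90_clockwise(image),
    ]


# A tiny 2x3 "image" of pixel intensities.
original = [
    [1, 2, 3],
    [4, 5, 6],
]
augmented = augment(original)
print(len(augmented), "training images from 1 original")
```

Each transformed copy keeps the original's label, which is how a small image dataset can be stretched into a larger training set.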

New features and enhancements

New Feature Discovery features

Increased blueprint support of summarized categorical features increases accuracy and Leaderboard diversity

The summarized categorical variable type is for features that hold a collection of categories (for example, the count of a product by category or department). If your original dataset does not have features of this type, DataRobot creates them (from secondary datasets) as part of the Feature Discovery process. With this release, DataRobot adds support for this feature type to a wider selection of blueprints, resulting in a greater number of models being run during Autopilot. This addition will be particularly impactful in Feature Discovery projects with secondary datasets.
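
The following sketch shows one way to picture a summarized categorical feature: rows from a secondary dataset are grouped by a join key, and the categories seen for each key are counted. The dataset, column names, and the `summarize_categories` helper are hypothetical, and the dictionary representation is an assumption rather than DataRobot's internal format.

```python
from collections import Counter, defaultdict


def summarize_categories(secondary_rows, key, category):
    """Count occurrences of each category per join-key value."""
    summary = defaultdict(Counter)
    for row in secondary_rows:
        summary[row[key]][row[category]] += 1
    return {k: dict(counts) for k, counts in summary.items()}


# Hypothetical secondary dataset: one row per purchase.
purchases = [
    {"customer_id": 1, "department": "grocery"},
    {"customer_id": 1, "department": "grocery"},
    {"customer_id": 1, "department": "toys"},
    {"customer_id": 2, "department": "garden"},
]
features = summarize_categories(purchases, "customer_id", "department")
print(features)
```

Each customer ends up with a single feature value that holds several categories, each with its own count.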

Summarized categorical insights now filter stop words

With this release, insights for summarized categorical features now filter out stop words on demand (Category Cloud) and by default (Histogram) for single-token text. Removing stop words (commonly used terms that can be excluded from searches) improves interpretability when those words are not informative to the model: with them filtered out, users can focus on the remaining, more meaningful terms to better understand their data.
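
A minimal sketch of stop-word filtering on single-token categories follows. The stop-word list, the `filter_stop_words` flag, and the function name are illustrative assumptions; DataRobot's actual stop-word list and filtering logic are not described in this release note.

```python
from collections import Counter

# Illustrative stop-word list (an assumption, not DataRobot's list).
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "to"})


def category_counts(tokens, filter_stop_words=True):
    """Count single-token categories, optionally dropping stop words."""
    counts = Counter(token.lower() for token in tokens)
    if filter_stop_words:
        counts = Counter(
            {token: n for token, n in counts.items() if token not in STOP_WORDS}
        )
    return counts


tokens = ["the", "refund", "of", "the", "order", "refund"]
filtered = category_counts(tokens)
unfiltered = category_counts(tokens, filter_stop_words=False)
print(filtered.most_common())
```

Without filtering, "the" ties for the most frequent token; with filtering, only the informative terms remain.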

Beta: Feature Discovery now available for unsupervised projects

Previously, Feature Discovery did not support unsupervised learning projects. While the option was visible at project start when "No Target" was chosen, the UI returned an error message if you tried to configure Feature Discovery settings. Now, as a beta feature, you can set unsupervised mode, add secondary datasets, define relationships, and start a project. DataRobot generates secondary features as in a supervised project, but skips supervised feature reduction (which requires a target).

Beta: Feature Discovery deployments support governance workflow to manage secondary datasets (MLOps required)

With this release, you can manage updates to secondary datasets in Feature Discovery deployments using the governance workflow. After an admin sets up the “Secondary dataset configuration changed” approval policy trigger in User Settings > Approval Policies, any changes to a secondary dataset will prompt a change request that must go through an approval process. The creator of the change request can view its status under History in Deployments > Overview, and reviewers will see a notification requesting that they review pending changes.

Beta: Support for Spark SQL queries in dynamic datasets now available in Feature Discovery secondary datasets

DataRobot offers the ability to enrich, transform, shape, and blend together snapshotted (static) datasets using Spark SQL queries from within the AI Catalog. This release adds support for dynamic Spark SQL in secondary datasets for Feature Discovery projects. When enabled as a beta feature ("Enable Feature Discovery Support of Dynamic Spark SQL"), it increases flexibility in performing basic data prep. Authentication requirements remain the same.

Other new features

Prediction threshold gets a UX upgrade

With this release, DataRobot has upgraded the user experience for setting prediction thresholds on the Leaderboard. First, upgrades to the components on the ROC Curve, Profit Curve, Make Predictions, and Deploy tabs make assigning or selecting a suggested prediction threshold easier. Next, there is now a convenient one-click copy between the display threshold and the prediction threshold on the ROC Curve and Profit Curve tabs. Finally, the selected prediction threshold is now synced across all tabs in a model and for model downloads (such as a model package (.mlpkg) file).

Access additional Scoring Code models

In 7.0, Scoring Code coverage has increased. The following models have been rewritten to include Scoring Code:

Developer Tools page now provides access to R and Python clients

New in this release, Developer Tools now provides quick links to developer documentation. These include links to:

Multiclass Feature Impact now supports custom sample sizes

Multiclass projects can now compute Feature Impact using a custom sample size. This lets you address inconsistencies in Feature Impact results and reproduce those results more consistently, reducing friction during the model validation process.

Beta: Multilabel classification capabilities expand classification options

Multilabel modeling, now available as a public beta feature, is a kind of classification task where each data instance (row in a dataset) is associated with none, one, or several labels. Common uses are for text features with a list of topics (food, Boston, Italian) or images with a list of the objects in them (a cat, two dogs, a bear). All the labels for a row together form the label set for that row. Multilabel classification then predicts label sets given new observations. While similar to multiclass modeling, multilabel modeling provides more flexibility.

| Data type | Description | Allowed as target? | Project type |
|---|---|---|---|
| Categorical | Single category per row, mutually exclusive | Yes | Multiclass |
| Multicategorical | Multiple categories per row, non-exclusive | Yes | Multilabel |
| Summarized categorical | Multiple categories per row, multiple instances of each category allowed | No | Multiregression (not yet available) |
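
To illustrate what a label set looks like in practice, the sketch below converts per-row label sets into a binary indicator matrix, a common way to represent multilabel targets. The encoding and the `to_indicator_matrix` helper are assumptions for illustration and are not necessarily how DataRobot represents multicategorical targets internally.

```python
def to_indicator_matrix(label_sets):
    """Return (sorted labels, one 0/1 row per label set)."""
    labels = sorted(set().union(*label_sets)) if label_sets else []
    matrix = [[int(label in row) for label in labels] for row in label_sets]
    return labels, matrix


# Each row may carry none, one, or several labels.
rows = [
    {"food", "Boston", "Italian"},
    set(),
    {"food"},
]
labels, matrix = to_indicator_matrix(rows)
for row in matrix:
    print(dict(zip(labels, row)))
```

Unlike a multiclass target, a row here can have every indicator set, or none of them.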

Beta: New TinyBERT pre-trained featurizer implementation extends NLP with no fine-tuning needed

BERT (Bidirectional Encoder Representations from Transformers) is Google's transformer-based de facto standard for natural language processing (NLP) transfer learning. TinyBERT (or any distilled, smaller version of BERT) is now available with certain blueprints in the DataRobot Repository. These blueprints provide pre-trained feature extraction in the NLP field, similar to Visual AI featurizers. However, for maximum flexibility, DataRobot's implementation offers two additional tunable pooling parameters: Max Pooling and Average Pooling. TinyBERT blueprints are available for both UI and API users.
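
The two pooling options can be pictured as different ways of collapsing a sequence of per-token embedding vectors into one fixed-length feature vector. The sketch below shows the general technique on toy numbers; the function names and the three-dimensional embeddings are assumptions, and the code does not reflect DataRobot's TinyBERT implementation.

```python
def max_pool(token_embeddings):
    """Take the maximum of each embedding dimension across tokens."""
    return [max(dimension) for dimension in zip(*token_embeddings)]


def average_pool(token_embeddings):
    """Take the mean of each embedding dimension across tokens."""
    count = len(token_embeddings)
    return [sum(dimension) / count for dimension in zip(*token_embeddings)]


# Toy embeddings: three tokens, three dimensions each.
token_embeddings = [
    [0.0, 2.0, -1.0],
    [1.0, 0.0, 3.0],
    [2.0, 1.0, 1.0],
]
print(max_pool(token_embeddings))
print(average_pool(token_embeddings))
```

Max pooling keeps the strongest signal per dimension, while average pooling smooths over all tokens, which is why exposing both as tunable parameters can be useful.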

Beta: Scoring Code support for Keras models

Now publicly available, Keras models have been rewritten to include Scoring Code.

New admin features

Enhanced SAML SSO provides additional configuration options through the UI

Self-Managed only: Enhanced SAML SSO, the new SSO configuration option, allows administrators to provision user roles, update user details (first/last name), and determine who has access to DataRobot. Using their existing identity provider (IdP)/SSO solution—Active Directory, OneLogin, Okta, for example—users can now seamlessly access DataRobot as long as they are logged in to their organization's IdP system. In addition, enhanced SSO supports flexibility with IdP metadata parameterization, including security parameters, SAML secrets, and user attribute, role, and groups mappings.

Enhanced SAML SSO replaces SAML SSO, which will be deprecated in an upcoming release.

API enhancements

The following is a summary of API enhancements. See the changelog for more details and fixed issues.

New features

  • The lists of allowed and forbidden operations over DataStores and DataSources are now provided by new routes.

  • A new field, canDelete, has been added to the response of the GET /api/v2/externalDataSources/ route, which lists all viewable data sources.
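
As a sketch of how a client might use the canDelete field, the example below filters an already-parsed response for data sources that can be deleted. Only the canDelete field name comes from this release note; the overall response shape (a "data" list), the other field names, and the sample values are hypothetical, so check the API reference for the real schema.

```python
import json

# Hypothetical response body; only "canDelete" is taken from the release note.
sample_response = json.loads("""
{
  "data": [
    {"id": "ds-1", "canonicalName": "sales", "canDelete": true},
    {"id": "ds-2", "canonicalName": "inventory", "canDelete": false}
  ]
}
""")


def deletable_source_ids(response):
    """Return the ids of data sources whose canDelete flag is true."""
    return [
        item["id"]
        for item in response.get("data", [])
        if item.get("canDelete")
    ]


print(deletable_source_ids(sample_response))
```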

Enhancements

  • Models can be retrained with custom monotonic constraints.

  • Models can be retrained with cross validation.

  • Creating a datetime model using POST /api/v2/projects/(projectId)/datetimeModels/ without specifying a featurelist will result in using the recommended featurelist for the specified blueprint. If there is no recommended featurelist, the project’s default featurelist will be used instead.

  • The new string field parameter unsupervisedType has been added to two endpoints to set the type of unsupervised project as anomaly or clustering when a project is run in unsupervisedMode.

  • A new field, canUseDatasetData, indicates whether a user can use dataset data for download, project creation, custom models training, or providing predictions.
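
The featurelist fallback described above for datetime model creation can be summarized as a simple precedence rule. The sketch below encodes that rule; the function and argument names are hypothetical, and the logic runs server-side in DataRobot, so this is only a way to reason about which featurelist a request will end up using.

```python
def resolve_featurelist(requested, recommended, project_default):
    """Pick the featurelist used for a datetime model.

    Precedence, per the release note: an explicitly requested featurelist,
    then the blueprint's recommended featurelist, then the project default.
    """
    if requested is not None:
        return requested
    if recommended is not None:
        return recommended
    return project_default


# Hypothetical featurelist ids.
print(resolve_featurelist(None, "fl-recommended", "fl-default"))
print(resolve_featurelist(None, None, "fl-default"))
```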

Tip

DataRobot highly recommends updating to the latest API client for Python and R.

Deprecation notices

Scaleout models deprecated

Scaleout models will be deprecated in a future release and should not be used to train new models.

Customer-reported issues fixed in v7.0.0

The following issues have been fixed since release v6.3.4.

Platform

  • DM-4525: Data Connections are now properly listed on the Credentials Management page when the UI language is set to a language other than English.

  • DM-4637: Adds a new config setting, KERBEROS_PEM_ENABLE, which, when set to True, allows the kinit command to obtain a service ticket using PKINIT preauth instead of a keytab.

  • DM-4696: The following environment variables are now configurable:

    • AZURE_BLOB_STORAGE_CHUNK_SIZE (99MB default).
    • AZURE_BLOB_STORAGE_TIMEOUT (20 second default).

  • EP-506: Fixes an issue with database timeout during index create/update.

  • EP-750: Fixes an issue with systems using external directory services where some DataRobot containers were unable to resolve the datarobot_user user. This change introduces the os_configuration.remote_user_credentials parameter, which, when set to true, maps the external directory service credentials into DataRobot containers.

  • EP-795: For third-party tools, the admin interface for RabbitMQ can now have additional headers.

  • PLT-3052: Fixed LDAP group mapping for groups with special symbols in the name.

Modeling

  • MODEL-5033: Modified certain Keras Repository blueprints that make use of One Hot Encoding numerics so that they perform NDC before One Hot Encoding. This fix ensures prediction consistency between the ModelingAPI and BatchAPI.


Updated May 23, 2024