With the latest deployment, DataRobot's AI Platform delivered the new GA and preview features listed below. From the release center you can also access:
You can now publish wrangling recipes to materialize data in DataRobot’s Data Registry or Snowflake. When you publish a wrangling recipe, operations are pushed down into a Snowflake virtual warehouse, allowing you to leverage the security, compliance, and financial controls of Snowflake. By default, the output dataset is materialized in DataRobot's Data Registry. Now you can materialize the wrangled dataset in Snowflake databases and schemas for which you have write access.
Feature flags: Enable Snowflake In-Source Materialization in Workbench, Enable Dynamic Datasets in Workbench
Perform joins and aggregations on your data in Workbench¶
You can now add Join and Aggregation operations to your wrangling recipe in Workbench. Use the Join operation to combine datasets that are accessible via the same connection instance, and the Aggregation operation to apply aggregation functions (such as sum, average, count, minimum/maximum, and standard deviation estimation), as well as some non-mathematical operations, to features in your dataset.
When publishing a wrangling recipe in Workbench, use smart downsampling to reduce the size of your output dataset and optimize model training. Smart downsampling is a data science technique that reduces the time it takes to fit a model without sacrificing accuracy: it accounts for class imbalance by stratifying the sample by class. In most cases, the entire minority class is preserved and sampling applies only to the majority class, which is particularly useful for imbalanced data. Because accuracy is typically more important on the minority class, this technique greatly reduces the size of the training dataset, cutting modeling time and cost while preserving model accuracy.
Feature flag: Enable Smart Downsampling in Wrangle Publishing Settings
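To illustrate the technique itself (a conceptual sketch of class-stratified downsampling, not DataRobot's implementation; the column and function names are hypothetical):

```python
import pandas as pd

def smart_downsample(df: pd.DataFrame, target: str, majority_frac: float = 0.2,
                     seed: int = 0) -> pd.DataFrame:
    """Keep the minority class(es) whole; sample only the majority class."""
    counts = df[target].value_counts()
    majority_label = counts.idxmax()
    majority = df[df[target] == majority_label]
    minority = df[df[target] != majority_label]
    sampled = majority.sample(frac=majority_frac, random_state=seed)
    # Recombine and shuffle; every minority-class row survives the downsample.
    return pd.concat([minority, sampled]).sample(frac=1, random_state=seed)

# Example: a 1%-positive fraud dataset shrinks roughly 5x with majority_frac=0.2,
# while keeping every fraud row available for training.
# downsampled = smart_downsample(transactions, target="is_fraud", majority_frac=0.2)
```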
A new BigQuery connector is now available for preview, providing several performance and compatibility enhancements, as well as support for authentication using Service Account credentials.
In this release, the sklearn library was upgraded from 0.15.1 to 0.24.2. The impacts are summarized as follows:
Feature association insights: Updated the spectral clustering logic. This only affects the cluster ID (a numeric identifier for each cluster, e.g., 0, 1, 2, 3). The values of feature association insights are not affected.
AUC/ROC insights: Due to the improvement in sklearn's ROC curve calculation, the precision of AUC/ROC values is slightly affected.
You can now tune hyperparameters for custom tasks. Each hyperparameter is defined by two values: a name and a type. The type can be one of int, float, string, select, or multi, and all types support a default value. See Model metadata and validation schema for more details and example hyperparameter configurations.
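As a rough illustration, a model-metadata.yaml fragment might declare hyperparameters like this (field names beyond name, type, and default are assumptions; consult the linked documentation for the authoritative schema):

```yaml
hyperparameters:
  - name: learning_rate
    type: float
    default: 0.05
  - name: loss
    type: select
    values: [logloss, rmse]   # assumed field for the allowed select options
    default: logloss
```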
MLflow integration for the DataRobot Model Registry¶
The preview release of the MLflow integration for DataRobot allows you to export a model from MLflow and import it into the DataRobot Model Registry, creating key values from the training parameters, metrics, tags, and artifacts in the MLflow model. You can use the integration's command line interface to carry out the export and import processes:
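The invocation looks roughly like the following (the module path and flag names below are illustrative assumptions, not the documented interface; consult the integration's documentation for the exact CLI):

```bash
# Illustrative sketch only: flag names are assumptions, not the documented CLI.
python -m datarobot_mlflow.drflow_cli \
  --mlflow-url http://localhost:5000 \
  --mlflow-model my-model \
  --mlflow-model-version 2 \
  --dr-model "<registered model version ID>" \
  --action sync
```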
Now available for preview, monitoring job definitions allow DataRobot to pull custom metric values calculated outside DataRobot into a custom metric defined on the Custom Metrics tab, supporting custom metrics with external data sources. For example, you can create a monitoring job that connects to Snowflake, fetches custom metric data from the relevant Snowflake table, and sends the data to DataRobot:
Timeliness indicators for predictions and actuals¶
Deployments have several statuses to define the general health of a deployment, including Service Health, Data Drift, and Accuracy. These statuses are calculated based on the most recent available data. For deployments relying on batch predictions made in intervals greater than 24 hours, this method can result in an unknown status value on the Prediction Health indicators in the deployment inventory. Now available for preview, those deployment health indicators can retain the most recently calculated health status, presented along with timeliness status indicators to reveal when they are based on old data. You can determine the appropriate timeliness intervals for your deployments on a case-by-case basis. Once you've enabled timeliness tracking on a deployment's Usage > Settings tab, you can view timeliness indicators on the Usage tab and in the Deployments inventory:
View the Predictions Timeliness and Actuals Timeliness columns:
View the Predictions Timeliness and Actuals Timeliness tiles:
Along with the status, you can view the Updated time for each timeliness tile.
Note
In addition to the indicators on the Usage tab and the Deployments inventory, when a timeliness status changes to Red / Failing, a notification is sent through email or the channel configured in your notification policies.
The Model Registry is an organizational hub for various models used in DataRobot, where you can access models as deployment-ready model packages. Now available as a preview feature, the Model Registry > Registered Models page provides an additional layer of organization to your models.
On this page, you can group model packages into registered models, allowing you to categorize them based on the business problem they solve. Registered models can contain:
DataRobot, custom, and external models
Challenger models (alongside the champion)
Automatically retrained models
Once you add registered models, you can search, filter, and sort them. You can also share your registered models (and the versions they contain) with other users.
For more information, see the Model Registry documentation.
Feature flag: Enable Versioning Support in the Model Registry
Now available as a preview feature, you can enable full network access for any custom model. When you create a custom model, you can access any fully qualified domain name (FQDN) in a public network so that the model can leverage third-party services. Alternatively, you can disable public network access if you want to isolate a model from the network and block outgoing traffic to enhance the security of the model. To review this access setting for your custom models, on the Assemble tab, under Resource Settings, check the Network access:
Now available as a preview feature, the text generation target type for DataRobot custom and external models is compatible with generative Large Language Models (LLMs), allowing you to deploy generative models, make predictions, monitor service, usage, and data drift statistics, and create custom metrics. DataRobot supports LLMs through two deployment methods:
Deploy a text generation model in DataRobot: Create a custom model with the text generation target type and deploy it to DataRobot for predictions and monitoring.
Monitor a text generation model running externally: Create and deploy a text generation model on your own infrastructure (local or cloud), using the monitoring agent to communicate the input and output of your LLM to DataRobot for monitoring (a minimal reporting sketch follows).
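For the external case, agent-based reporting might look like the following sketch using the datarobot-mlops library's filesystem spooler (a minimal sketch, assuming a monitoring agent is configured to forward the spooled records; the deployment/model IDs, promptText column name, and call_your_llm helper are placeholders):

```python
import time
import pandas as pd
from datarobot.mlops.mlops import MLOps

def call_your_llm(prompt: str) -> str:
    # Hypothetical stand-in for your own LLM inference call
    return "stubbed completion"

# Spool monitoring records to disk for the agent to pick up and forward
mlops = (MLOps()
         .set_deployment_id("<external deployment ID>")
         .set_model_id("<external model ID>")
         .set_filesystem_spooler("/tmp/ta")
         .init())

prompt = "Summarize this support ticket..."
start = time.time()
completion = call_your_llm(prompt)
elapsed_ms = (time.time() - start) * 1000

# Report timing plus the prompt/completion pair to DataRobot
mlops.report_deployment_stats(num_predictions=1, execution_time_ms=elapsed_ms)
mlops.report_predictions_data(
    features_df=pd.DataFrame({"promptText": [prompt]}),
    predictions=[completion],
)
mlops.shutdown()
```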
The Organization Administrator route for removing users from an organization, DELETE /api/v2/organizations/(organizationId)/users/(userId)/, has been removed. Instead, users should be deactivated, or a system administrator can move the user to a different organization.
Adds the useGpu parameter. When GPU workers are enabled, this parameter controls whether the project uses GPU workers. The parameter is added to the following route:
PATCH /api/v2/projects/(projectId)/aim/
The useGpu parameter is also returned as a new field when project data is retrieved using the following route:
GET /api/v2/projects/(projectId)/
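For example, with the REST API (a minimal sketch; the API token, project ID, target name, and other Autopilot settings are placeholders, and required aim fields are omitted for brevity):

```python
import requests

API = "https://app.datarobot.com/api/v2"
HEADERS = {"Authorization": "Bearer <your API token>"}
project_id = "<project ID>"

# Start Autopilot with GPU workers requested
requests.patch(f"{API}/projects/{project_id}/aim/",
               headers=HEADERS,
               json={"target": "churn", "useGpu": True})

# The project record now carries useGpu back as a field
project = requests.get(f"{API}/projects/{project_id}/", headers=HEADERS).json()
print(project.get("useGpu"))
```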
The new optional parameters modelBaselines, modelRegimeId, and modelGroupId for OTV time series projects without FEAR are added to PATCH /api/v2/projects/(projectId)/aim/. To use these fields, enable the feature flag Forecasting Without Automated Feature Derivation.
The following route to register a Leaderboard model is deprecated in favor of POST /api/v2/modelPackages/fromLeaderboard/ and will be removed in v2.33:
POST /api/v2/modelPackages/fromLearningModel/
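Migrating is a matter of calling the replacement route (a sketch; the modelId body field is an assumption based on the route's purpose):

```python
import requests

API = "https://app.datarobot.com/api/v2"
HEADERS = {"Authorization": "Bearer <your API token>"}

# Deprecated: POST {API}/modelPackages/fromLearningModel/
# Replacement: register a Leaderboard model as a model package
resp = requests.post(f"{API}/modelPackages/fromLeaderboard/",
                     headers=HEADERS,
                     json={"modelId": "<Leaderboard model ID>"})
print(resp.status_code)
```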
The following Use Case management endpoints are deprecated in favor of the new /api/v2/valueTrackers/ endpoints and will be removed in v2.33:
GET /api/v2/useCases/(useCaseId)/attachments/(attachmentId)/
GET /api/v2/useCases/(useCaseId)/realizedValueOverTime/
GET /api/v2/useCases/(useCaseId)/sharedRoles/
PATCH /api/v2/useCases/(useCaseId)/sharedRoles/
The current useCases/ endpoints are being renamed to valueTrackers/ endpoints and will sunset in two releases, in API v2.33. Begin using the valueTrackers/ endpoints in their place.
Version v2.31 of the R client is now available for preview. It can be installed via GitHub.
This version of the R client addresses an issue where a new feature in the curl==5.0.1 package caused any invocation of datarobot:::UploadData (i.e., SetupProject) to fail with the error No method asJSON S3 class: form_file.
The R client will now output a warning when you attempt to access certain resources (projects, models, deployments, etc.) that are deprecated or disabled by the DataRobot platform migration to Python 3.
Added support for comprehensive autopilot: use mode = AutopilotMode.Comprehensive.
Fixed an issue where an undocumented feature in curl==5.0.1 caused any invocation of datarobot:::UploadData (i.e., SetupProject) to fail with the error No method asJSON S3 class: form_file.
Loading the datarobot package with suppressPackageStartupMessages() will now suppress all messages.
The functions ListProjects and as.data.frame.projectSummaryList no longer return fields related to recommender models, which were removed in v2.5.0.
The function SetTarget now sets autopilot mode to Quick by default. Additionally, when Quick is passed, the underlying /aim endpoint will no longer be invoked with Auto.
Updated the "Introduction to DataRobot" vignette to use Ames, Iowa housing data instead of the Boston housing dataset.