A monthly record of the new preview and GA features announced for DataRobot's managed AI Platform. Deprecation announcements are also included and link to deprecation guides, as appropriate.
This page provides announcements of newly released features available in DataRobot's SaaS single- and multi-tenant AI Platform, with links to additional resources. From the release center, you can also access:
A monthly record of the feature announcement history
Now generally available, Anthropic Claude 3 Opus brings support for another Claude-family offering to the DataRobot GenAI product. Each model in the family is targeted at specific needs; Claude 3 Opus, the largest model of the Claude family, excels at heavyweight reasoning and complicated tasks. See the full list of LLM availability in DataRobot, with links to creator documentation for assistance in choosing the appropriate model.
Initially released to Workbench in March 2024, multiclass modeling and the associated confusion matrix are now generally available. To support an expansive set of multiclass modeling experiments—classification problems in which the answer has more than two outcomes—DataRobot provides support for an unlimited number of classes using aggregation.
To help gain insights into geospatial patterns in your data, you can now natively ingest common geospatial formats and build enhanced model blueprints with spatially explicit modeling tasks when building in Workbench. During experiment setup, open Additional settings, select a location feature in the Geospatial insights section, and make sure that feature is in the modeling feature list. DataRobot then creates geospatial insights: Accuracy Over Space for supervised projects and Anomaly Over Space for unsupervised projects.
Personal data detection now GA in SaaS and Self-Managed
Because the use of personal data as a modeling feature is forbidden in some regulated use cases, DataRobot Classic provides personal data detection capabilities. The feature is now generally available in both SaaS and self-managed environments. Access the check after uploading data to the AI Catalog.
XEMP Individual Prediction Explanations now in Workbench
Workbench now offers two methodologies for computing Individual Prediction Explanations: SHAP (based on Shapley values) and XEMP (eXemplar-based Explanations of Model Predictions). Regardless of method, this insight helps explain what drives predictions. XEMP-based explanations are a proprietary method that supports all models and have long been available in DataRobot Classic. In Workbench, they are available only in experiments that don't support SHAP.
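To make "based on Shapley values" concrete, the sketch below computes exact Shapley values for a single row of a toy model. Features that are "absent" from a coalition are replaced with baseline values, which is one common convention. This is the textbook definition, not DataRobot's implementation: its cost grows exponentially with the number of features, so practical SHAP implementations typically rely on faster approximations. XEMP is proprietary and is not sketched here. The `predict` callable and `baseline` row are hypothetical inputs.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row, using `baseline` for absent features.

    predict: callable taking a list of feature values and returning a float.
    Exponential in the number of features, so suitable only for tiny examples.
    """
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take the row's values; the rest take the baseline.
        row = [x[i] if i in subset else baseline[i] for i in features]
        return predict(row)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model, each value equals the coefficient times the feature's deviation from the baseline. In all cases, the values sum to `predict(x) - predict(baseline)`.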
Custom tasks now available for Self-Managed users
Custom tasks allow you to add custom vertices to a DataRobot blueprint and then train, evaluate, and deploy that blueprint the same way you would any DataRobot-generated blueprint. With v10.2, this functionality is also available for on-premises (Self-Managed) installations through DataRobot Classic and the API.
Manage network policies to limit access to public resources
By default, some DataRobot capabilities, including Notebooks, have full public internet access from within the cluster DataRobot is deployed on; however, admins can limit the public resources users can access within DataRobot by setting network access controls. To do so, open User settings > Policies and enable the network policy control toggle. When enabled, users cannot access public resources from within DataRobot.
Monitor EDA resource usage across an organization
Now generally available, the EDA tab of the Resource Monitor lets administrators monitor the number of configured workers used for EDA1 and related tasks. The Resource Monitor provides visibility into DataRobot's active modeling and EDA workers across the installation, showing general information about the current state of the application and specific information about the status of individual components.
Understand how individual catalog assets relate to other DataRobot entities
The AI Catalog serves as a centralized collaboration hub for working with data and related assets in DataRobot. On the Info tab for individual assets, you can now see how other entities in the application are related to, or dependent on, the current asset. This lets you gauge an asset's popularity by the number of projects that use it, identify which other entities might be affected if you change or delete it, and better understand how it is used.
Automatically remove date features before running Autopilot
When setting up a non-time-aware project in DataRobot Classic, you can now automatically remove date features from the feature list you want to use to run Autopilot. To do so, open Advanced options for the project, select the Additional tab, and then select Remove date features from selected list and create new modeling feature list. Enabling this parameter duplicates the selected feature list, removes raw date features, and uses the new list to run Autopilot. Excluding raw date features from non-time-aware projects can help prevent issues such as overfitting.
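The option described above leaves the selected list intact and builds a new list without raw date features. A minimal sketch of that logic follows, assuming a hypothetical mapping of feature names to type strings in which `"Date"` marks a raw date feature. It does not use DataRobot's API or schema.

```python
def remove_raw_date_features(feature_list, feature_types, time_aware=False):
    """Return a new feature list with raw date features removed.

    feature_types: hypothetical mapping of feature name -> type string,
    where "Date" marks a raw date feature. The input list is not modified.
    """
    if time_aware:
        raise ValueError("this option applies only to non-time-aware projects")
    return [name for name in feature_list if feature_types.get(name) != "Date"]
```

Returning a new list, instead of filtering in place, mirrors the "duplicate, then remove" behavior: the original feature list remains available for comparison.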
Support for SAP Datasphere connector in DataRobot
DataRobot now supports the SAP Datasphere connector, available for preview as a premium feature, in both NextGen and DataRobot Classic.
Feature flag OFF by default: Enable SAP Datasphere Connector (Premium feature)
This release introduces the following EDA insights on the Features tab of the data explore page in Workbench:
Data quality checks appear as indicators on the Features tab of the data explore page as well as insights for individual features.
The Histogram chart displays data quality issues with outliers.
The Frequent Values chart reports inliers, disguised missing values, and excess zeros.
Feature lineage insight for Feature Discovery datasets shows how a feature was generated.
Compliance documentation now available for registered text generation models
DataRobot has long provided model development documentation that can be used for regulatory validation of predictive models. Now, compliance documentation is expanded to include auto-generated documentation for text generation models in the Registry's model directory. For LLMs natively supported by DataRobot, the document reduces the time spent generating reports, covering the model overview, informative resources, and, most notably, model performance and stability tests. For LLMs that are not natively supported, the generated document can serve as a template with all necessary sections. Generating compliance documentation for text generation models requires the Enable Compliance Documentation and Enable Gen AI Experimentation feature flags.
Evaluation and moderation for text generation models
Evaluation and moderation guardrails help your organization block prompt injection and hateful, toxic, or inappropriate prompts and responses. They can also prevent hallucinations or low-confidence responses and, more generally, keep the model on topic. In addition, these guardrails can safeguard against the sharing of personally identifiable information (PII).

Many evaluation and moderation guardrails connect a deployed text generation model (LLM) to a deployed guard model. These guard models make predictions on LLM prompts and responses and then report these predictions and statistics to the central LLM deployment. To use evaluation and moderation guardrails, first create and deploy guard models to make predictions on an LLM's prompts or responses; for example, a guard model could identify prompt injection or toxic responses. Then, when you create a custom model with the Text Generation target type, define one or more evaluation and moderation guardrails. The GA Premium release of this feature introduces general configuration settings for moderation timeout and for evaluation and moderation logs.
Feature flags OFF by default: Enable Moderation Guardrails (Premium feature), Enable Global Models in the Model Registry (Premium feature), Enable Additional Custom Model Output in Prediction Responses
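The prompt-then-response flow described above can be sketched as follows. This is a minimal sketch under stated assumptions: each guard is a hypothetical local scoring function standing in for a deployed guard model, and a guard blocks when its score meets a threshold. The names `Guard`, `moderate`, and the blocked message are illustrative and are not DataRobot's API. The returned report loosely mirrors the per-guard statistics that guard deployments send to the LLM deployment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Guard:
    name: str
    stage: str                      # "prompt" or "response"
    score: Callable[[str], float]   # stand-in for a deployed guard model
    threshold: float                # block when score >= threshold


def moderate(prompt: str, llm: Callable[[str], str], guards: List[Guard],
             blocked_message: str = "This request was blocked by a guardrail."
             ) -> Tuple[str, Dict[str, float]]:
    """Run prompt guards, call the LLM only if they pass, then run response guards.

    Returns (text, report), where report maps guard name -> score.
    """
    report: Dict[str, float] = {}
    for guard in guards:
        if guard.stage == "prompt":
            report[guard.name] = guard.score(prompt)
            if report[guard.name] >= guard.threshold:
                return blocked_message, report  # the LLM is never called
    response = llm(prompt)
    for guard in guards:
        if guard.stage == "response":
            report[guard.name] = guard.score(response)
            if report[guard.name] >= guard.threshold:
                return blocked_message, report
    return response, report
```

Checking prompts before calling the LLM avoids spending LLM compute on requests that would be blocked anyway. Response guards can only run after generation.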
On the Console > Deployments tab, you can now filter on Created by me, Tags, and Model type.
On the Console > Deployments tab, or a deployment's Overview, you can access the updated model replacement workflow from the model actions menu.
Manage custom execution environments in the NextGen Registry
The Environments tab is now available in the NextGen Registry, where you can create and manage custom execution environments for your custom models, jobs, applications, and notebooks.
When you enable feature drift tracking for a deployment, you can now customize the features selected for tracking. During or after the deployment process, in the Feature drift section of the deployment settings, choose a feature selection strategy, either allowing DataRobot to automatically select 25 features, or selecting up to 25 features manually.
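The two selection strategies can be sketched as follows. The 25-feature limit comes from the note above. The assumption that the automatic strategy ranks features by an importance score is this sketch's own; DataRobot's actual selection criterion may differ. The function name and inputs are hypothetical.

```python
MAX_TRACKED_FEATURES = 25  # limit stated in the release note


def select_drift_features(importance, manual=None, limit=MAX_TRACKED_FEATURES):
    """Pick features for drift tracking.

    importance: dict of feature -> importance score (ASSUMED ranking signal
    for the automatic strategy).
    manual: optional list of feature names chosen by the user.
    """
    if manual is not None:
        unknown = [f for f in manual if f not in importance]
        if unknown:
            raise ValueError(f"unknown features: {unknown}")
        if len(set(manual)) > limit:
            raise ValueError(f"select at most {limit} features")
        return list(dict.fromkeys(manual))  # drop duplicates, keep order
    # Automatic strategy: the `limit` highest-scoring features.
    ranked = sorted(importance, key=lambda f: importance[f], reverse=True)
    return ranked[:limit]
```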
Calculate insights during custom model registration
For custom models with training data assigned, DataRobot now computes model Insights and Prediction Explanation previews during model registration, instead of during model deployment. In addition, new model logs accessible from the model workshop can help you diagnose errors during the Insight computation process.
Associate registered model versions, model deployments, and custom applications to a Use Case with the new Use Case linking functionality. Link these assets to an existing Use Case, create a new Use Case, or manage the list of linked Use Cases.
Add a job, manually or from a template, implementing a code-based retraining policy. To view and add retraining jobs, navigate to the Jobs > Retraining tab, and then:
To add a new retraining job manually, click + Add new retraining job (or the minimized add button when the job panel is open).
To create a retraining job from a template, open the menu next to the add button and then, under Retraining, click Create new from template.
A new DataRobot-reserved runtime parameter, CUSTOM_MODEL_WORKERS, is available for custom model configuration. This numeric runtime parameter sets the number of concurrent processes each replica can handle. This option is intended for process-safe custom models, primarily in generative AI use cases.
Custom model process safety
When enabling and configuring CUSTOM_MODEL_WORKERS, ensure that your model is process-safe. This configuration option is intended only for process-safe custom models; it is not intended as a general way to make custom models more resource efficient. Only process-safe custom models with I/O-bound tasks (like proxy models) benefit from using CPU resources this way.
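Why do I/O-bound, process-safe models benefit from more workers? A simple timing experiment illustrates it. In this sketch, `proxy_call` is a hypothetical stand-in for a request to an external LLM endpoint, and `workers` plays the role of CUSTOM_MODEL_WORKERS. Threads stand in for DataRobot's worker processes to keep the sketch portable; the sketch does not show how DataRobot actually schedules workers.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def proxy_call(prompt: str) -> str:
    """Stand-in for an I/O-bound request to an external LLM endpoint.

    It touches no shared mutable state, which is the property that makes
    running several copies concurrently safe.
    """
    time.sleep(0.2)  # simulated network latency; the CPU is idle meanwhile
    return prompt.upper()


def serve(prompts, workers: int):
    """Handle prompts with `workers` concurrent workers; returns (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(proxy_call, prompts))
    return results, time.perf_counter() - start
```

With one worker, eight requests wait on latency one after another. With four workers, the waits overlap and the batch finishes in roughly a quarter of the time. A CPU-bound model, or one that mutates shared state, would not gain this way and could produce wrong results.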
Now generally available, you can enable port forwarding for notebooks and codespaces to access web applications launched by tools and libraries like MLflow and Streamlit. When developing locally, the web application is accessible at http://localhost:PORT; however, when developing in a hosted DataRobot environment, the port that the web application is running on (in the session container) must be forwarded to access the application. You can expose up to five ports in one notebook or codespace.
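Any process that listens on a port in the session container can be forwarded, not just MLflow or Streamlit. The stdlib-only sketch below starts a tiny web app on a chosen port, which is what you would then forward. Binding to all interfaces (`0.0.0.0`) is a common choice for apps in containers, but whether your environment requires it is an assumption to verify. The handler text, function names, and example port are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from the session container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep notebook output quiet


def start_app(port: int, host: str = "0.0.0.0") -> HTTPServer:
    """Start the app on host:port in a background thread and return the server."""
    server = HTTPServer((host, port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url: str) -> bytes:
    """Request the app the way a browser would once the port is reachable."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()
```

Locally, `start_app(8080)` makes the app reachable at `http://localhost:8080`. In a hosted notebook or codespace, you would forward port 8080 (one of up to five exposed ports) to reach it.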
GPU support for Notebook and Codespace sessions is now available as a GA Premium feature for managed AI Platform users. When configuring the environment for your DataRobot Notebook or Codespace session, you can select a GPU machine from the list of resource types. DataRobot also provides GPU-optimized built-in environments that you can select from to use for your session. These environment images contain the necessary GPU drivers as well as GPU-accelerated packages like TensorFlow, PyTorch, and RAPIDS.
Now generally available, you can configure the resources and runtime parameters for application sources in the NextGen Registry. The resources bundle determines the maximum amount of memory and CPU that an application can consume to minimize potential environment errors in production. You can create and define runtime parameters used by the custom application by including them in the metadata.yaml file built from the application source.
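Application code still has to read the runtime parameters defined in metadata.yaml. A minimal sketch follows, resting on an explicit ASSUMPTION: each parameter reaches the running app as an environment variable named `MLOPS_RUNTIME_PARAM_<NAME>`, whose value may be a JSON object with a `payload` field. Confirm the exact naming and encoding in the DataRobot documentation before relying on them. The helper name is hypothetical.

```python
import json
import os


def get_runtime_param(name: str, default=None):
    """Read a runtime parameter exposed to the app as an environment variable.

    ASSUMPTION: the variable is named MLOPS_RUNTIME_PARAM_<name> and may hold a
    JSON object with a "payload" field. Falls back to the raw string otherwise.
    """
    raw = os.environ.get(f"MLOPS_RUNTIME_PARAM_{name}")
    if raw is None:
        return default
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # plain, non-JSON value
    if isinstance(parsed, dict) and "payload" in parsed:
        return parsed["payload"]
    return raw
```

Keeping the lookup in one helper means that if the encoding differs from what is assumed here, only this function needs to change.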
Build custom applications from the template gallery
DataRobot provides templates from which you can build custom applications. These templates allow you to leverage pre-built application front-ends, out of the box, and offer extensive customization options. You can leverage a model that has already been deployed to quickly start and access a Streamlit, Flask, or Slack application. Use a custom application template as a simple method for building and running custom code within DataRobot.
Now generally available, you can leverage generative AI to create a chat generation Q&A application. Explore Q&A use cases, make business decisions, and showcase business value. The Q&A app offers an intuitive and responsive way to prototype, explore, and share the results of the LLMs you've built, including with non-DataRobot users, to expand its usability.
You can also use a code-first workflow to manage the chat generation Q&A application. To access the flow, navigate to DataRobot's GitHub repo. The repo contains a modifiable template for application components.
Incremental learning support for dynamic datasets is now available
Support for modeling on dynamic datasets larger than 10GB, for example, data in a Snowflake, BigQuery, or Databricks data source, is now available. When configuring the experiment, set an ordering feature to create a deterministic sample from the dataset and then begin incremental modeling as usual. After model building starts, View experiment info now reports the selected ordering feature.
Feature flags ON by default: Enable incremental learning, Enable dynamic datasets in Workbench, Enable data chunking service
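Why does an ordering feature matter? Sorting on it makes the split into increments deterministic, so the same data always produces the same chunks. The sketch below illustrates that idea, plus a toy incremental "model" that updates one increment at a time. It works on in-memory rows for illustration only. The function names, the `ts` ordering feature, and the running-mean model are hypothetical, and DataRobot's chunking service is not modeled here.

```python
def ordered_increments(rows, ordering_feature, chunk_size):
    """Split rows into fixed-size increments in a deterministic order.

    Rows are sorted by the ordering feature; ties fall back to original
    position, so the same data always yields the same increments.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    order = sorted(range(len(rows)), key=lambda i: (rows[i][ordering_feature], i))
    ordered = [rows[i] for i in order]
    return [ordered[start:start + chunk_size] for start in range(0, len(ordered), chunk_size)]


class RunningMean:
    """Toy incremental 'model' whose estimate is updated one increment at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def partial_fit(self, values):
        for v in values:
            self.n += 1
            self.mean += (v - self.mean) / self.n  # streaming mean update
        return self
```

Usage: build the increments once, then call `partial_fit` on each in order, for example `model.partial_fit([r["y"] for r in chunk])`.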
The custom jobs template gallery is now available for the generic, notification, and retraining job types—in addition to custom metric jobs. To access the new template gallery, from the Registry > Jobs tab, create a job from a template for any job type.
Feature flags ON by default: Enable Custom Jobs Template Gallery, Enable Custom Templates
For a deployed binary classification, regression, or multiclass model built with location data in the training dataset, you can now leverage DataRobot Location AI to perform geospatial monitoring on the deployment's Data drift and Accuracy tabs. To enable geospatial analysis for a deployment, enable segmented analysis and define a segment for the location feature geometry, generated during location data ingest. The geometry segment contains the identifier used to segment the world into a grid of H3 cells.
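Segmented analysis on the geometry segment amounts to computing a metric separately for each cell. The sketch below computes mean absolute error per segment value. It assumes each record already carries its cell identifier under a hypothetical `geometry_cell` key; computing real H3 cells would need a library such as h3, which this sketch deliberately avoids.

```python
from collections import defaultdict


def accuracy_by_segment(records, segment_key="geometry_cell"):
    """Mean absolute error per segment value.

    records: iterable of dicts holding the segment value, "actual", and "predicted".
    The segment value stands in for an H3 cell identifier.
    """
    totals = defaultdict(lambda: [0.0, 0])  # segment -> [sum of abs errors, count]
    for r in records:
        t = totals[r[segment_key]]
        t[0] += abs(r["actual"] - r["predicted"])
        t[1] += 1
    return {seg: err / n for seg, (err, n) in totals.items()}
```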
For deployed text generation models, the Monitoring > Data exploration tab includes additional sort and filter options on the Tracing table, providing new ways to interact with a Generative AI deployment's stored prompt and response data and gain insight into a model's performance through the configured custom metrics. In addition, this release introduces custom metric templates for Cosine Similarity and Euclidean Distance.
Feature flags OFF by default: Enable Data Quality Table for Text Generation Target Types (Premium feature), Enable Actuals Storage for Generative Models (Premium feature)
Feature flags ON by default: Enable Custom Jobs Template Gallery, Enable Custom Templates
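For reference, here is what the two new metric templates measure when applied to embedding vectors, for example a response embedding compared with a reference embedding. This sketch shows only the standard formulas. The embedding step is not included, and the vectors in the usage note are hypothetical. The DataRobot templates themselves may differ in how they obtain embeddings and aggregate results.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (0.0 = identical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.dist(a, b)
```

The two metrics answer different questions. Cosine similarity ignores vector length, so `[1, 2]` and `[2, 4]` score 1.0. Euclidean distance between the same pair is nonzero.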
Editable resource settings and runtime parameters for deployments
For deployed custom models, the custom model CPU (or GPU) resource bundle and runtime parameters defined during custom model assembly are now editable after assembly.
If the custom model is deployed on a DataRobot Serverless prediction environment and the deployment is inactive, you can modify the Resource bundle settings from the Resources tab.
Use a deployment's Predictions > Make predictions tab to make batch predictions on a recipe wrangled from the Data Registry. Batch predictions are a method of making predictions with large datasets, in which you pass input data and get predictions for each row. In the Prediction dataset box, click Choose file > Wrangler recipe, and then pick a recipe from the Data Registry.
Predictions in Workbench
Batch predictions on recipes wrangled from the Data Registry are also available in Workbench. To make predictions with a model before deployment, select the model from the Models list in an experiment and then click Model actions > Make predictions.
You can also schedule batch prediction jobs by specifying the prediction data source and destination and determining when DataRobot runs the predictions.
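The batch-prediction pattern described above ("pass input data and get predictions for each row") can be sketched locally. Rows are scored in fixed-size batches, and each output row is the input row with its prediction attached. Here `predict_batch` is a hypothetical stand-in for a deployment; the sketch does not call DataRobot's batch prediction API.

```python
from typing import Callable, Iterable, Iterator, List


def _score(batch: List[dict], predict_batch: Callable[[List[dict]], list]) -> Iterator[dict]:
    preds = predict_batch(batch)
    if len(preds) != len(batch):
        raise ValueError("model returned a different number of predictions than rows")
    for row, pred in zip(batch, preds):
        yield {**row, "prediction": pred}  # input row plus its prediction


def batch_predict(rows: Iterable[dict], predict_batch: Callable[[List[dict]], list],
                  batch_size: int = 1000) -> Iterator[dict]:
    """Score rows in batches and yield each input row with its prediction attached."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from _score(batch, predict_batch)
            batch = []
    if batch:  # final partial batch
        yield from _score(batch, predict_batch)
```

Because `batch_predict` is a generator, it never holds more than one batch of rows in memory. That property is what makes batching suitable for large datasets.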
Use the declarative API to provision DataRobot assets
You can use the DataRobot declarative API as a code-first method for provisioning resources end-to-end in a way that is both repeatable and scalable. The declarative API supports both Terraform and Pulumi, so you can programmatically provision DataRobot entities such as models, deployments, applications, and more. The declarative API allows you to:
Specify the desired end state of infrastructure, simplifying management and enhancing adaptability across cloud providers.
Automate the provisioning of DataRobot assets to ensure consistency across environments and alleviate concerns about execution order. Terraform and Pulumi split provisioning into two phases, planning and applying. You first review a plan that outlines which resources will be created or changed before committing to any provisioning actions, and the tool resolves infrastructure dependencies on your behalf when a change is made. You then run the provisioning step separately. This makes provisioning easier to manage within a complex infrastructure and lets you preview how changes will affect DataRobot assets downstream in the workflow.
Simplify version control.
Use application templates to reduce workflow duplication and ensure consistency.
Integrate with DevOps and CI/CD to ensure predictable, consistent infrastructure and reduce deployment risks.
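Declarative provisioning follows a plan-then-apply pattern. The plan step compares the desired state with the current state, orders the changes so dependencies come first, and lets you review the result before anything runs. The stdlib-only sketch below illustrates that idea. It is not Terraform or Pulumi, and it does not use the DataRobot provider. The resource names and the `depends_on` key are hypothetical.

```python
from graphlib import TopologicalSorter


def plan(desired: dict, current: dict) -> dict:
    """Diff desired vs. current state (name -> config dict) into an ordered plan."""
    changed = {n for n in desired if n not in current or desired[n] != current[n]}
    # Dependency graph: each resource maps to the resources it depends on.
    graph = {n: set(desired[n].get("depends_on", [])) for n in desired}
    order = [n for n in TopologicalSorter(graph).static_order() if n in changed]
    # Deletions are listed alphabetically here; real tools also order them by dependency.
    return {"apply_order": order, "delete": sorted(n for n in current if n not in desired)}


def apply(desired: dict, current: dict, changes: dict) -> dict:
    """Execute a reviewed plan and return the new current state."""
    new_state = dict(current)
    for name in changes["apply_order"]:
        new_state[name] = desired[name]
    for name in changes["delete"]:
        del new_state[name]
    return new_state
```

Splitting `plan` from `apply` is the point of the design. You can inspect `apply_order` and `delete` before calling `apply`, just as you would review a plan before committing provisioning actions.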
Review an example below of how you can use the declarative API to provision DataRobot resources using the Pulumi CLI: