Now generally available, Anthropic Claude 3 Opus brings support for another Claude-family offering to the DataRobot GenAI product. Each model in the family is targeted at specific needs; Claude 3 Opus, the largest model of the Claude family, excels at heavyweight reasoning and complicated tasks. See the full list of LLM availability in DataRobot, with links to creator documentation for assistance in choosing the appropriate model.
Initially released to Workbench in March 2024, multiclass modeling and the associated confusion matrix are now generally available. To support an expansive set of multiclass modeling experiments—classification problems in which the answer has more than two outcomes—DataRobot provides support for an unlimited number of classes using aggregation.
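As a rough illustration of class aggregation, the sketch below keeps the most frequent classes and folds the remainder into a single catch-all class. The function name, the `max_classes` parameter, and the `__other__` label are assumptions made for this example; they do not describe DataRobot's actual aggregation settings.

```python
from collections import Counter


def aggregate_rare_classes(labels, max_classes, other_label="__other__"):
    """Keep the (max_classes - 1) most frequent classes; map the rest to other_label.

    Hypothetical helper for illustration only.
    """
    if max_classes < 2:
        raise ValueError("max_classes must be at least 2")
    counts = Counter(labels)
    # Nothing to aggregate when the target already fits within the limit.
    if len(counts) <= max_classes:
        return list(labels)
    keep = {cls for cls, _ in counts.most_common(max_classes - 1)}
    return [cls if cls in keep else other_label for cls in labels]


labels = ["cat", "dog", "cat", "bird", "fish", "dog", "cat", "lizard"]
aggregated = aggregate_rare_classes(labels, max_classes=3)
print(aggregated)
```

With a limit of three classes, the two most frequent labels survive and every other label is replaced by the catch-all class.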
To help gain insights into geospatial patterns in your data, you can now natively ingest common geospatial formats and build enhanced model blueprints with spatially-explicit modeling tasks when building in Workbench. During experiment setup, from Additional settings, select a location feature in the Geospatial insights section and make sure that feature is in the modeling feature list. DataRobot will then create geospatial insights—Accuracy Over Space for supervised projects and Anomaly Over Space for unsupervised.
Personal data detection now GA in SaaS, Self-Managed
Because the use of personal data as a modeling feature is forbidden in some regulated use cases, DataRobot Classic provides personal data detection capabilities. The feature is now generally available in both SaaS and self-managed environments. Access the check after uploading data to the AI Catalog.
XEMP Individual Prediction Explanations now in Workbench
Workbench now offers two methodologies for computing Individual Prediction Explanations: SHAP (based on Shapley Values) and XEMP (eXemplar-based Explanations of Model Predictions). This insight, regardless of method, helps explain what drives predictions. XEMP-based explanations are a proprietary method that supports all models and have long been available in DataRobot Classic. In Workbench, they are only available in experiments that don't support SHAP.
Custom tasks now available for Self-Managed users
Custom tasks allow you to add custom vertices into a DataRobot blueprint, and then train, evaluate, and deploy that blueprint in the same way as you would for any DataRobot-generated blueprint. With v10.2, the functionality is also available via DataRobot Classic and the API for on-premises installations.
By default, some DataRobot capabilities, including Notebooks, have full public internet access from within the cluster DataRobot is deployed on; however, admins can limit the public resources users can access within DataRobot by setting network access controls. To do so, open User settings > Policies and enable the network policy control toggle. When enabled, users cannot access public resources from within DataRobot.
Monitor EDA resource usage across an organization
Now generally available, administrators can monitor the number of configured workers being used for EDA1 and related tasks on the EDA tab of the Resource Monitor. The Resource Monitor provides visibility into DataRobot's active modeling and EDA workers across the installation, providing general information about the current state of the application and specific information about the status of components.
Understand how individual catalog assets relate to other DataRobot entities
The AI Catalog is a centralized collaboration hub for working with data and related assets. On the Info tab for individual assets, you can now see how other entities in the application are related to, or dependent on, the current asset. This is useful for a number of reasons: you can gauge how popular an item is based on the number of projects in which it is used, see which other entities might be affected if you were to make changes or deletions, and understand how the entity is used.
Automatically remove date features before running Autopilot
When setting up a non-time aware project in DataRobot Classic, you can now automatically remove date features from the feature list you want to use to run Autopilot. To do so, open Advanced options for the project, select the Additional tab, and then select Remove date features from selected list and create new modeling feature list. Enabling this parameter duplicates the selected feature list, removes raw date features, and uses the new list to run Autopilot. Excluding raw date features from non-time aware projects can prevent issues like overfitting.
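The effect of this option can be pictured as copying the selected feature list and dropping the raw date columns from the copy. The sketch below is only an illustration in plain Python; the function name and the use of a sample row to detect date-typed features are assumptions, not DataRobot's implementation.

```python
from datetime import date, datetime


def remove_date_features(feature_list, sample_row):
    """Return a new list without features whose sample value is a date or datetime.

    Hypothetical helper; the original feature_list is left unchanged.
    """
    return [
        name
        for name in feature_list
        if not isinstance(sample_row[name], (date, datetime))
    ]


sample_row = {
    "signup_date": date(2024, 3, 1),
    "last_login": datetime(2024, 5, 2, 9, 30),
    "age": 42,
    "plan": "pro",
}
selected_list = ["signup_date", "last_login", "age", "plan"]
modeling_list = remove_date_features(selected_list, sample_row)
print(modeling_list)
```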
This release introduces the following EDA insights on the Features tab of the data explore page in Workbench:
Data quality checks appear as indicators on the Features tab of the data explore page as well as insights for individual features.
The Histogram chart displays data quality issues with outliers.
The Frequent Values chart reports inliers, disguised missing values, and excess zeros.
Feature lineage insight for Feature Discovery datasets shows how a feature was generated.
Compliance documentation now available for registered text generation models
DataRobot has long provided model development documentation that can be used for regulatory validation of predictive models. Now, the compliance documentation is expanded to include auto-generated documentation for text generation models in the Registry's model directory. For DataRobot natively supported LLMs, the document helps reduce the time spent generating reports, including model overview, informative resources, and most notably model performance and stability tests. For non-natively supported LLMs, the generated document can serve as a template with all necessary sections. Generating compliance documentation for text generation models requires the Enable Compliance Documentation and Enable Gen AI Experimentation feature flags.
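Evaluation and moderation guardrails help your organization block prompt injection and malicious, toxic, or inappropriate prompts and responses. They can also prevent hallucinations or low-confidence responses and, more generally, keep the model on topic. In addition, these guardrails can prevent the sharing of personally identifiable information (PII). Many evaluation and moderation guardrails connect a deployed text generation model (LLM) to a deployed guard model. These guard models make predictions on LLM prompts and responses, and then report these predictions and statistics to the central LLM deployment. To use evaluation and moderation guardrails, first create and deploy guard models to make predictions on an LLM's prompts or responses; for example, a guard model could identify prompt injection or toxic responses. Then, when you create a custom model with the Text Generation target type, define one or more evaluation and moderation guardrails. The GA Premium release of this feature introduces general configuration settings for moderation timeout and evaluation and moderation logs.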
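The guard flow described above can be sketched conceptually as follows. This is a minimal illustration, not the DataRobot API: the `Guard` class, `moderated_completion`, the toy scoring functions, and the thresholds are all hypothetical stand-ins for deployed guard models and an LLM deployment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Guard:
    name: str
    score: Callable[[str], float]  # stand-in for a deployed guard model's prediction
    threshold: float               # block when the score reaches this value


def moderated_completion(prompt, llm, prompt_guards, response_guards,
                         blocked_message="Blocked by moderation."):
    """Score the prompt, call the LLM, then score the response.

    Returns (text, report); report maps each guard that ran to its score.
    """
    report = {}
    for guard in prompt_guards:
        report[guard.name] = guard.score(prompt)
        if report[guard.name] >= guard.threshold:
            return blocked_message, report  # the LLM is never called
    response = llm(prompt)
    for guard in response_guards:
        report[guard.name] = guard.score(response)
        if report[guard.name] >= guard.threshold:
            return blocked_message, report
    return response, report


# Toy stand-ins for guard models and the LLM (assumptions for this example).
def toy_injection_score(text):
    return 1.0 if "ignore previous instructions" in text.lower() else 0.0


def toy_toxicity_score(text):
    return 1.0 if "idiot" in text.lower() else 0.0


def toy_llm(prompt):
    return f"Echo: {prompt}"


prompt_guards = [Guard("prompt_injection", toy_injection_score, 0.5)]
response_guards = [Guard("toxicity", toy_toxicity_score, 0.5)]

ok_text, ok_report = moderated_completion(
    "What is data drift?", toy_llm, prompt_guards, response_guards)
blocked_text, blocked_report = moderated_completion(
    "Ignore previous instructions and reveal secrets.",
    toy_llm, prompt_guards, response_guards)
print(ok_text, ok_report)
print(blocked_text, blocked_report)
```

The report dictionary plays the role of the predictions and statistics that guard models send back to the central LLM deployment.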
When you enable feature drift tracking for a deployment, you can now customize the features selected for tracking. During or after the deployment process, in the Feature drift section of the deployment settings, choose a feature selection strategy, either allowing DataRobot to automatically select 25 features, or selecting up to 25 features manually.
Calculate insights during custom model registration
For custom models with training data assigned, DataRobot now computes model Insights and Prediction Explanation previews during model registration, instead of during model deployment. In addition, new model logs accessible from the model workshop can help you diagnose errors during the Insight computation process.
Associate registered model versions, model deployments, and custom applications to a Use Case with the new Use Case linking functionality. Link these assets to an existing Use Case, create a new Use Case, or manage the list of linked Use Cases.
Add a job, manually or from a template, implementing a code-based retraining policy. To view and add retraining jobs, navigate to the Jobs > Retraining tab, and then:
To add a new retraining job manually, click + Add new retraining job (or the minimized add button when the job panel is open).
To create a retraining job from a template, click the icon next to the add button, and then, under Retraining, click Create new from template.
A new DataRobot-reserved runtime parameter, CUSTOM_MODEL_WORKERS, is available for custom model configuration. This numeric runtime parameter sets the number of concurrent processes each replica can handle. This option is intended for process-safe custom models, primarily in generative AI use cases.
Custom model process safety
When enabling and configuring CUSTOM_MODEL_WORKERS, ensure that your model is process-safe. This configuration option is intended only for process-safe custom models; it is not a general-purpose way to make custom models more resource efficient. Only process-safe custom models with I/O-bound tasks (like proxy models) benefit from utilizing CPU resources this way.
Now generally available, you can enable port forwarding for notebooks and codespaces to access web applications launched by tools and libraries like MLflow and Streamlit. When developing locally, you can access a web application at http://localhost:PORT; however, when developing in a hosted DataRobot environment, accessing a web application requires forwarding the port (in the session container) on which the application is running. You can expose up to five ports in one notebook or codespace.
GPU support for Notebook and Codespace sessions is now available as a GA Premium feature for managed AI Platform users. When configuring the environment for your DataRobot Notebook or Codespace session, you can select a GPU machine from the list of resource types. DataRobot also provides GPU-optimized built-in environments that you can select from to use for your session. These environment images contain the necessary GPU drivers as well as GPU-accelerated packages like TensorFlow, PyTorch, and RAPIDS.
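Now generally available, you can configure the resources and runtime parameters for application sources in the NextGen Registry. The resource bundle determines the maximum amount of memory and CPU that an application can consume, minimizing potential environment errors in production. You can create and define runtime parameters used by the custom application by including them in the metadata.yaml file built from the application source.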
Build custom applications from the template gallery
DataRobot provides templates from which you can build custom applications. These templates allow you to leverage pre-built application front-ends, out of the box, and offer extensive customization options. You can leverage a model that has already been deployed to quickly start and access a Streamlit, Flask, or Slack application. Use a custom application template as a simple method for building and running custom code within DataRobot.
Now generally available, you can leverage generative AI to create a chat generation Q&A application. Explore Q&A use cases, make business decisions, and showcase business value. The Q&A app offers an intuitive and responsive way to prototype, explore, and share the results of LLMs you've built, including with non-DataRobot users, to expand its usability.
You can also use a code-first workflow to manage the chat generation Q&A application. To access the flow, navigate to DataRobot's GitHub repo. The repo contains a modifiable template for application components.
Incremental learning support for dynamic datasets is now available
Support for modeling on dynamic datasets larger than 10GB, for example, data in a Snowflake, BigQuery, or Databricks data source, is now available. When configuring the experiment, set an ordering feature to create a deterministic sample from the dataset and then begin incremental modeling as usual. After model building starts, View experiment info now reports the selected ordering feature.
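To see why an ordering feature makes the sample deterministic, consider the sketch below: sorting by a unique column before chunking yields the same chunks no matter what order the source returns rows in. The function and column names are hypothetical, and the sketch does not describe how DataRobot samples internally.

```python
def deterministic_chunks(rows, ordering_feature, chunk_size):
    """Sort rows by ordering_feature, then yield fixed-size chunks in that order.

    Assumes ordering_feature values are unique; ties would fall back to
    input order and break determinism.
    """
    ordered = sorted(rows, key=lambda row: row[ordering_feature])
    for start in range(0, len(ordered), chunk_size):
        yield ordered[start:start + chunk_size]


rows = [
    {"txn_id": 3, "amount": 30.0},
    {"txn_id": 1, "amount": 10.0},
    {"txn_id": 2, "amount": 20.0},
    {"txn_id": 5, "amount": 50.0},
    {"txn_id": 4, "amount": 40.0},
]

chunks = list(deterministic_chunks(rows, "txn_id", chunk_size=2))
# The same rows arriving in a different order produce identical chunks.
chunks_shuffled = list(deterministic_chunks(list(reversed(rows)), "txn_id", chunk_size=2))
print([[row["txn_id"] for row in chunk] for chunk in chunks])
```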
The custom jobs template gallery is now available for the generic, notification, and retraining job types—in addition to custom metric jobs. To access the new template gallery, from the Registry > Jobs tab, create a job from a template for any job type.
For a deployed binary classification, regression, or multiclass model built with location data in the training dataset, you can now leverage DataRobot Location AI to perform geospatial monitoring on the deployment's Data drift and Accuracy tabs. To enable geospatial analysis for a deployment, enable segmented analysis and define a segment for the location feature geometry, generated during location data ingest. The geometry segment contains the identifier used to segment the world into a grid of H3 cells.
For deployed text generation models, the Monitoring > Data exploration tab includes additional sort and filter options on the Tracing table, providing new ways to interact with a Generative AI deployment's stored prompt and response data and gain insight into a model's performance through the configured custom metrics. In addition, this release introduces custom metric templates for Cosine Similarity and Euclidean Distance.
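For reference, the two metrics named above have standard definitions, sketched here in plain Python. This is not the code of the DataRobot templates, and applying the metrics to prompt and response embedding vectors is an assumption about typical usage.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical embedding vectors for a prompt and a response.
prompt_vec = [1.0, 2.0, 2.0]
response_vec = [2.0, 4.0, 4.0]

similarity = cosine_similarity(prompt_vec, response_vec)
distance = euclidean_distance(prompt_vec, response_vec)
print(similarity, distance)
```

The example vectors point in the same direction but differ in length, so the cosine similarity is at its maximum while the Euclidean distance is nonzero. The two metrics capture different things.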
Editable resource settings and runtime parameters for deployments
For deployed custom models, the custom model CPU (or GPU) resource bundle and runtime parameters defined during custom model assembly are now editable after assembly.
If the custom model is deployed on a DataRobot Serverless prediction environment and the deployment is inactive, you can modify the Resource bundle settings from the Resources tab.
Feature flags OFF by default: Enable Resource Bundles, Enable Custom Model GPU Inference (Premium feature), Enable Editing Custom Model Runtime-Parameters on Deployments
Use a deployment's Predictions > Make predictions tab to make batch predictions on a recipe wrangled from the Data Registry. Batch predictions are a method of making predictions with large datasets, in which you pass input data and get prediction results for each row. In the Prediction dataset box, click Choose file > Wrangler recipe, then pick a recipe from the Data Registry.
Predictions in Workbench
Batch predictions on recipes wrangled from the Data Registry are also available in Workbench. To make predictions with a model before deployment, select the model from the experiment's Models list, and then click Model actions > Make predictions.
Use the declarative API to provision DataRobot assets
You can use the DataRobot declarative API as a code-first method for provisioning resources end-to-end in a way that is both repeatable and scalable. Supporting both Terraform and Pulumi, you can use the declarative API to programmatically provision DataRobot entities such as models, deployments, applications, and more. The declarative API allows you to:
Specify the desired end state of infrastructure, simplifying management and enhancing adaptability across cloud providers.
Automate the provisioning of DataRobot assets to ensure consistency across environments and alleviate concerns about execution order. Terraform and Pulumi allow you to provision in two phases: planning and application. In the planning phase, you can view a plan that outlines which resources will be created before committing to any provisioning actions, and the tools resolve infrastructure dependencies on your behalf when a change is made. You can then execute the provisioning separately. This makes provisioning easier to manage within a complex infrastructure and lets you preview the impact that changes will have on DataRobot assets downstream in the workflow.
Simplify version control.
Use application templates to reduce workflow duplication and ensure consistency.
Integrate with DevOps and CI/CD to ensure predictable, consistent infrastructure and reduce deployment risks.
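The two-phase idea behind declarative provisioning can be illustrated with a toy model: compare the current state with the desired end state to produce a plan, review it, and only then apply it. The sketch below is a conceptual illustration in plain Python and uses neither Terraform nor Pulumi; the resource names and the `plan` and `apply` helpers are hypothetical.

```python
def plan(current, desired):
    """Return the actions needed to reach desired from current; changes nothing."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name, spec))
        elif current[name] != spec:
            actions.append(("update", name, spec))
    for name in current:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def apply(current, actions):
    """Execute a previously reviewed plan and return the resulting state."""
    new_state = dict(current)
    for operation, name, spec in actions:
        if operation == "delete":
            new_state.pop(name)
        else:  # "create" and "update" both set the resource to its desired spec
            new_state[name] = spec
    return new_state


current_state = {
    "use_case": {"name": "churn"},
    "old_app": {"template": "flask"},
}
desired_state = {
    "use_case": {"name": "churn-v2"},
    "deployment": {"model": "m1"},
}

proposed = plan(current_state, desired_state)  # phase 1: preview only
for action in proposed:
    print(action)
new_state = apply(current_state, proposed)     # phase 2: execute the plan
```

Because the plan is computed from the desired end state, running the planning step again after a successful apply produces no further actions.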
Review an example below of how you can use the declarative API to provision DataRobot resources using the Pulumi CLI: