Modeling (V9.0)¶
The following table lists each new feature. See also the [deprecation and end-of-life announcements](#deprecation-announcements).
GA¶
Visual AI Image Embeddings visualization adds new filtering capabilities¶
The Understand > Image Embeddings tab helps you visualize predicted results for your AI project. Now, DataRobot calculates predicted values for the images and allows you to filter by those predictions. In addition, for select project types you can modify the prediction threshold (which may change the predicted label) and filter based on the new results. The image below shows all filtering options (new and existing) for all supported project types.
In addition, usability enhancements for clusters make exploring Visual AI results easier. With clustering, images display colored borders to indicate the predicted cluster.
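The effect of changing the prediction threshold before filtering can be illustrated with a minimal sketch. This is not DataRobot code: the `ImagePrediction` class, the `filter_by_label` helper, the label names, and the "greater than or equal to" threshold convention are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImagePrediction:
    # Hypothetical record: one image and its predicted positive-class probability.
    image_id: str
    positive_probability: float


def predicted_label(prediction, threshold):
    """Assumed convention: positive when the probability meets the threshold."""
    return "positive" if prediction.positive_probability >= threshold else "negative"


def filter_by_label(predictions, label, threshold=0.5):
    """Return the IDs of images whose predicted label matches `label`."""
    return [
        p.image_id for p in predictions if predicted_label(p, threshold) == label
    ]


predictions = [
    ImagePrediction("img_001", 0.92),
    ImagePrediction("img_002", 0.55),
    ImagePrediction("img_003", 0.31),
]

# Raising the threshold flips img_002 from positive to negative,
# so it drops out of the "positive" filter.
default_positive = filter_by_label(predictions, "positive", threshold=0.5)
strict_positive = filter_by_label(predictions, "positive", threshold=0.6)
```

Here `img_002` appears in the filtered set at a threshold of 0.5 but not at 0.6, which is the kind of label change the threshold control can produce.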
Bias Mitigation functionality¶
Bias mitigation is now generally available for binary classification projects. To clarify relationships between the parent model and any child models with mitigation applied, this release adds a table, Models with Mitigation Applied, accessible from the parent model on the Leaderboard.
Bias Mitigation works by augmenting blueprints with a pre- or post-processing task, causing the blueprint to then attempt to reduce bias across classes in a protected feature. You can apply mitigation either automatically (as part of Autopilot) or manually (after Autopilot completes). When run automatically, you set mitigation criteria as a part of the Bias and Fairness advanced option settings. Autopilot then applies mitigation to the top three Leaderboard models. Or, once Autopilot completes, you can apply mitigation to any non-blender, unmitigated model available from the Leaderboard. Finally, compare mitigated versus unmitigated models from the Bias vs Accuracy insight.
See the Bias Mitigation documentation for more information.
Text Prediction Explanations¶
Text Prediction Explanations illustrate how individual words (n-grams) in a text feature influence predictions, helping to validate and understand the model and the importance it is placing on words. Previously, DataRobot evaluated the impact of text in a dataset as the impact of a text feature as a whole, potentially requiring reading the full text for best understanding. With Text Prediction Explanations, which uses the standard color bar spectrum of blue (negative) to red (positive) impact, you can easily visualize and understand your text. An option to display unknown n-grams helps to identify, via gray highlight, those n-grams not recognized by the model (most likely because they were not seen during training). Text Prediction Explanations, either XEMP or SHAP, are run by default when text is present in a dataset.
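The color scheme described above can be sketched as a simple mapping from per-token impact to a highlight color. This is an illustrative toy, not how DataRobot computes explanations: the `highlight_ngrams` function, the `impacts` dictionary, and its values are hypothetical, and the sketch handles single words (unigrams) only.

```python
def highlight_ngrams(tokens, impacts):
    """Pair each token with a highlight color.

    red     -> positive impact
    blue    -> negative impact
    neutral -> known to the model, but zero impact
    gray    -> unknown n-gram (not present in `impacts`)
    """
    colored = []
    for token in tokens:
        key = token.lower()
        if key not in impacts:
            colored.append((token, "gray"))
        elif impacts[key] > 0:
            colored.append((token, "red"))
        elif impacts[key] < 0:
            colored.append((token, "blue"))
        else:
            colored.append((token, "neutral"))
    return colored


# Hypothetical impact values for words the model saw during training.
impacts = {"excellent": 0.42, "delayed": -0.30, "service": 0.0, "but": 0.0}
tokens = "Excellent service but delayed zxqv".split()
result = highlight_ngrams(tokens, impacts)
```

The made-up token `zxqv` is absent from `impacts`, so it receives the gray highlight that marks an unknown n-gram.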
For more information, see the Text Prediction Explanations documentation.
NLP Autopilot with better language support¶
A host of natural language processing (NLP) improvements are now generally available. The most impactful is the application of FastText for language detection at data ingest, which:
- Allows DataRobot to generate the appropriate blueprints with parameters optimized for that language.
- Adapts tokenization to the detected language for better word clouds and interpretability.
- Triggers specific blueprint training heuristics so that accuracy-optimized Advanced Tuning settings are applied.
This feature works with multilingual use cases as well; Autopilot will detect multiple languages and adjust various blueprint settings for the greatest accuracy.
The following NLP enhancements are also now generally available:
- New pre-trained BPE tokenizer (which can handle any language).
- Refined Keras blueprints for NLP for improved accuracy and training time.
- Various improvements across other NLP blueprints.
- New Keras blueprints (with the BPE tokenizer) in the Repository.
Quick Autopilot mode improvements speed experimentation¶
With this month’s release, Quick Autopilot mode now uses a one-stage modeling process to build models and populate the Leaderboard in AutoML projects. In the new version of Quick, all models are trained at a max sample size—typically 64%. The specific number of Quick models run varies by project and target type. DataRobot selects which models to run based on a variety of criteria, including target and performance metric, but as its name suggests, chooses only models with relatively short training runtimes to support quicker experimentation. Note that to maximize runtime efficiency, DataRobot no longer automatically generates and fits the DR Reduced Features list. (Fitting the reduced list requires retraining models.)
Changes to blender model defaults¶
This release brings changes to the default behavior of blender models. A blender (or ensemble) model combines the predictions of two or more models, potentially improving accuracy. DataRobot can automatically create these models at the end of Autopilot when the Create blenders from top models advanced option is enabled. Previously the default setting was to enable creating blenders automatically; now, the default is not to build these models.
Additionally, the number of models allowed when creating blenders, either automatically or manually, has changed. Previously there was at first no limit and later a three-model maximum on the number of contributing models; the limit now allows up to eight models per blender.
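As a rough illustration of what a blender does, the sketch below averages the predictions of several models row by row and enforces the two-to-eight model range described above. The `average_blend` function and `MAX_BLEND_MODELS` constant are hypothetical names, and a simple mean is only one possible blending method; this is not DataRobot's implementation.

```python
MAX_BLEND_MODELS = 8  # upper limit stated in this release note


def average_blend(model_predictions):
    """Average row-wise predictions from between 2 and 8 models.

    `model_predictions` is a list with one list of predictions per model;
    every model must predict the same rows in the same order.
    """
    n_models = len(model_predictions)
    if not 2 <= n_models <= MAX_BLEND_MODELS:
        raise ValueError(
            f"a blend needs 2 to {MAX_BLEND_MODELS} models, got {n_models}"
        )
    n_rows = len(model_predictions[0])
    if any(len(preds) != n_rows for preds in model_predictions):
        raise ValueError("all models must predict the same number of rows")
    # zip(*...) groups the predictions for each row across models.
    return [sum(row) / n_models for row in zip(*model_predictions)]


blended = average_blend([[0.25, 0.75], [0.5, 0.5]])
```

Passing nine prediction lists to `average_blend` raises a `ValueError`, mirroring the eight-model cap.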
Finally, the automatic creation of advanced blenders has been removed. These blenders used a backwards stage-wise process to eliminate models when doing so benefited the blend's cross-validation score:
- Advanced Average (AVG) Blend
- Advanced Generalized Linear Model (GLM) Blend
- Advanced Elastic Net (ENET) Blend
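A backwards stage-wise process of this general kind can be sketched as follows. The details are assumptions rather than DataRobot's actual algorithm: the sketch uses a simple average blend, mean squared error as the score, and treats the supplied predictions as if they were out-of-fold (cross-validation) predictions. All function names are hypothetical.

```python
def mean_squared_error(actuals, predictions):
    return sum((a - p) ** 2 for a, p in zip(actuals, predictions)) / len(actuals)


def average_blend(prediction_lists):
    prediction_lists = list(prediction_lists)
    return [sum(row) / len(prediction_lists) for row in zip(*prediction_lists)]


def backward_stagewise_select(model_predictions, actuals, min_models=2):
    """Start from all models; repeatedly drop the one whose removal most
    improves the blend's score, stopping when no removal helps."""
    selected = dict(model_predictions)
    best_score = mean_squared_error(actuals, average_blend(selected.values()))
    while len(selected) > min_models:
        # Score the blend that would result from leaving out each model in turn.
        scores_without = {
            name: mean_squared_error(
                actuals,
                average_blend(p for other, p in selected.items() if other != name),
            )
            for name in selected
        }
        candidate = min(scores_without, key=scores_without.get)
        if scores_without[candidate] >= best_score:
            break  # no removal improves the score
        best_score = scores_without[candidate]
        del selected[candidate]
    return sorted(selected), best_score


actuals = [1.0, 0.0, 1.0, 0.0]
model_predictions = {
    "a": [0.9, 0.1, 0.8, 0.2],
    "b": [0.8, 0.2, 0.9, 0.1],
    "c": [0.1, 0.9, 0.2, 0.8],  # deliberately poor model
}
kept, score = backward_stagewise_select(model_predictions, actuals)
```

In this toy data, model `c` predicts the opposite of the actuals, so removing it lowers the blend's error and the process keeps `a` and `b`.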
The following blender types are currently in the process of deprecation:
Blender | Deprecation status |
---|---|
Random Forest Blend (RF) | Existing RF blenders continue to work; you cannot create new RF blenders. |
Light Gradient Boosting Machine Blend (LGBM) | Existing LGBM blenders continue to work; you cannot create new LGBM blenders. |
TensorFlow Blend (TF) | Existing TF blenders do not work; you cannot create new TF blenders. |
These changes have been made in response to customer feedback. Because blenders can extend build times and cause deployment issues, the changes ensure that these impacts only affect those users needing the capability. Testing has determined that, in most cases, the accuracy gain does not justify the extended runtimes imposed on Autopilot. For data scientists who need blender capabilities, manual blending is not affected.
Japanese compliance documentation now generally available, more complete¶
With this release, model compliance documentation is now generally available for users in Japanese. Now, Japanese-language users can generate, for each model, individualized documentation to provide comprehensive guidance on what constitutes effective model risk management and download it as an editable Microsoft Word document. In the preview version, some sections were untranslated and therefore removed from the report. Now the following previously untranslated sections are translated and available for binary classification and multiclass projects:
- Bias and Fairness
- Lift Chart
- Accuracy
Anomaly detection compliance information is not yet translated and is not included. It is available in English if the information is required. Compliance Reports are a premium feature; contact your DataRobot representative for information on availability.
ROC Curve enhancements aid model interpretation¶
With this release, the ROC Curve tab introduces several improvements to help increase understanding of model performance at any point on the probability scale. Using the visualization now, you will notice:
- Row and column totals are shown in the Confusion Matrix.
- The Metrics section now displays up to six accuracy metrics.
- You can use Display Threshold > View Prediction Threshold to reset the visualization components (graphs and charts) to the model's default prediction threshold.
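To make the row and column totals concrete, the sketch below builds a binary confusion matrix at a chosen threshold and adds the totals. It is illustrative only: the function name, the dictionary keys, and the "greater than or equal to" threshold convention are assumptions, not DataRobot's API.

```python
def confusion_matrix_with_totals(actuals, probabilities, threshold):
    """Count TP/FP/TN/FN at `threshold` and add row and column totals."""
    tp = fp = tn = fn = 0
    for actual, probability in zip(actuals, probabilities):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return {
        "tp": tp,
        "fp": fp,
        "tn": tn,
        "fn": fn,
        # Row totals (by actual class) and column totals (by predicted class).
        "actual_positive_total": tp + fn,
        "actual_negative_total": tn + fp,
        "predicted_positive_total": tp + fp,
        "predicted_negative_total": tn + fn,
        "total": tp + fp + tn + fn,
    }


actuals = [1, 1, 0, 0, 1, 0]
probabilities = [0.9, 0.4, 0.6, 0.2, 0.7, 0.1]
matrix = confusion_matrix_with_totals(actuals, probabilities, threshold=0.5)
```

Recomputing the matrix with a different `threshold` shows how the counts, and any metrics derived from them, shift along the probability scale.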
Create AI Apps from models on the Leaderboard¶
You can now create No-Code AI Apps directly from trained models on the Leaderboard. To do so, select the model, click the new Build app tab, and select the template that best suits your use case.
Then, name the application, select an access type, and click Create.
The new app appears in the Build app tab of the Leaderboard model as well as the Applications tab.
For more information, see the documentation for No-Code AI Apps.
Add custom logos to No-Code AI Apps¶
Now generally available, you can add a custom logo to your No-Code AI Apps, allowing you to keep the branding of the AI App consistent with that of your company before sharing it either externally or internally.
To upload a new logo, open the application you want to edit and click Build. Under Settings > Configuration Settings, click Browse and select a new image, or drag-and-drop an image into the New logo field.
For more information, see the No-Code AI App documentation.
No-Code AI App header enhancements¶
This release introduces improvements to the layout and header of No-Code AI Apps. The two tables below describe the UI elements available when editing an application and when using one:
| | Element | Description |
---|---|---|
1 | Pages panel | Allows you to rename, reorder, add, hide, and delete application pages. |
2 | Widget panel | Allows you to add widgets to your application. |
3 | Settings | Modifies general configurations and permissions as well as displays app usage. |
4 | Documentation | Opens the DataRobot documentation for No-Code AI Apps. |
5 | Editing page dropdown | Controls the application page you are currently editing. To view a different page, click the dropdown and select the page you want to edit. Click Manage pages to open the Pages panel. |
6 | Preview | Previews the application on different devices. |
7 | Go to app / Publish | Opens the end-user application, where you can make new predictions, as well as view prediction results and widget visualizations. After editing an application, this button displays Publish, which you must click to apply your changes. |
8 | Widget actions | Moves, hides, edits, and deletes widgets. |
| | Widget | Description |
---|---|---|
1 | Application name | Displays the application name. Click to return to the app's Home page. |
2 | Pages | Navigates between application pages. |
3 | Build | Allows you to edit the application. |
4 | Share | Shares the application with users, groups, or organizations within DataRobot. |
5 | Add new row | Opens the Create Prediction page, where you can make single record predictions. |
6 | Add Data | Uploads batch predictions from the AI Catalog or a local file. |
7 | All rows | Displays a history of predictions. Select a row to view prediction results for that entry. |
Multiclass support in No-Code AI Apps¶
No-Code AI Apps now support multiclass classification deployments across all three template types—Predictor, Optimizer, and What-If. This gives users the ability to create applications that solve a broader range of business problems.
UI/UX improvements to No-Code AI Apps¶
This release introduces the following improvements to No-Code AI Apps:
- An in-app tour has been added to help you set up Optimizer applications. Click the ? in the upper-right and select Show Optimizer Guide.
- Applications now open in Consume mode instead of Build mode.
- In Consume > Optimization Details, the What-if and Optimizer widgets have been moved towards the top of the page.
- In Optimizer applications, you previously needed to select a prediction row to calculate an optimization. Now, you can click the Optimize Row button in the All Rows widget to calculate and display the optimized prediction without leaving the page.
- In Build mode, widgets no longer display an example.
Create Time Series What-if AI Apps¶
Now generally available, you can create What-if Scenario AI Apps from time series projects. This allows you to launch and easily configure applications in an enhanced visual and interactive interface, as well as share your What-if Scenario app with consumers who will be able to effortlessly build upon what’s already been generated by the builder and/or create their own scenarios on the same prediction files.
Additionally, you can edit the known in advance features for multiple scenarios at once using the Manage Scenarios feature.
For more information, see the Time series applications documentation.
Use Cases tab renamed to Value Tracker¶
With this release, the Use Cases tab at the top of DataRobot is now the Value Tracker. While the functionality remains the same, all instances of “use cases” in this feature have been replaced by “value tracker.”
See the Value Tracker documentation for more information.
Reorder scheduled modeling jobs¶
You can now change the order of scheduled modeling and prediction jobs in your project’s Worker Queue—allowing you to run more important jobs sooner.
For more information, see the Worker Queue documentation.
Preview¶
Prediction Explanations for multiclass projects¶
DataRobot now calculates explanations for each class in an XEMP-based multiclass classification project, both from the Leaderboard and from deployments. With multiclass, you can set the number of classes to compute explanations for, select a mode based on predicted or actual results (actual results apply only when using training data), or choose to see only a specific set of classes.
This capability helps especially with projects that require “humans-in-the-loop” to review multiple options. Previously, comparisons required building several binary classification models and using scripting to evaluate them. When building a multiclass project, Prediction Explanations can help improve models by highlighting, for example, where a model is too accurate (potential leakage?), where residuals are too large (some data could be missing?), or where a model can’t clearly distinguish two classes (some data could be missing?).
See the section on XEMP Prediction Explanations for Multiclass for more information.
Text AI parameters now available via Composable ML¶
The ability to modify certain Text AI preprocessing tasks (Lemmatizer, PosTagging, and Stemming) is moving from the Advanced Tuning tab to blueprint tasks accessible via Composable ML. The new Text AI preprocessing tasks unlock additional pathways to create unique text blueprints. For example, you can now use lemmatization in any text model that supports that preprocessing task instead of being limited to TF-IDF blueprints.
Required feature flag: Enable Text AI Composable Vertices
Prediction Explanations for cluster models¶
Now available for preview, you can use Prediction Explanations with clustering to uncover which factors most contributed to any given row’s cluster assignment. With this insight, you can easily explain clustering model outcomes to stakeholders and identify high-impact factors to help focus their business strategies.
Functioning very much like multiclass Prediction Explanations—but reporting on clusters instead of classes—cluster explanations are available from both the Leaderboard and deployments when enabled. They are available for all XEMP-based clustering projects and are not available with time series.
Required feature flag: Enable Clustering Prediction Explanations
Preview documentation.
Composable ML task categories refined¶
In response to feedback on, and widespread adoption of, Composable ML and blueprint editing, this release brings some refinements to task categorization. For example, boosting tasks are now available under the specific project/model type.
Blueprint toggle allows summary and detailed views from Leaderboard¶
Blueprints viewed from the Leaderboard’s Blueprint tab are shown, by default, in a read-only, summarized view that includes only those tasks used in the final model.
However, the original modeling algorithm often contains many more “branches,” which DataRobot prunes when they are not applicable to the project data and feature list. Now, you can toggle to see a detailed view while in read-only mode. Prior to the introduction of this feature, viewing the full blueprint required entering edit mode of the blueprint editor.
Required feature flag: Enable Blueprint Detailed View Toggle
Preview documentation.
Sliced insights show a subpopulation of model data¶
Now available as preview, slices allow you to define filters for categorical features, numeric features, or both. Viewing and comparing insights based on segments of a project’s data helps to understand how models perform on different subpopulations. You can also compare a slice against the “global” slice, which is all training data (depending on the insight). Configuring a slice allows you to choose a feature and set operators and values to narrow the data returned.
Sliced insights are available for Lift Chart, ROC Curve, Residual, and Feature Impact visualizations.
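Conceptually, a slice is a set of feature, operator, and value conditions applied to the data. The sketch below shows one way to express that idea; the `apply_slice` function, the supported operators, and the choice to combine conditions with a logical AND are assumptions for illustration and do not reflect DataRobot's slice configuration.

```python
import operator

# Hypothetical operator set; "in" checks membership in a collection of values.
OPERATORS = {
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    "in": lambda value, options: value in options,
}


def apply_slice(rows, conditions):
    """Keep rows that satisfy every (feature, operator, value) condition."""
    return [
        row
        for row in rows
        if all(OPERATORS[op](row[feature], value) for feature, op, value in conditions)
    ]


rows = [
    {"region": "EU", "age": 34},
    {"region": "US", "age": 51},
    {"region": "EU", "age": 62},
]

# One categorical condition and one numeric condition.
slice_conditions = [("region", "in", {"EU"}), ("age", ">", 40)]
sliced = apply_slice(rows, slice_conditions)
```

An empty condition list returns every row, which corresponds loosely to the "global" slice.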
Required feature flag: Enable Sliced Insights
Preview documentation.
Details page added to time series Predictor applications¶
In time series Predictor No-Code AI Apps, you can now view prediction information for specific predictions or dates, allowing you to not only see the prediction values, but also compare them to other predictions that were made for the same date. Previously, you could only view values for the prediction, residuals, and actuals, as well as the top three Prediction Explanations.
To drill down into the prediction details, click on a prediction in either the Predictions vs Actuals or Prediction Explanations chart. This opens the Forecast details page, which displays the following information:
| | Description |
---|---|
1 | The average prediction value in the forecast window. |
2 | Up to 10 Prediction Explanations for each prediction. |
3 | Segmented analysis for each forecast distance within the forecast window. |
4 | Prediction Explanations for each forecast distance included in the segmented analysis. |
Required feature flag: Enable Application Builder Time Series Predictor Details Page
Preview documentation.
Deprecation announcements¶
Auto-Tuned Word N-gram Text Modeler blueprints removed from the Leaderboard¶
Auto-Tuned Word N-gram Text Modeler blueprints are no longer run as part of Autopilot for binary classification, regression, and multiclass/multimodal projects. The modeler blueprints remain available in the repository. Currently, Light GBM (LGBM) models run these auto-tuned text modelers for each text column, and for each, a new blueprint is added to the Leaderboard. However, these Auto-Tuned Word N-gram Text Modelers are not correlated to the original LGBM model (i.e., modifying them does not affect the original LGBM model). Now, Autopilot creates a single, larger blueprint for all Auto-Tuned Word N-gram Text Modeler tasks instead of one for each text column. Note that this change has no backward-compatibility issues; it applies to new projects only.
DataRobot Prime model creation removed¶
The ability to create new DataRobot Prime models has been removed from the application. This does not affect existing Prime models or deployments. It is being replaced with the new ability to export Python or Java code from RuleFit models using the Scoring Code capabilities. RuleFit models, which differ from Prime only in that they use raw data for their prediction target rather than predictions from a parent model, support Java/Python source code export. There is no change in the availability of Java Scoring Code for other blueprint types, and any existing Prime models will continue to function.
User/Open source models disabled in November¶
As of November 2022, DataRobot disabled all models containing User/Open source (“user”) tasks. See the release announcement for full information on identifying these models. Use the Composable ML functionality to create custom models.