Time series (V9.0)
The following table lists each new feature.
Quick Autopilot improvements now available for time series
With this release, Quick Autopilot has been streamlined for time series projects, speeding up experimentation. To maximize runtime efficiency, the new version of Quick no longer automatically generates and fits the DR Reduced Features list, because fitting it requires retraining models. Models are still trained at the maximum sample size for each backtest, as defined by the project’s date/time partitioning. The number of models run varies by project and target type. See the documentation on the model recommendation process for alternate methods of building a reduced feature list.
New Leaderboard and Repository filtering options
With this release, you can limit the Leaderboard or Repository to display only the models or blueprints that match the selected filters. Leaderboard filters are grouped into categories such as sample size (or, for time series projects, training period), model family, model characteristics, feature list, and more. Repository filters include blueprint characteristics, families, and types. The new filtering options are centralized in a single modal (one for the Leaderboard and one for the Repository); previously, the more limited filtering methods were spread across separate locations.
See the Leaderboard reference for more information.
Time series clustering
Clustering, an unsupervised learning technique, can be used to identify natural segments in your data. DataRobot now allows you to use clustering to discover the segments to be used for segmented modeling. This technique enables you to easily group similar series across a multiseries dataset from within the DataRobot platform. Use the discovered clusters to get a better understanding of your data or use them as input to time series segmented modeling.
This workflow builds a clustering model and uses the model to help define the segments for a segmented modeling project.
A new Use for Segmentation tab lets you enable the clusters to be used in the segmented modeling project.
The clustering model is saved as a model package in the Model Registry, so that you can use it for subsequent segmented modeling projects.
Alternatively, you can save the clustering model to the Model Registry explicitly, without creating a segmented modeling project immediately. In this case, you can later create a segmented modeling project using the saved clustering model package.
The general availability of clustering brings some improvements:
- A new Series Insights tab specifically for clustering provides information on series/cluster relationships and details.
- Clarified project setup that removes extraneous feature lists and window setup.
- Clustering models, and their resulting segmented models, use a uniform quantity of data for predictions (with the size based on the training size for the original clustering model).
- A cluster buffer prevents data leakage and ensures that you are not training a clustering model into what will be the holdout partition in segmentation.
- A toggle to control the 10% clustering buffer if you aren’t using the result for segmented modeling.
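To illustrate why a buffer helps, the sketch below holds back the most recent 10% of the timeline from clustering training, so the clustering model never sees rows that would later form the segmentation holdout. The helper name `clustering_training_window`, the `use_buffer` flag (standing in for the toggle), and measuring the 10% over the time span rather than over row counts are all assumptions for illustration, not DataRobot's documented implementation.

```python
from datetime import datetime, timedelta

def clustering_training_window(timestamps, use_buffer=True, buffer_fraction=0.10):
    """Return (cutoff, timestamps available to the clustering model).

    Hypothetical helper: the buffer is measured as a fraction of the total
    time span, and rows at or after the cutoff are held back so they can
    later serve as the segmented project's holdout without leakage.
    """
    start, end = min(timestamps), max(timestamps)
    if not use_buffer:
        # Buffer toggled off: every row may be used for clustering.
        return end, sorted(timestamps)
    cutoff = end - (end - start) * buffer_fraction
    # Keep only rows strictly before the cutoff.
    return cutoff, sorted(t for t in timestamps if t < cutoff)

# Eleven daily rows spanning ten days: the 10% buffer puts the cutoff one day before the end.
days = [datetime(2022, 1, 1) + timedelta(days=i) for i in range(11)]
cutoff, usable = clustering_training_window(days)
print(cutoff, len(usable))  # 2022-01-10 00:00:00 9
```

Turning the buffer off returns every row, which mirrors the case where the clustering result is not intended for segmented modeling.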
For more information, see the Time series clustering documentation.
Time series 5GB support
With this deployment, time series projects on the DataRobot managed AI Platform can support datasets up to 5GB. Previously the limit for time series projects on the cloud was 1GB. For more project- and platform-based information, see the dataset requirements reference.
Accuracy Over Time enhancements
Because multiseries modeling supports up to 1 million series and 1000 forecast distances, DataRobot previously limited the number of series for which accuracy calculations were performed as part of Autopilot. Now, the visualizations that use these calculations automatically compute results for a number of series (up to a certain threshold) and let you compute additional series, either individually or in bulk.
The visualizations that can leverage this functionality are:
- Accuracy Over Time
- Anomaly Over Time
- Forecast vs. Actual
- Model Comparison
For more information, see the Accuracy Over Time for multiseries documentation.
Native Prophet and Series Performance blueprints in Autopilot
Support for native Prophet, ETS, and TBATS models for single-series and multiseries time series projects was announced as generally available in the June release. (A detailed description of each model is available from its model blueprint.) With this release, these models no longer run as part of Quick Autopilot. DataRobot runs them, as appropriate, in full Autopilot, and they remain available from the model Repository.
Project duplication, with settings, for time series projects
Now generally available, you can duplicate ("clone") any DataRobot project type, including unsupervised and time-aware projects like time series, OTV, and segmented modeling. Previously, this capability was only available for AutoML projects (non-time-aware regression and classification).
Duplicating a project lets you copy either the dataset only (which is faster than re-uploading it) or the dataset together with the project settings. For time-aware projects, copying the settings clones all time series settings: the target, the feature derivation and forecast window values, any selected calendars, known in advance (KA) features, and series IDs. If you used the data prep tool to address irregular time step issues, cloning uses the modified dataset (the one used for model building in the parent project). You can access the Duplicate option from either the projects dropdown (upper right corner) or the Manage Project page.
Scoring Code for time series projects
Now generally available, you can export time series models in a Java-based Scoring Code package. Scoring Code is a portable, low-latency method of utilizing DataRobot models outside the DataRobot application.
You can download a model's time series Scoring Code from the following locations:
- Download from the Leaderboard (Leaderboard > Predict > Portable Predictions)
- Download from the deployment (Deployments > Predictions > Portable Predictions)
With segmented modeling, you can build individual models for segments of a multiseries project. DataRobot then merges these models into a Combined Model. You can generate Scoring Code for the resulting Combined Model.
To generate and download Scoring Code for a Combined Model, each segment champion of the Combined Model must have Scoring Code.
After you ensure each segment champion of the Combined Model has Scoring Code, you can download the Scoring Code from the Leaderboard or you can deploy the Combined Model and download the Scoring Code from the deployment.
You can now include prediction intervals in the downloaded Scoring Code JAR for a time series model. You can download Scoring Code with prediction intervals from the Leaderboard or from a deployment.
You can score data at the command line using the downloaded time series Scoring Code. This release introduces efficient batch processing for time series Scoring Code to support scoring larger datasets. For more information, see the Time series parameters for CLI scoring documentation.
For more details on time series Scoring Code, see Scoring Code for time series projects.
Autoexpansion of time series input in Prediction API
When making predictions with time series models via the API using a forecast point, you can now omit the forecast window rows from your prediction data; DataRobot generates them automatically via autoexpansion. Autoexpansion applies automatically when predictions are made for a specific forecast point (not a forecast range) and the time series project has a regular time step and does not use Nowcasting.
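Conceptually, autoexpansion appends one row per forecast distance after the forecast point, with the target left empty. The sketch below shows that idea under stated assumptions: the helper `autoexpand`, the row layout (dicts with `date` and `target` keys), and a forecast window expressed as inclusive step counts are hypothetical and are not DataRobot's code or request format. A regular time step is what makes this possible, which is why `step` is a single fixed interval.

```python
from datetime import date, timedelta

def autoexpand(history, forecast_point, step, first_distance, last_distance):
    """Append empty-target rows for each forecast distance in the window.

    Hypothetical helper. history: list of {"date": ..., "target": ...} dicts
    ending at the forecast point. step: the project's regular time step.
    first_distance/last_distance: forecast window bounds, in steps, inclusive.
    """
    rows = [dict(r) for r in history]
    existing = {r["date"] for r in rows}
    for distance in range(first_distance, last_distance + 1):
        ts = forecast_point + step * distance
        if ts not in existing:
            # Future targets are unknown, so the target is left empty.
            rows.append({"date": ts, "target": None})
    return rows

history = [{"date": date(2022, 6, d), "target": t} for d, t in [(1, 5.0), (2, 6.0), (3, 7.0)]]
rows = autoexpand(history, date(2022, 6, 3), timedelta(days=1), 1, 3)
print([r["date"].isoformat() for r in rows[3:]])
# ['2022-06-04', '2022-06-05', '2022-06-06']
```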
Calculate Feature Impact for each backtest
Feature Impact provides a transparent overview of a model, especially in a model's compliance documentation. Time-dependent models trained on different backtests and holdout partitions can have different Feature Impact calculations for each backtest. Now generally available, you can calculate Feature Impact for each backtest using DataRobot's REST API, allowing you to inspect model stability over time by comparing Feature Impact scores from different backtests.
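Once per-backtest Feature Impact scores have been retrieved, one simple way to compare them is to look at how much each feature's normalized impact varies across backtests. The sketch below assumes the scores are already available as a dictionary keyed by backtest name; it does not call the DataRobot REST API, and the backtest names, feature names, and values are made up for illustration.

```python
from statistics import pstdev

def impact_spread(scores_by_backtest):
    """Return {feature: (min, max, population std dev)} across backtests.

    scores_by_backtest: {backtest_name: {feature: normalized_impact}}.
    Features missing from a backtest are treated as having zero impact.
    """
    features = set()
    for scores in scores_by_backtest.values():
        features.update(scores)
    spread = {}
    for feature in sorted(features):
        values = [s.get(feature, 0.0) for s in scores_by_backtest.values()]
        spread[feature] = (min(values), max(values), pstdev(values))
    return spread

# Hypothetical scores for two backtests.
scores = {
    "backtest_1": {"sales_lag_7": 1.0, "promo": 0.4},
    "backtest_2": {"sales_lag_7": 1.0, "promo": 0.8},
}
for feature, (low, high, std) in impact_spread(scores).items():
    print(f"{feature}: min={low:.2f} max={high:.2f} std={std:.2f}")
```

A large spread for a feature suggests that its importance shifts over time, which is the kind of instability that per-backtest Feature Impact is meant to surface.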
Support for Manual mode introduced to segmented modeling
With this release, you can use Manual mode with segmented modeling. Previously, you could only choose Quick or full Autopilot. When using Manual mode with segmented modeling, DataRobot creates individual projects per segment and completes preparation up to the modeling stage. However, DataRobot does not create per-project models. It does create the Combined Model (as a placeholder), but does not select a champion. Manual mode gives you full control over which models are trained in each segment and which are selected as champions, without spending time on automatic model building.
Deployment for time series segmented modeling
To fully leverage the value of segmented modeling, you can deploy Combined Models like any other time series model. After selecting the champion model for each included project, you can deploy the Combined Model to create a "one-model" deployment for multiple segments; however, the individual segments in the deployed Combined Model still have their own segment champion models running in the deployment behind the scenes. Creating a deployment allows you to use DataRobot MLOps for accuracy monitoring, prediction intervals, challenger models, and retraining.
Time series segmented modeling deployments do not support data drift monitoring. For more information, see the feature considerations.
After you complete the segmented modeling workflow and Autopilot has finished, the Model tab contains one model. This model is the completed Combined Model. To deploy, click the Combined Model, click Predict > Deploy, and then click Deploy model.
After deploying a Combined Model, you can change the champion for a segment by cloning the deployed Combined Model and modifying the clone. The cloning happens automatically when you attempt to change a segment's champion within a deployed Combined Model, and the clone, which you can modify, becomes the Active Combined Model. This process keeps the deployed model stable while allowing you to test changes within the same segmented project.
Only one Combined Model on a project's Leaderboard can be the Active Combined Model (marked with a badge).
Once a Combined Model is deployed, it is labeled Prediction API Enabled. To modify this model, click the active and deployed Combined Model, and then in the Segments tab, click the segment you want to modify.
Next, reassign the segment champion, and in the dialog box that appears, click Yes, create new combined model.
On the segment's Leaderboard, you can now access and modify the Active Combined Model.
For more information, see the Deploy a Combined Model documentation.
New metric support for segmented projects
The Combined Model, the main umbrella project that acts as a collection point for all segments in a time series segmented modeling project, now supports RMSE-based metrics. In addition to the existing support for MAD, MAE, MAPE, MASE, and SMAPE, segmented projects now also support RMSE, RMSLE, and Theil’s U (weighted and unweighted).
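For reference, the sketch below computes RMSE, RMSLE, and Theil's U (optionally weighted) for a single series. It uses one common definition of Theil's U: the model's RMSE divided by the RMSE of a naive forecast that repeats the previous actual value. DataRobot's exact formulation, including how weights are applied, may differ, so treat this as an illustration of the metric family rather than the product's implementation.

```python
import math

def rmse(actual, predicted, weights=None):
    """Root mean squared error, optionally weighted per row."""
    weights = weights or [1.0] * len(actual)
    num = sum(w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights))
    return math.sqrt(num / sum(weights))

def rmsle(actual, predicted, weights=None):
    """RMSE on log1p-transformed values; inputs must be non-negative."""
    log_a = [math.log1p(a) for a in actual]
    log_p = [math.log1p(p) for p in predicted]
    return rmse(log_a, log_p, weights)

def theils_u(actual, predicted, weights=None):
    """Model RMSE divided by the RMSE of a naive 'previous value' forecast.

    The first point has no previous value, so the comparison starts at
    index 1. Values below 1 mean the model beats the naive forecast.
    Raises ZeroDivisionError if the naive forecast is perfect.
    """
    w = weights[1:] if weights else None
    model = rmse(actual[1:], predicted[1:], w)
    naive = rmse(actual[1:], actual[:-1], w)
    return model / naive

actual = [10, 12, 11, 13]
predicted = [10, 11, 11, 12]
print(round(rmse(actual, predicted), 4))      # 0.7071
print(round(theils_u(actual, predicted), 4))  # 0.4714
```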
Retraining Combined Models now faster
Now generally available, time series segmented models support retraining on the same feature list and blueprint as the original model, without rerunning Autopilot or feature reduction. Previously, rerunning Autopilot was the only way to retrain this model type. This brings retraining for segmented models to parity with retraining for non-segmented time series models. Because retraining reuses the feature reduction computations from the original model, only newly introduced features need to go through that process, saving time and adding flexibility. Note that retraining retrains the champion of a segment; it does not rerun the project or select a new champion.
Prediction Explanations for cluster models
Now available as Public Preview, you can use Prediction Explanations with clustering to uncover which factors most contributed to any given row’s cluster assignment. With this insight, you can easily explain clustering model outcomes to stakeholders and identify high-impact factors to help focus their business strategies.
Functioning very much like multiclass Prediction Explanations—but reporting on clusters instead of classes—cluster explanations are available from both the Leaderboard and deployments when enabled. They are available for all XEMP-based clustering projects and are not available with time series.
Required feature flag: Enable Clustering Prediction Explanations
Public preview documentation.
Period Accuracy allows focus on specific periods in training data
Available as public preview for OTV and time series projects, the Period Accuracy insight lets you define periods within your dataset and then compare their metric scores against the metric score of the model as a whole. Periods are defined in a separate CSV file that identifies rows to group based on the project’s date/time feature.
Once uploaded, and with the insight calculated, DataRobot provides a table of period-based results and an “over time” histogram for each period.
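The comparison that Period Accuracy performs can be sketched as follows: rows are grouped into user-defined periods by their date/time value, and a metric (RMSE here) is computed for each period and for all rows. The period definition format used below (`name`, inclusive `start`, exclusive `end`) and the helper names are assumptions for illustration; check the Period Accuracy documentation for the CSV layout DataRobot actually expects.

```python
import csv
import io
import math
from datetime import date

def load_periods(csv_text):
    """Parse hypothetical period definitions with name,start,end columns."""
    periods = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        periods.append((row["name"],
                        date.fromisoformat(row["start"]),
                        date.fromisoformat(row["end"])))
    return periods

def rmse(pairs):
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

def period_accuracy(rows, periods):
    """rows: list of (date, actual, predicted). Returns per-period and overall RMSE."""
    results = {"All data": rmse([(a, p) for _, a, p in rows])}
    for name, start, end in periods:
        # A row belongs to a period if start <= date < end.
        in_period = [(a, p) for d, a, p in rows if start <= d < end]
        if in_period:
            results[name] = rmse(in_period)
    return results

periods = load_periods("name,start,end\nHoliday,2022-12-20,2023-01-01\n")
rows = [
    (date(2022, 12, 1), 100, 98),
    (date(2022, 12, 10), 110, 108),
    (date(2022, 12, 24), 150, 120),
    (date(2022, 12, 31), 160, 130),
]
print(period_accuracy(rows, periods))
```

In this made-up example, the holiday period's error is well above the overall score, which is the kind of gap the insight is designed to expose.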
Required feature flag: Period Accuracy Insight
Public preview documentation.
Batch predictions for TTS and LSTM models
Traditional Time Series (TTS) and Long Short-Term Memory (LSTM) models, sequence models that use autoregressive (AR) and moving average (MA) methods, are common in time series forecasting. Both AR and MA models typically require a complete history of past forecasts to make predictions. In contrast, other time series models only require a single row after feature derivation to make predictions. Previously, batch predictions couldn't accept historical data beyond the effective feature derivation window (FDW) if the history exceeded the maximum size of each batch, while sequence models required complete historical data beyond the FDW. These requirements made sequence models incompatible with batch predictions. Enabling this public preview feature removes those limitations to allow batch predictions for TTS and LSTM models.
Time series Autopilot still doesn't include TTS or LSTM model blueprints; however, you can access the model blueprints in the model Repository.
To allow batch predictions with TTS and LSTM models, this feature:
- Updates batch predictions to accept historical data up to the maximum batch size (equal to 50MB or approximately a million rows of historical data).
- Updates TTS models to allow refitting on an incomplete history (if the complete history isn't provided).
If you don't provide sufficient forecast history at prediction time, you could encounter prediction inconsistencies. For more information on maintaining accuracy in TTS and LSTM models, see the prediction accuracy considerations.
Required feature flag: Enable TTS and LSTM Time Series Model Batch Predictions
Public preview documentation.
Time series model package prediction intervals
Now available for public preview, you can enable the computation of a model's time series prediction intervals (from 1 to 100) during model package generation. To run a DataRobot time series model in a remote prediction environment, you download a model package (.mlpkg file) from the model's deployment or the Leaderboard. In both locations, you can now choose to Compute prediction intervals during model package generation. You can then run prediction jobs with a portable prediction server (PPS) outside DataRobot.
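To make the "1 to 100" range concrete: a prediction interval of size N is a band expected to contain the actual value about N% of the time. One common way to build such a band, sketched below, is to take empirical quantiles of past residuals (for example, from backtests) and add them to the point forecast. This is a generic technique shown for illustration, not necessarily how DataRobot computes its intervals, and the function names are hypothetical.

```python
import math

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, q in [0, 1]."""
    pos = (len(sorted_values) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

def residual_interval(point_forecast, residuals, size):
    """Return (lower, upper) for an interval of `size` percent (1 to 100).

    residuals are actual minus predicted values from past forecasts; the
    interval takes equal tails from both ends of their distribution.
    """
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    r = sorted(residuals)
    tail = (1 - size / 100) / 2
    return (point_forecast + quantile(r, tail),
            point_forecast + quantile(r, 1 - tail))

residuals = [-4, -2, -1, 0, 0, 1, 2, 3, 5]
print(residual_interval(100.0, residuals, 50))   # (99.0, 102.0)
print(residual_interval(100.0, residuals, 100))  # (96.0, 105.0)
```

In practice, residuals are often collected separately for each forecast distance, because errors tend to grow the further a prediction is from the forecast point.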
Before you download a model package with prediction intervals from a deployment, ensure that your deployment supports model package downloads. The deployment must have a DataRobot build environment and an external prediction environment, which you can verify using the Governance Lens in the deployment inventory.
To download a model package with prediction intervals from an external deployment, use the Predictions > Portable Predictions tab.
To download a model package with prediction intervals from a model in the Leaderboard, you can use the Predict > Deploy or Predict > Portable Predictions tab.
Required feature flag: Enable computation of all Time-Series Intervals for .mlpkg
Public preview documentation.