Time series (V7.0)
March 15, 2021
New time series features
See details of new time series features below:
- Smart and blended clustered blueprints increase accuracy
- Compare anomaly detection performance between models from the Model Comparison tab
- Monotonic modeling now available for time series projects
- Create external time series deployments
- Time series deployment support for the Make Predictions tab
- Detailed time series feature derivation documentation now available
- Beta: Time series data prep tool addresses gap handling to allow time-based mode with irregular time steps
- Beta: Head-to-head prediction comparison for DataRobot and external time-aware projects
- Beta: New methods for modeling on new series improve prediction accuracy
- Beta: New series humility rule expands abilities to train
Smart and blended clustered blueprints increase accuracy, improve build time
Now available from the Repository, new clustered blueprints create both a "smart" and a blended version of assorted blueprints, using different clusters with different feature lists. While DataRobot already provides performance- and similarity-clustered blueprints, the new clustered blueprints select cluster/feature list pairs from a selection of models, increasing accuracy and reducing build time. DataRobot now uses an extended feature list (any supported list, including custom lists, although the approach is especially effective with the Time Series Informative Feature list) that contains all the columns needed for three feature lists: “no-differencing,” “with differencing (latest),” and “with differencing (season)”. From these three lists, DataRobot constructs three column sets: basic feature columns, short periodicity features, and long periodicity features. It then divides the modeling data according to those column sets and performs a grid search to tune the model.
Now GA: Compare anomaly detection performance between models from the Model Comparison tab
The Model Comparison tab introduces additional improvements for time-aware projects. Now, you can select Anomaly Over Time to easily compare anomaly detection models to find where they agree and disagree about discovered anomalies. Drag the anomaly threshold up and down to control what is considered an anomaly, and then set backtest and forecast distances to zero in on the results of most interest.
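To make the idea of agreement and disagreement concrete, here is a minimal sketch of what such a comparison computes. It is not DataRobot code: the function name, the score lists, and the threshold value are all hypothetical, and it assumes each model emits one anomaly score per timestamp on a shared scale.

```python
def compare_anomalies(scores_a, scores_b, threshold):
    """Label each timestamp by how the two models' anomaly flags relate.

    A score at or above `threshold` counts as an anomaly (an assumption
    made for this sketch).
    """
    labels = []
    for a, b in zip(scores_a, scores_b):
        flag_a, flag_b = a >= threshold, b >= threshold
        if flag_a and flag_b:
            labels.append("both")
        elif flag_a:
            labels.append("only_a")
        elif flag_b:
            labels.append("only_b")
        else:
            labels.append("neither")
    return labels


# Hypothetical anomaly scores from two models over four timestamps.
scores_model_a = [0.10, 0.85, 0.40, 0.92]
scores_model_b = [0.20, 0.90, 0.75, 0.30]

print(compare_anomalies(scores_model_a, scores_model_b, threshold=0.7))
```

Raising or lowering `threshold` in this sketch plays the same role as dragging the anomaly threshold in the tab: it changes which points count as anomalies and therefore where the models disagree.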
Now GA: Monotonic modeling available for time series projects, enabling desired outcomes and compliance
In some regulated industries, you need to enforce the directional relationship between a feature and the target (for example, higher home values should always lead to higher home insurance rates). DataRobot already supported this capability for AutoML projects and, with this release, adds it to time series projects. By training with monotonic constraints based on business knowledge, you force certain XGBoost models to learn only monotonic (always increasing or always decreasing) relationships between specific features (raw or derived) and the target. For example, you can configure a monotonic relationship between risk and credit card balance (e.g., with a large negative balance, the approval risk is higher). The model does not learn that relationship on its own; the rules are supplied explicitly as known constraints, and the model applies them when making predictions.
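As an illustration of what a monotonic relationship means in practice, the sketch below checks whether a prediction function ever moves against the required direction as one feature increases. This is a verification helper, not the constraint mechanism itself; `is_monotonic`, `toy_predict`, and the feature names are hypothetical and stand in for a real constrained model.

```python
def is_monotonic(predict, rows, feature, grid, increasing=True):
    """Return True if predictions never move against the required direction
    as `feature` sweeps over `grid`, holding the other features fixed."""
    for row in rows:
        preds = [predict({**row, feature: value}) for value in sorted(grid)]
        for prev, cur in zip(preds, preds[1:]):
            violated = (cur < prev) if increasing else (cur > prev)
            if violated:
                return False
    return True


def toy_predict(row):
    """Hypothetical stand-in for a constrained model's prediction."""
    return 200 + 0.004 * row["home_value"] - 1.5 * row["years_claim_free"]


rows = [{"home_value": 250_000, "years_claim_free": 3}]

# Insurance rate should never fall as home value rises.
print(is_monotonic(toy_predict, rows, "home_value", [100_000, 300_000, 500_000]))

# Rate should never rise as claim-free years increase.
print(is_monotonic(toy_predict, rows, "years_claim_free", [0, 5, 10], increasing=False))
```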
Now GA: Create external time series deployments (MLOps required)
You can now create a time series model, deploy that model outside of DataRobot, and report prediction statistics back to DataRobot using the MLOps agent. This allows you to develop a time series model in DataRobot and export it in an easily usable form while retaining DataRobot's deployment monitoring and management functionality.
Now GA: Time series deployment support for the Make Predictions tab (MLOps required)
You can now use the Make Predictions interface to efficiently score datasets with a deployed time series model. The interface shows information about the model’s feature derivation window and forecast rows, helping you confirm that the data you want to score meets the requirements. You can also configure the time series options for the dataset so that you can make predictions for a specific forecast point without modifying the dataset.
Detailed time series feature derivation documentation now available
The in-app platform documentation now includes a more complete view of the time series feature derivation process. The new documentation describes the operators used and the feature names created when DataRobot builds the time series modeling dataset.
Beta: Time series data prep tool addresses gap handling to allow time-based mode with irregular time steps
For some time series, when the dataset is detected as irregular, DataRobot only allows row-based mode. Loss of time-based mode can result in significant gaps in some series. The time series data prep tool, available from the AI Catalog, provides a solution to this issue. It allows you to aggregate a dataset to a specified time step and impute the target for any missing rows. You can modify the dataset using either a selector-based method (Manual) or an editable Spark SQL query and then save the new dataset back to the AI Catalog as a Spark asset.
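The two operations described above, aggregating to a fixed time step and imputing the target for missing rows, can be sketched in a few lines. This is not the data prep tool's implementation: the daily time step, the sum aggregation, and the forward-fill imputation are all assumptions chosen for illustration, and `aggregate_and_impute` is a hypothetical name.

```python
from datetime import date, timedelta


def aggregate_and_impute(rows, start, end):
    """Sum the target per calendar day, then forward-fill days with no rows.

    `rows` is an iterable of (date, target) pairs at irregular intervals.
    Days before the first observation are left as None.
    """
    daily = {}
    for day, value in rows:
        daily[day] = daily.get(day, 0.0) + value

    result, last_seen = [], None
    day = start
    while day <= end:
        if day in daily:
            last_seen = daily[day]
        result.append((day, last_seen))  # gap days reuse the last known value
        day += timedelta(days=1)
    return result


# Irregular input: two rows on March 1, nothing on March 2-3, one row on March 4.
rows = [
    (date(2021, 3, 1), 5.0),
    (date(2021, 3, 1), 2.0),
    (date(2021, 3, 4), 4.0),
]

for day, value in aggregate_and_impute(rows, date(2021, 3, 1), date(2021, 3, 4)):
    print(day, value)
```

The output has exactly one row per day, which is the regular spacing that time-based mode requires.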
Beta: Head-to-head prediction comparison for DataRobot and external time-aware projects helps drive business decisions
With this release, organizations with existing time-aware models outside of the DataRobot application can create a prediction file from those models and use it for a baseline accuracy comparison. This feature introduces three additional "flavors" of existing metrics (RMSE, MAE, and LogLoss), each scaled to the external baseline (the uploaded predictions), to provide an at-a-glance accuracy measure on the Leaderboard. To use the feature, upload the file into DataRobot prior to modeling, apply it through Advanced options > Time Series, and select the appropriate metric.
[Image: using standard RMSE]
[Image: using RMSE scaled to an external baseline]
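One plausible reading of "scaled to the external baseline" is a ratio of the model's error to the baseline's error on the same rows. The sketch below assumes that definition, which these notes do not spell out, so treat the formula and the function names as illustrative rather than as DataRobot's exact metric.

```python
import math


def rmse(actuals, predictions):
    """Root mean squared error over paired values."""
    squared = [(a - p) ** 2 for a, p in zip(actuals, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_scaled_to_baseline(actuals, model_preds, baseline_preds):
    """Model RMSE divided by baseline RMSE (assumed definition).

    Under this assumption, a value below 1.0 means the model beats the
    external baseline and a value above 1.0 means it does worse.
    """
    return rmse(actuals, model_preds) / rmse(actuals, baseline_preds)


# Hypothetical actuals, model predictions, and uploaded baseline predictions.
actuals = [10.0, 12.0, 14.0, 16.0]
model_preds = [11.0, 12.0, 13.0, 16.0]
baseline_preds = [12.0, 14.0, 12.0, 18.0]

print(rmse(actuals, model_preds))
print(rmse_scaled_to_baseline(actuals, model_preds, baseline_preds))
```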
Beta: New methods for modeling on new series improve prediction accuracy, trust
With this release, DataRobot introduces “cold start series” modeling—modeling on a series in which there is not sufficient historical data. For example, when performing demand forecasting for all SKUs in a store, you may want to predict sales for an item that has never been sold before (a “cold start” series). Previously, time series models produced an error when making predictions for a series that did not have the full history needed to derive features for a specific forecast point. Now, new blueprints are added to Autopilot that support features being derived using partial history or no history at all. These blueprints use a two-stage approach. In the first stage, the main effect model is built, which works well on averaged derived features. In the second stage, the known-in-advance features (if available) are used to account for series effects for zero history records and series effects are used for partial history. A special column is added to show how many data points (rows) were used in the feature derivation process.
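The special column mentioned above can be pictured as a per-series count of rows that fall inside the feature derivation window. The following sketch assumes daily data and a window expressed in days; the function names and the three status labels are hypothetical and only illustrate the full, partial, and no-history cases.

```python
from datetime import date, timedelta


def history_rows_used(series_dates, forecast_point, window_days):
    """Count rows inside [forecast_point - window_days, forecast_point)."""
    window_start = forecast_point - timedelta(days=window_days)
    return sum(1 for d in series_dates if window_start <= d < forecast_point)


def history_status(rows_used, window_days):
    """Classify a series by how much of the window it fills (daily data assumed)."""
    if rows_used == 0:
        return "no history"
    if rows_used < window_days:
        return "partial history"
    return "full history"


forecast_point = date(2021, 3, 15)
window_days = 7

# Hypothetical series: one established SKU, one recent SKU, one never sold.
series = {
    "established_sku": [forecast_point - timedelta(days=n) for n in range(1, 8)],
    "new_sku": [date(2021, 3, 13), date(2021, 3, 14)],
    "unsold_sku": [],
}

for name, dates in series.items():
    used = history_rows_used(dates, forecast_point, window_days)
    print(name, used, history_status(used, window_days))
```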
Beta: New series humility rule expands abilities to train
DataRobot has supported multiseries blueprints that allow predicting on new series, that is, series that were not seen in training and do not have enough points in the training dataset for accurate predictions. This is useful, for example, in demand forecasting: when a new product is introduced, you may want initial sales predictions. Now, in conjunction with “cold start” modeling (modeling on a series that lacks sufficient historical data), you can predict on new series while keeping accurate predictions for series with a history. This relies on new Autopilot blueprints that support feature derivation using partial history or no history at all.
With the support in place, you can set up a humility rule that:
- triggers on a new series (one unseen in the training data).
- takes a specified action.
- optionally, returns a custom error message.
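The three parts of the rule (trigger, action, optional message) can be sketched as a small function. This is not the DataRobot humility API: the function name, the action names `"error"`, `"override"`, and `"none"`, and the default message are all hypothetical, chosen only to show how the pieces fit together.

```python
def apply_new_series_rule(series_id, known_series, prediction, action="error",
                          override_value=None,
                          message="Series not seen in training."):
    """Return (prediction, error_message) after applying the rule.

    The rule triggers only when `series_id` is absent from `known_series`.
    """
    if series_id in known_series:
        return prediction, None          # rule does not trigger
    if action == "error":
        return None, message             # reject with the custom message
    if action == "override":
        return override_value, None      # replace the prediction
    return prediction, None              # "none": keep the prediction as is


known = {"sku_1", "sku_2"}  # hypothetical series seen in training

print(apply_new_series_rule("sku_1", known, 42.0))
print(apply_new_series_rule("sku_9", known, 42.0))
print(apply_new_series_rule("sku_9", known, 42.0, action="override", override_value=0.0))
```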
Time series issues fixed in v7.0.0
The following issues have been fixed since release 6.3.4.
TIME-4983: Fixes an issue where Eureqa models that were cancelled or errored before completion could leave behind files that caused subsequent models run with the same parameters to fail.
TIME-6083: Fixes an issue causing the temporal hierarchical model to error on exponential time series projects that use RMSLE as the project metric. The fix removes this model from Autopilot and the Repository, and also removes the hierarchical and two-stage time series models under the same conditions.
TIME-6776: DeepAR has been disabled for FW=[0, 0] projects.
TIME-7322: Enables Prediction Explanations for time series "Recommended for Deployment" models.
XAI-3266: Fixes generation of Shapley-based feature insights and Prediction Explanations for OTV models with time window sampling.