Segmented modeling for multiseries¶
Complex and accurate demand forecasting typically requires deep statistical know-how and lengthy development projects around big data architectures. DataRobot's multiseries with segmented modeling automates this work by creating multiple projects "under the hood." Once the segments are identified and built, they are merged into a single object, the Combined Model. This leads to improved model performance and decreased time to deployment.
When using segmented modeling, DataRobot creates a full project for each segment—running Autopilot (full or quick) then selecting (and preparing) a recommended model for deployment. (See the note on using segmented modeling in Manual mode.) DataRobot also marks the recommended model as "segment champion," although you can reassign the champion at any time.
Although DataRobot creates a project for each segment, these projects are not available from the Project Control Center. Instead, they are investigated and managed from within the Combined Model, which is available in the Project Control Center.
DataRobot incorporates each segment champion to create the Combined Model, a main umbrella project that acts as a collection point for all the segments. The result is a "one-model" deployment (the segmented project), while each segment has its own model running in the deployment behind the scenes.
It is important to remember that while segmented modeling solves some problems (model factories, multiple deployments), it cannot know which segments you care most about or which have the highest ROI. To be successful, you must correctly define the use case, set up the dataset, and define the segments.
See the segmented modeling FAQ for more detailed information. See the visual overview for a quick representation of why to use segmented modeling.
Modeling with segmentation is available for multiseries regression projects.
Segmented modeling workflow¶
Time series segmented modeling requires, first, defining segments that divide the dataset. To define segments, you can allow DataRobot to:
- Discover clusters in your data and then use those clusters as segments.
- Assign segments for you based on the configured segment ID.
To build a segmented modeling project:
Follow the standard time series workflow—set the target and turn on time-aware modeling. Choose Automated time series forecasting as the modeling method.
Enable multiseries modeling by setting the series identifier.
Set the segmentation method by clicking the pencil:
Set whether to enable segmented modeling:
- Select Yes, build models per segment to enable segmented modeling. When selected, you must also set how segments are defined.
- Select No, build models without segmenting to return to the previous Time Series Forecasting window. If you choose not to do segmented modeling, DataRobot builds one model for all detected series (regular multiseries).
Set how segments are defined. The table below describes each option:
|Option|Description|
|ID column|Select a column from the training dataset that DataRobot will use as the segment ID. Start typing a column name to see matching auto-complete selections, or select from the identifiers that DataRobot identified. The segment ID must be different from the series ID (see note below).|
|Existing clustering model|Use a clustering model previously saved to the Model Registry.|
|New clustering model|Start a new clustering project, with results later applied via the Existing clustering model option, by clicking the time series clustering link in the help text.|
What if I want to have one series per segment?
The columns specified for segment ID and series ID cannot be the same; however, you can duplicate the series ID column and give it a new name. Then, set the segment ID to the new column name (using the How are the segments defined section). DataRobot will generate the segments using the series ID.
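The column duplication described above can be done before upload. The following is a minimal sketch using only the Python standard library; the column names `store_id` and `store_segment` and the sample rows are hypothetical, and the helper `add_segment_column` is illustrative rather than part of any DataRobot tooling.

```python
import csv
import io


def add_segment_column(csv_text, series_col, segment_col):
    """Return CSV text with `segment_col` added as a copy of `series_col`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if series_col not in reader.fieldnames:
        raise ValueError(f"column {series_col!r} not found")
    if segment_col in reader.fieldnames:
        raise ValueError(f"column {segment_col!r} already exists")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(reader.fieldnames) + [segment_col], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # The segment ID gets the same values as the series ID, under a new name.
        row[segment_col] = row[series_col]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data: one series per store.
sample = "date,store_id,sales\n2024-01-01,A,10\n2024-01-01,B,12\n"
result = add_segment_column(sample, "store_id", "store_segment")
print(result)
```

With the new column in place, the series ID would stay `store_id` and the segment ID would be set to `store_segment`, so each series lands in its own segment.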
Once the method is selected (either the ID is set or an existing clustering model is selected), click Set segmentation method. The Time Series Forecasting window returns, where you can continue the configuration, including training windows, duration, known in advance (KA) features, and calendar selection, and can change the selected series and segment.
How are the training periods determined if clustering was used?
When building a segmented model that uses discovered clusters to split the dataset into child projects (segments), DataRobot applies the training window settings from the clustering project to the segmented modeling project. This protects the holdout in segmented modeling and prevents data leakage from the clustering model when splitting the segmented dataset into child projects. Based on the start and end dates of each series, the following general scenarios determine the methodology:
- If the series data contains the time window needed (as defined in the clustering project), DataRobot simply passes the series data along.
- Series data exists before the clustering training end: If a series is shorter than the full training window and extends past holdout, DataRobot uses only the data points before the clustering training end, up to the size of the training duration (only the portions that exist within the training boundary).
- Series data exists only "older" in time than the clustering training end: If a legacy series has no data that falls into the training window, DataRobot "slides back" and gathers data for the duration of the training window so that it can be used in segmented modeling and is not lost.
- Series data only exists "newer" in time than the clustering training end: If a series only exists in holdout, DataRobot slides the window forward but does not select any data that was used in training. In this way, the data is not dropped, but it is only used for examining the holdout of a child project.
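The four scenarios above can be pictured as a classification of each series against the clustering training window. The sketch below is only an illustration of that reading, not DataRobot's implementation: the function name, the scenario labels, and the exact boundary handling are assumptions.

```python
from datetime import date


def select_training_span(series_start, series_end, train_start, train_end):
    """Classify a series against the clustering training window.

    Returns (scenario, span), where span is the (start, end) range of data
    used for training, or None when the series is used for holdout only.
    All labels and boundary rules here are assumptions for illustration.
    """
    duration = train_end - train_start
    if series_end < train_start:
        # Legacy series: slide the window back so it ends at the series' last point.
        start = max(series_start, series_end - duration)
        return "legacy", (start, series_end)
    if series_start > train_end:
        # Series exists only after the training end: no training data is taken.
        return "holdout_only", None
    if series_start <= train_start and series_end >= train_end:
        # Series covers the whole window: pass the window along unchanged.
        return "full_window", (train_start, train_end)
    # Partial overlap: keep only the portion inside the training boundary.
    return "partial", (max(series_start, train_start), min(series_end, train_end))


# Hypothetical clustering training window.
window = (date(2023, 1, 1), date(2023, 6, 30))
full = select_training_span(date(2022, 1, 1), date(2023, 12, 31), *window)
partial = select_training_span(date(2023, 3, 1), date(2023, 12, 31), *window)
legacy = select_training_span(date(2021, 1, 1), date(2022, 6, 30), *window)
newer = select_training_span(date(2023, 8, 1), date(2023, 12, 31), *window)
for label, span in (full, partial, legacy, newer):
    print(label, span)
```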
When the configuration is ready, select Quick or full Autopilot, or Manual mode, and click Start. DataRobot prompts to remind you that because it builds a complete project for each segment, the time required to finish modeling could be quite long. Confirm you want to proceed by clicking Start modeling. (You can set DataRobot to proceed without approval for future segmented projects.)
After EDA2 completes, DataRobot immediately creates the Combined Model. Because the "child" models (the independent segment models) are still building, the Combined Model is not complete. However, you can control building and worker assignment from the Combined Model.
When modeling completes, use the Combined Model to explore segments.
Once modeling has finished, the Model tab indicates that one model has built. (See the note regarding outcome when using Manual mode.) This is the completed Combined Model.
The charts and graphs available for segmented modeling are dependent on the model type:
For the Combined Model, you can access the Segmentation tab, a model blueprint, the modeling log, Make Predictions, and Comments.
For the models available in the individual segments, the visualizations and modeling tabs (Repository, Compare Models, etc.) appropriate to a multiseries regression project are available.
Click to expand the Combined Model and expose the Segmentation tab.
The following table describes components of the Segmentation tab:
|Search||Use search to change the display so that it only includes segments that match the entered string.|
|Download CSV||Download a spreadsheet containing the metadata associated with the Combined Model project, including metric scores, champion history, IDs, and project history.|
|Segment||Lists the segment values, found in the training data by DataRobot, in the specified segment ID.|
|Rows||Displays segment statistics from the training data—the raw number of rows and the percentage of the dataset that those rows represent.|
|Total models||Indicates the number of models DataRobot built for that segment during the build process.|
|Champion last updated||Indicates the time and the responsible party for the last segment champion assignment. The entry also provides an icon indicating the champion model type. Initially, all rows list "by DataRobot." Segments are sorted by the "All backtests" score; click a column header to re-sort.|
|Backtest 1||Indicates the champion model's Backtest 1 score for the selected metric.|
|All backtests||Indicates the average score for all backtests run for the champion model.|
|Holdout||Provides an icon that indicates whether Holdout has been unlocked.|
The Combined Model consists of one model per segment, the segment champion. Each individual segment, on the other hand, is a complete project. You can investigate the project from the segment's Leaderboard and even deploy a segment model, independent of the Combined Model.
Access a segment's Leaderboard¶
There are multiple ways to access a segment's Leaderboard.
From the Combined Model¶
Expand the Combined Model and click the segment name in the Segmentation tab list.
Once clicked, the segment's Leaderboard opens. Notice that:
- A full set of models has been built.
- DataRobot has recommended a model for deployment and marked a model as champion.
- Regular Worker Queue controls are available.
From the Segment dropdown¶
Use the Segment dropdown to change your view.
- From a segment:
- Select an alternate segment. The segment's Leaderboard displays.
- Select View all segments to return to the Combined Model.
- From the Combined Model, select a segment to open the segment's Leaderboard.
Reassign the champion model¶
While DataRobot initially assigns a segment champion, you may want to change the designation. This could be the case, for example, if it were important to you that all segments provide the same model type to the Combined Model. Identify the segment champion from a segment's Leaderboard, where it is marked with the champion badge:
To reassign the champion, from the segment Leaderboard, select the model you want as champion. Then, from the menu select Leaderboard options > Mark model as champion.
The badge moves to the new model:
And the Combined Model's Segmentation tab shows when the champion was last updated and who assigned the new champion.
Control across projects¶
Because DataRobot treats each segment as an individual project, completing the Combined Model can take significantly longer than a regular multiseries project. The exact time is dependent on the number of segments and size of your dataset. You can use the controls described below to set workers (1) and to stop and start modeling (2). All actions are performed from the Worker Queue of the Combined Model and apply to all segment projects. You can also use it to unlock Holdout (3).
From the Combined Model, you can control the number of modeling workers across all segment projects. DataRobot automatically re-balances workers between segments, distributing available workers between running segments as each segment completes modeling. When changing the worker count, DataRobot ignores any projects not in the modeling stage.
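One way to picture the re-balancing described above is an even split of the worker count across segments that are currently modeling. The sketch below assumes an even split and uses hypothetical stage names (`"modeling"`, `"complete"`, `"eda"`); the source does not specify how DataRobot actually distributes workers.

```python
def rebalance_workers(total_workers, segment_stages):
    """Split workers evenly across segments in the (assumed) "modeling" stage.

    Segments in any other stage are ignored and receive zero workers.
    """
    running = [name for name, stage in segment_stages.items() if stage == "modeling"]
    allocation = {name: 0 for name in segment_stages}
    if not running or total_workers <= 0:
        return allocation
    base, extra = divmod(total_workers, len(running))
    for index, name in enumerate(running):
        # The first `extra` running segments absorb the remainder, one worker each.
        allocation[name] = base + (1 if index < extra else 0)
    return allocation


# Hypothetical segments and stages.
stages = {"north": "modeling", "south": "modeling", "east": "complete", "west": "eda"}
before = rebalance_workers(5, stages)

# When a segment finishes, its workers are redistributed to those still running.
stages["south"] = "complete"
after = rebalance_workers(5, stages)
print(before, after)
```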
Pause/Start/Stop child modeling¶
From the Worker Queue of the parent segmented project, you can control modeling actions of the child projects. Use the pause/start/cancel buttons in the sidebar, and the selected action is applied to all child projects simultaneously. Specifically:
At the start of a segmented project, no queue actions are available.
When all segments have reached the EDA2 stage, the Pause and Start buttons become available.
The Cancel button becomes available when child projects are in the modeling stage and have at least one job running.
You can unlock Holdout for an entire project or for each segment.
To unlock the entire project—all models in all segments—choose Unlock Holdout from the Combined Model's Worker Queue.
To unlock Holdout for all models in a segment, open the segment's Leaderboard and choose Unlock project Holdout for all models in the Worker Queue.
Leaderboard model scores¶
Scores for the Combined Model are updated when the model completes building (that is, when champions are assigned in all segments). Scores are recalculated any time a segment's champion is replaced. For efficiency, the Combined Model score is calculated by aggregating the individual champion scores. Metrics that support this method of calculation are: MAD, MAE, MAPE, MASE, RMSE, RMSLE, SMAPE, and Theil’s U.
When one or more champions are prepared for deployment, the scores shown reflect the parent scores. (The parent of the champion/recommended model is the model the champion is trained from.)
How are metric scores weighted based on number of series?
The evaluation metric for a combined model is an average based on the number of rows in a particular partition (training, validation, holdout). Because each segment can have a different number of series to predict, DataRobot weights the value to account for the model count for each series. In the case of MAE/MAD/MAPE/SMAPE, it is calculated as:
MAE_X = MAE_1 * w_1 + MAE_2 * w_2 + ...
For RMSE/RMSLE, as:
RMSE_X = sqrt(RMSE_1**2 * w_1 + RMSE_2**2 * w_2 + ...)
MASE/Theil’s U are calculated using champion metrics and champion base metrics in two steps—calculate naive model scores in all segments, then calculate final combined model score.
naive_X = base_X / score_X
SCORE = (base_1 + base_2 + ... ) / (naive_1 + naive_2 + ...)
The score weight for a segment is essentially the number of rows in a particular partition of the segment in relation to the total number of holdout rows in all segments. There are explicit tests for calculating score consistency. The score is calculated for the full dataset and then split into segments—scores are calculated individually for each segment and then combined and compared with the full dataset score.
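The aggregation formulas above can be worked through numerically. The sketch below transcribes them directly, assuming each weight is a segment's row count in the partition divided by the total row count across segments; the function names and sample values are hypothetical, and this is not DataRobot's internal code.

```python
import math


def partition_weights(row_counts):
    """Weight of each segment: its rows divided by the rows in all segments."""
    total = sum(row_counts)
    return [count / total for count in row_counts]


def combine_mae_like(scores, row_counts):
    """Weighted average, as in the MAE/MAD/MAPE/SMAPE formula."""
    weights = partition_weights(row_counts)
    return sum(score * weight for score, weight in zip(scores, weights))


def combine_rmse_like(scores, row_counts):
    """Square root of the weighted mean of squared scores (RMSE/RMSLE)."""
    weights = partition_weights(row_counts)
    return math.sqrt(sum(score ** 2 * weight for score, weight in zip(scores, weights)))


def combine_ratio_metric(scores, base_scores):
    """Two-step calculation for MASE/Theil's U.

    Step 1: naive_X = base_X / score_X for each segment.
    Step 2: SCORE = sum(base) / sum(naive).
    """
    naive = [base / score for base, score in zip(base_scores, scores)]
    return sum(base_scores) / sum(naive)


# Hypothetical champion scores for two segments.
mae = combine_mae_like([2.0, 4.0], row_counts=[300, 100])
rmse = combine_rmse_like([3.0, 4.0], row_counts=[100, 100])
mase = combine_ratio_metric(scores=[0.5, 1.0], base_scores=[2.0, 3.0])
print(mae, rmse, mase)
```

In the MAE example, the first segment holds three quarters of the rows, so its score dominates the combined value.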
When looking at the Combined Model on the Leaderboard, you may notice that the model has no score (only N/A), which indicates:
- The model is not yet complete.
- All backtests have not yet completed for one or more segment champions (the All Backtests score is N/A).
- The selected metric does not support score aggregation (for example, FVE Poisson or Gamma Deviance).
Change the metric to a supported metric, for example MAE, and an aggregated score from the Combined Model becomes available.
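The conditions for an N/A score can be summarized as a small check. This is a sketch only: the supported-metric list is taken from the Leaderboard model scores section, while the function name, the `champions` data shape, and the returned messages are hypothetical.

```python
# Metrics listed in this document as supporting score aggregation.
SUPPORTED_METRICS = frozenset(
    ["MAD", "MAE", "MAPE", "MASE", "RMSE", "RMSLE", "SMAPE", "Theil's U"]
)


def combined_score_status(metric, champions):
    """Explain whether a Combined Model score can be shown.

    `champions` maps segment name to None (no champion assigned yet) or to a
    dict whose "all_backtests" value is a score, or None while backtests run.
    """
    if metric not in SUPPORTED_METRICS:
        return "N/A: metric does not support score aggregation"
    if not champions or any(champ is None for champ in champions.values()):
        return "N/A: model is not yet complete"
    if any(champ["all_backtests"] is None for champ in champions.values()):
        return "N/A: backtests have not completed for a segment champion"
    return "score available"


# Hypothetical segments.
pending = {"north": {"all_backtests": 1.8}, "south": {"all_backtests": None}}
ready = {"north": {"all_backtests": 1.8}, "south": {"all_backtests": 2.1}}
print(combined_score_status("Gamma Deviance", ready))
print(combined_score_status("MAE", pending))
print(combined_score_status("MAE", ready))
```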
To see the individual champion scores, expand the Combined Model to display the Segmentation tab.
Individual champion scores are reported there. Note that:
- Both the Backtest 1 and All Backtests scores are N/A if the champion is not assigned.
- An asterisk indicates that the champion model has been prepared for deployment (retrained as a start/end model into the most recent data) and thus the scores of its parent model are used.
If you change the champion, DataRobot passes the scores from the new champion (or its parent) to the Combined Model.
Manual mode in segmented modeling¶
When using Manual mode with segmented modeling, DataRobot creates individual projects per segment and completes preparation as far as the modeling stage. However, DataRobot does not create per-project models. It does create the Combined Model (as a placeholder), but does not select a champion. Using Manual mode is a technique you can use to have full manual control over which models are trained in each segment and selected as champions, without taking the time to build the models.
Deploy a Combined Model¶
To fully leverage the value of segmented modeling, you can deploy Combined Models like any other time series model. After selecting the champion model for each included project, you can deploy the Combined Model to create a "one-model" deployment for multiple segments; however, the individual segments in the deployed Combined Model still have their own segment champion models running in the deployment behind the scenes. Creating a deployment allows you to use DataRobot MLOps for accuracy monitoring, prediction intervals, challenger models, and retraining.
When segmented modeling completes, you can deploy the resulting Combined Model:
Once Autopilot has finished, the Model tab contains one model. This model is the completed Combined Model.
Click the Combined Model, and then click Predict > Deploy.
On the Deploy tab, click Deploy model.
You can also click Add to Model Registry and then deploy the Combined Model from there.
Monitor, manage, and govern the deployed model in DataRobot MLOps. Set up retraining policies to maintain model performance post-deployment.
Combined Model deployment considerations¶
Consider the following when working with segmented modeling deployments:
Time series segmented modeling deployments do not support data drift monitoring.
Automatic retraining for segmented deployments that use clustering models is disabled; retraining must be done manually.
Retraining can be triggered by accuracy drift in a Combined Model; however, it doesn't support monitoring accuracy in individual segments or retraining individual segments.
Combined model deployments can include standard model challengers.
Modify and clone a deployed Combined Model¶
After deploying a Combined Model, you can change the segment champion for a segment by cloning the deployed Combined Model and modifying the cloned model. This process is automatic and occurs when you attempt to change a segment's champion within a deployed Combined Model. The cloned model you can modify becomes the Active Combined Model. This process ensures stability in the deployed model while allowing you to test changes within the same segmented project.
Only one Combined Model on a project's Leaderboard can be the Active Combined Model (marked with a badge).
To modify and clone a deployed Combined Model, take the following steps:
Once a Combined Model is deployed, it is labeled Prediction API Enabled.
Click the active and deployed Combined Model, and then in the Segments tab, click the segment you want to modify.
In the dialog box that appears, click Yes, create new combined model.
On the project's Leaderboard, you can access and modify the Active Combined Model.
While the Combined Model updated notification is visible, you can click Go to Combined Model to return to the segment's Combined Model in the Leaderboard.