Segmented modeling for multiseries

Complex and accurate demand forecasting typically requires deep statistical know-how and lengthy development projects around big data architectures. DataRobot's multiseries with segmented modeling automates this work by creating multiple projects "under the hood." Once the segments are identified and built, they are merged into a single object, the Combined Model. This leads to improved model performance and decreased time to deployment.

When using segmented modeling, DataRobot creates a full project for each segment—running Autopilot (full or quick) then selecting (and preparing) a recommended model for deployment. DataRobot also marks the recommended model as "segment champion," although you can reassign the champion at any time.

Note

Although DataRobot creates a project for each segment, these projects are not available from the Project Control Center. Instead, they are investigated and managed from within the Combined Model, which is available in the Project Control Center.

DataRobot incorporates each segment champion to create the Combined Model, a main umbrella project that acts as a collection point for all the segments. The result is a "one-model" deployment (the segmented project), while each segment has its own model running in the deployment behind the scenes.

It is important to remember that while segmented modeling solves some problems (model factories, multiple deployments), it cannot know which segments you care most about or which have the highest ROI. To be successful, you must correctly define the use case, set up the dataset, and define the segments.

See the segmented modeling FAQ for more detailed information. See the visual overview for a quick representation of why to use segmented modeling.

Segmented modeling workflow

Modeling with segmentation is available for multiseries regression projects. The following describes the segmented modeling workflow.

  1. Follow the standard time series workflow to set the target and turn on time-aware modeling. Choose Automated time series forecasting with backtesting as the modeling method.

  2. Enable multiseries modeling by setting the series identifier.

  3. Change the segmentation method by clicking the pencil:

  4. Set the segmentation configuration, as described in the table:

    | Selection | Description |
    |---|---|
    | Yes, build models per segment | Select this option to enable segmented modeling. When selected, you must also set how segments are defined (3). |
    | No, build models without segmenting | Select this option to return to the previous Time Series Forecasting window. If you choose not to do segmented modeling, DataRobot builds one model for all detected series (regular multiseries). |
    | How are the segments defined? | Select a column from the training dataset that DataRobot will use as the segment ID. Start to type a column name and see matching auto-complete selections or select from the identifiers that DataRobot identified. The segment ID must be different from the series ID. |
    What if I want to have one series per segment?

    The columns specified for segment ID and series ID cannot be the same; however, you can duplicate the series ID column and give it a new name. Then, set the segment ID to the new column name (using the How are the segments defined section). DataRobot will generate the segments using the series ID.

  5. Once the ID is set, click Set segmentation method. The Time Series Forecasting window returns, where you can continue the configuration, including training windows, duration, known in advance (KA) features, and calendar selection. You can also change the selected series and segment from this window.

  6. When the configuration is ready, select either Quick or full Autopilot and click Start. DataRobot prompts to remind you that because it builds a complete Autopilot project for each segment, the time required to finish modeling could be quite long. Confirm you want to proceed by clicking Start modeling. (You can set DataRobot to proceed without approval for future segmented projects.)

  7. After EDA2 completes, DataRobot immediately creates the Combined Model. Because the "child" models (the independent segment models) are still building, the Combined Model is not complete. However, you can control building and worker assignment from the Combined Model.

  8. When modeling completes, use the Combined Model to explore segments.
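The one-series-per-segment workaround described in step 4 (duplicating the series ID column under a new name) can be prepared before uploading the dataset. The following is a minimal sketch using only the Python standard library; the column names `store_id` and `store_segment` and the sample data are hypothetical, and nothing here calls DataRobot itself.

```python
import csv
import io


def add_segment_column(csv_text, series_col, segment_col):
    """Return CSV text with `series_col` duplicated under the name `segment_col`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if series_col not in reader.fieldnames:
        raise KeyError(f"column {series_col!r} not found")
    if segment_col in reader.fieldnames:
        raise ValueError(f"column {segment_col!r} already exists")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames + [segment_col], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # The segment ID mirrors the series ID, so each series is its own segment.
        row[segment_col] = row[series_col]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data: two series identified by store_id.
sample = "date,store_id,sales\n2022-01-01,A,10\n2022-01-01,B,7\n"
result = add_segment_column(sample, "store_id", "store_segment")
print(result)
```

After uploading the modified file, the original column would be set as the series ID and the new column as the segment ID.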

Explore results

Once Autopilot has finished, the Model tab indicates that one model has been built. This is the completed Combined Model.

The charts and graphs available for segmented modeling are dependent on the model type:

  • For the Combined Model, you can access the Segmentation tab, a model blueprint, the modeling log, Make Predictions, and Comments.

  • For the models available in the individual segments, the visualizations and modeling tabs (Repository, Compare Models, etc.) appropriate to a multiseries regression project are available.

Segmentation tab

Click to expand the Combined Model and expose the Segmentation tab.

The following table describes components of the Segmentation tab:

| Component | Description |
|---|---|
| Search | Use search to change the display so that it only includes segments that match the entered string. |
| Download CSV | Download a spreadsheet containing the metadata associated with the Combined Model project, including metric scores, champion history, IDs, and project history. |
| Segment | Lists the segment values that DataRobot found in the training data for the specified segment ID. |
| Rows | Displays segment statistics from the training data: the raw number of rows and the percentage of the dataset that those rows represent. |
| Total models | Indicates the number of models DataRobot built for that segment during the Autopilot process. |
| Champion last updated | Indicates the time and the responsible party for the last segment champion assignment. The entry also provides an icon indicating the champion model type. Initially, all rows list "by DataRobot." Segments are listed by the "All backtests" scores; click the column header to re-sort. |
| Backtest 1 | Indicates the champion model's Backtest 1 score for the selected metric. |
| All backtests | Indicates the average score for all backtests run for the champion model. |
| Holdout | Provides an icon that indicates whether Holdout has been unlocked. |
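
To anticipate what the Rows column will show before starting a project, the per-segment row counts and dataset percentages can be computed directly from the training data. This is a minimal sketch, assuming the data has been loaded as a list of dictionaries; the `region` column and its values are hypothetical.

```python
from collections import Counter


def segment_row_stats(rows, segment_col):
    """Map each segment value to (row count, percentage of all rows)."""
    counts = Counter(row[segment_col] for row in rows)
    total = sum(counts.values())
    return {seg: (n, 100.0 * n / total) for seg, n in counts.items()}


# Hypothetical training rows with a segment ID column named "region".
rows = [
    {"region": "east"},
    {"region": "east"},
    {"region": "west"},
    {"region": "east"},
]
stats = segment_row_stats(rows, "region")
for segment, (count, pct) in stats.items():
    print(f"{segment}: {count} rows ({pct:.1f}%)")
```

Very small segments found this way may be worth reviewing before modeling, since each segment is built as its own Autopilot project.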

Explore segments

The Combined Model consists of one model per segment, the segment champion. Each individual segment, on the other hand, is a complete Autopilot project. You can investigate the project from the segment's Leaderboard and even deploy a segment model, independent of the Combined Model.

Access a segment Leaderboard

There are multiple ways to access a segment's Leaderboard.

From the Combined Model

Expand the Combined Model and click the segment name in the Segmentation tab list.

Once clicked, the segment's Leaderboard opens. Notice that:

  • A full set of models has been built.
  • DataRobot has recommended a model for deployment and marked a model as champion.
  • Regular Worker Queue controls are available.

From the Segment dropdown

Use the Segment dropdown to change your view.

  • From a segment:
    • Select an alternate segment. The segment's Leaderboard displays.
    • Select View all segments to return to the Combined Model.
  • From the Combined Model, select a segment to open the segment's Leaderboard.

Reassign the champion model

While DataRobot initially assigns a segment champion, you may want to change the designation. This could be the case, for example, if it were important to you that all segments provide the same model type to the Combined Model. Identify the segment champion from a segment's Leaderboard, where it is marked with the champion badge:

To reassign the champion, from the segment Leaderboard, select the model you want as champion. Then, from the menu select Leaderboard options > Mark model as champion.

The badge moves to the new model:

And the Combined Model's Segmentation tab shows when the champion was last updated and who assigned the new champion.

Control across projects

Because DataRobot treats each segment as an individual Autopilot project, completing the Combined Model can take significantly longer than a regular multiseries project. The exact time is dependent on the number of segments and size of your dataset. You can use the controls described below to set workers (1) and to stop and start modeling (2). All actions are performed from the Worker Queue of the Combined Model and apply to all segment projects. You can also use it to unlock Holdout (3).

Worker control

From the Combined Model, you can control the number of modeling workers across all segment projects. DataRobot automatically re-balances workers between segments, distributing available workers between running segments as each segment completes modeling. When changing the worker count, DataRobot ignores any projects not in the modeling stage.
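
The re-balancing behavior can be pictured with a small model. The sketch below assumes an even split of workers across segments that are in the modeling stage; the actual allocation policy DataRobot uses is not described here, and the stage names (`"modeling"`, `"complete"`) are hypothetical labels chosen for illustration.

```python
def rebalance_workers(total_workers, segment_stages):
    """Split workers evenly across segments in the modeling stage.

    Assumption: an even split with leftovers going to the first segments.
    Segments in any other stage are ignored and receive zero workers.
    """
    running = [seg for seg, stage in segment_stages.items() if stage == "modeling"]
    allocation = {seg: 0 for seg in segment_stages}
    if not running:
        return allocation
    base, extra = divmod(total_workers, len(running))
    for i, seg in enumerate(running):
        allocation[seg] = base + (1 if i < extra else 0)
    return allocation


# Two segments still modeling, one already finished.
before = rebalance_workers(5, {"east": "modeling", "west": "modeling", "north": "complete"})
# Once "west" completes, its workers become available to "east".
after = rebalance_workers(5, {"east": "modeling", "west": "complete", "north": "complete"})
print(before)
print(after)
```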

Pause/Start/Stop child modeling

From the Worker Queue of the parent segmented project, you can control modeling actions of the child projects. Use the Pause, Start, and Cancel buttons in the sidebar; the selected action is applied to all child projects simultaneously. Specifically:

  • At the start of a segmented project, no queue actions are available.

  • When all segments have reached the EDA2 stage, the Pause and Start buttons become available.

  • The Cancel button becomes available when child projects are in the modeling stage and have at least one job running.
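
The availability rules above can be summarized as a small decision function. This is an illustrative model only, not DataRobot code: the stage names are hypothetical, and it assumes that Cancel becomes available as soon as any one child project is in the modeling stage with a running job, which the rules above do not state explicitly.

```python
# Hypothetical stage names, ordered from earliest to latest.
STAGE_ORDER = {"setup": 0, "eda2": 1, "modeling": 2}


def available_actions(children):
    """Return the queue actions available for a segmented project.

    `children` is a list of (stage, running_jobs) tuples, one per child project.
    """
    actions = set()
    # Pause and Start require every child to have reached EDA2.
    if children and all(
        STAGE_ORDER[stage] >= STAGE_ORDER["eda2"] for stage, _ in children
    ):
        actions.update({"pause", "start"})
    # Cancel requires a child in the modeling stage with at least one running job
    # (assumption: one such child is enough).
    if any(stage == "modeling" and jobs > 0 for stage, jobs in children):
        actions.add("cancel")
    return actions


print(available_actions([("setup", 0), ("eda2", 0)]))
print(available_actions([("eda2", 0), ("modeling", 0)]))
print(available_actions([("modeling", 2), ("modeling", 0)]))
```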

Unlock Holdout

You can unlock Holdout for an entire project or for each segment.

  • To unlock the entire project—all models in all segments—choose Unlock Holdout from the Combined Model's Worker Queue.

  • To unlock Holdout for all models in a segment, open the segment's Leaderboard and choose Unlock project Holdout for all models in the Worker Queue.

Leaderboard model scores

When looking at the Combined Model on the Leaderboard, you may notice that no scores display:

This is because the default metric for a regression project is RMSE, which is not a supported metric for the Combined Model. If you change the metric to a supported metric, for example MAE, an aggregated score from the champion models becomes available. (Supported metrics are MAD, MAE, MASE, MAPE, and SMAPE.)

Scores for the Combined Model are based on the champion scores or, if a champion is a prepared-for-deployment model, on its parent's scores. To see the individual champion scores, expand the Combined Model to display the Segmentation tab.

Why no champion score?

When DataRobot selects a champion model, that model has been prepared for deployment. As part of the preparation, the model is retrained into Holdout (retrained as a start/end model into the most recent data). The parent of the champion/recommended model is the model the champion is trained from. So looking at the parent provides scores for the champion.

Scores are reported on the Segmentation tab. If you change the champion, DataRobot passes the scores from the new champion (or its parent) to the Combined Model. Note that:

  • An asterisk next to a score indicates that the score reflects that of the parent model.

  • An N/A in the score column indicates that backtests have not been run. Open the model on the segment's Leaderboard and run "All backtests."
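
The scoring rules in this section can be sketched as follows. The display logic (champion score, parent score with an asterisk, or N/A) follows the notes above. The aggregation is an assumption: the exact formula is not stated here, so a row-weighted mean is used purely for illustration, and the field names `rows`, `champion_score`, and `parent_score` are hypothetical.

```python
SUPPORTED_METRICS = {"MAD", "MAE", "MASE", "MAPE", "SMAPE"}


def champion_display_score(champion_score, parent_score):
    """Format a segment's score the way the notes above describe."""
    if champion_score is not None:
        return f"{champion_score:.4f}"
    if parent_score is not None:
        return f"{parent_score:.4f}*"  # asterisk: score comes from the parent model
    return "N/A"  # backtests have not been run


def combined_score(metric, segments):
    """Aggregate champion (or parent) scores across segments.

    Assumption: a row-weighted mean; the real aggregation may differ.
    Returns None for unsupported metrics (such as RMSE) or missing scores.
    """
    if metric not in SUPPORTED_METRICS:
        return None
    weighted, total_rows = 0.0, 0
    for seg in segments:
        score = seg["champion_score"]
        if score is None:
            score = seg["parent_score"]
        if score is None:
            return None
        weighted += score * seg["rows"]
        total_rows += seg["rows"]
    return weighted / total_rows if total_rows else None


# Hypothetical segments: "east" has only a parent score, "west" has its own.
segments = [
    {"rows": 300, "champion_score": None, "parent_score": 2.0},
    {"rows": 100, "champion_score": 4.0, "parent_score": None},
]
print(combined_score("RMSE", segments))
print(combined_score("MAE", segments))
print(champion_display_score(None, 2.0))
```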


Updated February 21, 2022