Segmented modeling for multiseries

Complex and accurate demand forecasting typically requires deep statistical know-how and lengthy development projects around big data architectures. DataRobot's multiseries with segmented modeling automates this work by creating multiple projects "under the hood." Once the segments are identified and built, they are merged into a single object, the Combined Model. This leads to improved model performance and decreased time to deployment.

When using segmented modeling, DataRobot creates a full project for each segment—running Autopilot (full or quick) then selecting (and preparing) a recommended model for deployment. (See the note on using segmented modeling in Manual mode.) DataRobot also marks the recommended model as "segment champion," although you can reassign the champion at any time.

Note

Although DataRobot creates a project for each segment, these projects are not available from the Project Control Center. Instead, they are investigated and managed from within the Combined Model, which is available in the Project Control Center.

DataRobot incorporates each segment champion to create the Combined Model, a main umbrella project that acts as a collection point for all the segments. The result is a "one-model" deployment (the segmented project), while each segment has its own model running in the deployment behind the scenes.
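The "one-model" idea can be pictured with a small sketch. This is not DataRobot's implementation or API. `CombinedModelSketch`, the `region` segment column, and the lambda "champions" are all hypothetical names chosen for illustration. The sketch only shows how a single object can route each row to the champion of its segment.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a segment champion: any callable that maps a
# feature row to a forecast value.
Predictor = Callable[[dict], float]


class CombinedModelSketch:
    """Toy umbrella object that holds one champion per segment."""

    def __init__(self, segment_column: str) -> None:
        self.segment_column = segment_column
        self.champions: Dict[str, Predictor] = {}

    def set_champion(self, segment: str, model: Predictor) -> None:
        self.champions[segment] = model

    def predict(self, rows: List[dict]) -> List[float]:
        # Each row is scored by the champion of the segment it belongs to.
        return [self.champions[row[self.segment_column]](row) for row in rows]


combined = CombinedModelSketch(segment_column="region")
combined.set_champion("north", lambda row: row["last_sales"] * 1.10)
combined.set_champion("south", lambda row: row["last_sales"] * 0.95)

rows = [
    {"region": "north", "last_sales": 100.0},
    {"region": "south", "last_sales": 200.0},
]
forecasts = combined.predict(rows)
```

Callers interact with one object, while the per-segment models stay separate behind it.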

It is important to remember that while segmented modeling solves some problems (model factories, multiple deployments), it cannot know which segments you care most about or which have the highest ROI. To be successful, you must correctly define the use case, set up the dataset, and define the segments.

See the segmented modeling FAQ for more detailed information. See the visual overview for a quick representation of why to use segmented modeling.

Modeling with segmentation is available for multiseries regression projects.

Segmented modeling workflow

Time series segmented modeling requires, first, defining segments that divide the dataset. To define segments, you can allow DataRobot to:

  • Discover clusters in your data and then use those clusters as segments.
  • Assign segments for you based on the configured segment ID.
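As a rough illustration of the segment ID option, the sketch below groups dataset rows by a segment column so that each group could feed its own child project. The column names (`store_id` as series ID, `region` as segment ID) and the `split_by_segment` helper are assumptions made for this example and do not reflect how DataRobot partitions data internally.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_segment(rows: List[dict], segment_col: str) -> Dict[str, List[dict]]:
    """Group dataset rows into one child dataset per segment value."""
    children: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        children[row[segment_col]].append(row)
    return dict(children)


# Hypothetical data: store_id is the series ID, region is the segment ID.
rows = [
    {"date": "2022-01-01", "store_id": "A", "region": "north", "sales": 10},
    {"date": "2022-01-01", "store_id": "B", "region": "north", "sales": 7},
    {"date": "2022-01-01", "store_id": "C", "region": "south", "sales": 4},
]
children = split_by_segment(rows, segment_col="region")

# A segment can contain several series.
series_per_segment = {
    seg: sorted({r["store_id"] for r in seg_rows})
    for seg, seg_rows in children.items()
}
```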

To build a segmented modeling project:

  1. Follow the standard time series workflow—set the target and turn on time-aware modeling. Choose Automated time series forecasting as the modeling method.

  2. Enable multiseries modeling by setting the series identifier.

  3. Set the segmentation method by clicking the pencil:

  4. Set whether to enable segmented modeling:

    • Select Yes, build models per segment to enable segmented modeling. When selected, you must also set how segments are defined.
    • Select No, build models without segmenting to return to the previous Time Series Forecasting window. If you choose not to do segmented modeling, DataRobot builds one model for all detected series (regular multiseries).
  5. Set how segments are defined. The table below describes each option:

    Option | Description
    ID column | Select a column from the training dataset that DataRobot will use as the segment ID. Start typing a column name to see matching auto-complete selections, or select from the identifiers that DataRobot identified. The segment ID must be different from the series ID (see note below).
    Existing clustering model | Use a clustering model previously saved to the Model Registry.
    New clustering model | Start a new clustering project by clicking the time series clustering link in the help text. Its results are applied later via the Existing clustering model option.

    What if I want to have one series per segment?

    The columns specified for segment ID and series ID cannot be the same; however, you can duplicate the series ID column and give it a new name. Then, set the segment ID to the new column name (using the How are the segments defined section). DataRobot will generate the segments using the series ID.

  6. Once the method is selected (either the ID is set or an existing clustering model is selected), click Set segmentation method. The Time Series Forecasting window returns, where you can continue the configuration: training windows, duration, known in advance (KA) features, and calendar selection. You can also change the selected series and segment.

    How are the training periods determined if clustering was used?

    When building a segmented model using found clusters to split the dataset into the child projects (segments), DataRobot applies the training window settings from the clustering project to the segmented modeling project. This protects the holdout in segmented modeling and prevents data leakage from the clustering model when splitting the segmented dataset into child projects. Based on the start and end dates of each series, the following general scenarios affect the methodology:

    • If the series data contains the time window needed (as defined in the clustering project), DataRobot simply passes the series data along.
    • Series data before the clustering training end: If a series is shorter than the full training window and extends past holdout, DataRobot uses only the data points before the clustering training end, up to the size of the training duration (only the portions that exist within the training boundary).
    • Series has only data older in time than the clustering training end: If there is a legacy series whose data does not fall into the training window, DataRobot "slides back" and gathers data for the duration of the training window so that it can be used in segmented modeling and is not lost.
    • Series data only exists "newer" in time than the clustering training end: If a series only exists in holdout, DataRobot slides the window forward but does not select any data that was used in training. In this way, the data is not dropped, but it is only used for examining the holdout of a child project.

  7. When the configuration is ready, select Quick or full Autopilot, or Manual mode, and click Start. DataRobot prompts to remind you that because it builds a complete project for each segment, the time required to finish modeling could be quite long. Confirm you want to proceed by clicking Start modeling. (You can set DataRobot to proceed without approval for future segmented projects.)

  8. After EDA2 completes, DataRobot immediately creates the Combined Model. Because the "child" models (the independent segment models) are still building, the Combined Model is not complete. However, you can control building and worker assignment from the Combined Model.

  9. When modeling completes, use the Combined Model to explore segments.
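The one-series-per-segment workaround from step 5 (duplicate the series ID column under a new name, then use the copy as the segment ID) could be prepared before upload with something like the following sketch. The column names `store_id` and `store_segment` and the helper `add_segment_column` are hypothetical. Only the Python standard library is used.

```python
import csv
import io


def add_segment_column(csv_text: str, series_col: str, segment_col: str) -> str:
    """Return CSV text with series_col copied into a new segment_col."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if segment_col in reader.fieldnames:
        raise ValueError(f"column {segment_col!r} already exists")
    fieldnames = list(reader.fieldnames) + [segment_col]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The segment ID is an exact copy of the series ID.
        row[segment_col] = row[series_col]
        writer.writerow(row)
    return out.getvalue()


raw = "date,store_id,sales\n2022-01-01,A,10\n2022-01-01,B,7\n"
prepared = add_segment_column(raw, series_col="store_id", segment_col="store_segment")
```

In the prepared data, `store_id` would remain the series identifier and `store_segment` would be chosen as the segment ID.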

Explore results

Once modeling has finished, the Model tab indicates that one model has been built. (See the note regarding outcome when using Manual mode.) This is the completed Combined Model.

The charts and graphs available for segmented modeling are dependent on the model type:

  • For the Combined Model, you can access the Segmentation tab, a model blueprint, the modeling log, Make Predictions, and Comments.

  • For the models available in the individual segments, the visualizations and modeling tabs (Repository, Compare Models, etc.) appropriate to a multiseries regression project are available.

Segmentation tab

Click to expand the Combined Model and expose the Segmentation tab.

The following table describes components of the Segmentation tab:

Component | Description
Search | Use search to change the display so that it only includes segments that match the entered string.
Download CSV | Download a spreadsheet containing the metadata associated with the Combined Model project, including metric scores, champion history, IDs, and project history.
Segment | Lists the segment values, found in the training data by DataRobot, in the specified segment ID.
Rows | Displays segment statistics from the training data: the raw number of rows and the percentage of the dataset that those rows represent.
Total models | Indicates the number of models DataRobot built for that segment during the build process.
Champion last updated | Indicates the time and the responsible party for the last segment champion assignment. The entry also provides an icon indicating the champion model type. Initially, all rows show "by DataRobot." Segments are listed by the "All backtests" scores; click the column header to re-sort.
Backtest 1 | Indicates the champion model's Backtest 1 score for the selected metric.
All backtests | Indicates the average score for all backtests run for the champion model.
Holdout | Provides an icon that indicates whether Holdout has been unlocked.
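To make the Rows and Search components concrete, here is a minimal sketch of how the row count, the percentage of the dataset, and a search filter could be computed from a segment ID column. The matching rule (case-insensitive substring) is an assumption, since the exact search behavior is not described here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def segment_row_stats(segment_ids: List[str]) -> Dict[str, Tuple[int, float]]:
    """Map each segment value to (row count, percent of all rows)."""
    counts = Counter(segment_ids)
    total = len(segment_ids)
    return {seg: (n, 100.0 * n / total) for seg, n in counts.items()}


def search_segments(
    stats: Dict[str, Tuple[int, float]], query: str
) -> Dict[str, Tuple[int, float]]:
    """Keep segments whose name contains the query (assumed matching rule)."""
    q = query.lower()
    return {seg: value for seg, value in stats.items() if q in seg.lower()}


segment_column = ["north", "north", "south", "north_east"]
stats = segment_row_stats(segment_column)
matches = search_segments(stats, "NORTH")
```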

Explore segments

The Combined Model is composed of one model per segment, the segment champion. Each individual segment, on the other hand, is a complete project. You can investigate the project from the segment's Leaderboard and even deploy a segment model independently of the Combined Model.

Access a segment's Leaderboard

There are multiple ways to access a segment's Leaderboard.

From the Combined Model

Expand the Combined Model and click the segment name in the Segmentation tab list.

Once clicked, the segment's Leaderboard opens. Notice that:

  • A full set of models has been built.
  • DataRobot has recommended a model for deployment and marked a model as champion.
  • Regular Worker Queue controls are available.

From the Segment dropdown

Use the Segment dropdown to change your view.

  • From a segment:
    • Select an alternate segment. The segment's Leaderboard displays.
    • Select View all segments to return to the Combined Model.
  • From the Combined Model, select a segment to open the segment's Leaderboard.

Reassign the champion model

While DataRobot initially assigns a segment champion, you may want to change the designation. This could be the case, for example, if it were important to you that all segments provide the same model type to the Combined Model. Identify the segment champion from a segment's Leaderboard, where it is marked with the champion badge:

To reassign the champion, from the segment Leaderboard, select the model you want as champion. Then, from the menu select Leaderboard options > Mark model as champion.

The badge moves to the new model:

And the Combined Model's Segmentation tab shows when the champion was last updated and who assigned the new champion.
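The bookkeeping behind a champion reassignment can be sketched as a small record that tracks the current champion, who assigned it, and when. `SegmentSketch`, `mark_champion`, and the model and user names are hypothetical. This is an illustration of the behavior described above, not the platform's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SegmentSketch:
    name: str
    champion: str
    # The initial assignment is attributed to DataRobot.
    updated_by: str = "DataRobot"
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def mark_champion(self, model_name: str, user: str) -> None:
        """Move the champion designation and record who changed it and when."""
        self.champion = model_name
        self.updated_by = user
        self.updated_at = datetime.now(timezone.utc)


segment = SegmentSketch(name="north", champion="model_a")
first_assigner = segment.updated_by
segment.mark_champion("model_b", user="analyst@example.com")
```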

Control across projects

Because DataRobot treats each segment as an individual project, completing the Combined Model can take significantly longer than a regular multiseries project. The exact time depends on the number of segments and the size of your dataset. You can use the controls described below to set workers (1) and to stop and start modeling (2). All actions are performed from the Worker Queue of the Combined Model and apply to all segment projects. You can also use the Worker Queue to unlock Holdout (3).

Worker control

From the Combined Model, you can control the number of modeling workers across all segment projects. DataRobot automatically re-balances workers between segments, distributing available workers between running segments as each segment completes modeling. When changing the worker count, DataRobot ignores any projects not in the modeling stage.
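The re-balancing behavior can be approximated with a short sketch. The actual distribution policy is not documented here, so the even split below is an assumption, as are the stage names (`"modeling"`, `"complete"`) and the function `rebalance_workers`. The sketch does follow one stated rule: projects not in the modeling stage receive no workers.

```python
from typing import Dict


def rebalance_workers(total_workers: int, stages: Dict[str, str]) -> Dict[str, int]:
    """Split workers across segments that are currently modeling."""
    running = sorted(seg for seg, stage in stages.items() if stage == "modeling")
    allocation = {seg: 0 for seg in stages}
    if not running:
        return allocation
    # Assumed policy: even split, with leftovers going to the first segments.
    base, extra = divmod(total_workers, len(running))
    for i, seg in enumerate(running):
        allocation[seg] = base + (1 if i < extra else 0)
    return allocation


before = rebalance_workers(5, {"a": "modeling", "b": "modeling", "c": "complete"})
# Segment "b" finishes, so its workers flow to the remaining running segment.
after = rebalance_workers(5, {"a": "modeling", "b": "complete", "c": "complete"})
```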

Pause/Start/Stop child modeling

From the Worker Queue of the parent segmented project, you can control modeling actions of the child projects. Use the Pause, Start, and Cancel buttons in the sidebar; the selected action is applied to all child projects simultaneously. Specifically:

  • At the start of a segmented project, no queue actions are available.

  • When all segments have reached the EDA2 stage, the Pause and Start buttons become available.

  • The Cancel button becomes available when child projects are in the modeling stage and have at least one job running.
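The availability rules above can be expressed as a small function. The stage names and their ordering are hypothetical, and the reading that Cancel requires at least one child project in the modeling stage with a running job (rather than all of them) is an assumption.

```python
from typing import List, Set, Tuple

# Hypothetical stage names, ordered by progress.
STAGE_ORDER = ["created", "eda2", "modeling", "complete"]


def available_actions(children: List[Tuple[str, int]]) -> Set[str]:
    """Return queue actions for child projects given as (stage, running_jobs)."""
    actions: Set[str] = set()
    eda2_rank = STAGE_ORDER.index("eda2")
    # Pause and Start appear once every segment has reached EDA2.
    if children and all(
        STAGE_ORDER.index(stage) >= eda2_rank for stage, _ in children
    ):
        actions |= {"pause", "start"}
    # Cancel appears when a child is modeling with at least one running job.
    if any(stage == "modeling" and jobs > 0 for stage, jobs in children):
        actions.add("cancel")
    return actions


at_start = available_actions([("created", 0), ("eda2", 0)])
after_eda2 = available_actions([("eda2", 0), ("modeling", 0)])
while_modeling = available_actions([("modeling", 2), ("modeling", 0)])
```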

Unlock Holdout

You can unlock Holdout for an entire project or for each segment.

  • To unlock the entire project—all models in all segments—choose Unlock Holdout from the Combined Model's Worker Queue.

  • To unlock Holdout for all models in a segment, open the segment's Leaderboard and choose Unlock project Holdout for all models in the Worker Queue.

Leaderboard model scores

When looking at the Combined Model on the Leaderboard, you may notice that no scores display:

This is because the default metric for a regression project is RMSE. If you change the metric to a supported metric, for example MAE, an aggregated score from the champion models becomes available. Supported metrics are MAD, MAE, MAPE, MASE, RMSE, RMSLE, SMAPE, and Theil’s U.

Scores for the Combined Model are based on the champion scores or, if the champions are models prepared for deployment, on their parents' scores. To see the individual champion scores, expand the Combined Model to display the Segmentation tab.

Why no champion score?

When DataRobot selects a champion model, that model has been prepared for deployment. As part of the preparation, the model is retrained into Holdout (retrained as a start/end model into the most recent data). The parent of the champion/recommended model is the model the champion is trained from. So looking at the parent provides scores for the champion.

Scores are reported in the Segmentation tab. If you change the champion, DataRobot passes the scores from the new champion (or its parent) to the Combined Model. Note that:

  • An asterisk next to a score indicates that the score reflects that of the parent model.

  • An N/A in the score column indicates that backtests have not been run. Open the model on the segment's Leaderboard and run "All backtests."
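The display rules for a segment's score (champion score, parent fallback marked with an asterisk, N/A when backtests have not run) can be summarized in a few lines. The function name, the use of `None` for a missing score, and the four-decimal formatting are assumptions for illustration only.

```python
from typing import Optional


def display_score(
    champion_score: Optional[float], parent_score: Optional[float]
) -> str:
    """Format a segment score the way the notes above describe it."""
    if champion_score is not None:
        return f"{champion_score:.4f}"
    if parent_score is not None:
        # The asterisk flags that the value comes from the parent model.
        return f"{parent_score:.4f}*"
    # Neither model has a score yet, for example because backtests were not run.
    return "N/A"


own = display_score(1.5, None)
from_parent = display_score(None, 2.25)
missing = display_score(None, None)
```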

Manual mode in segmented modeling

When using Manual mode with segmented modeling, DataRobot creates individual projects per segment and completes preparation as far as the modeling stage. However, DataRobot does not create per-project models. It does create the Combined Model (as a placeholder), but does not select a champion. Manual mode gives you full manual control over which models are trained in each segment and selected as champions, without taking the time to build the models.

Deploy a combined model

Availability information

Time series segmented modeling deployments do not support data drift monitoring or Prediction Explanations.

To fully leverage the value of segmented modeling, you can deploy combined models like any other time series model. After selecting the champion model for each included project, you can deploy the combined model to create a "one-model" deployment for multiple segments; however, the individual segments in the deployed combined model still have their own segment champion models running in the deployment behind the scenes. Creating a deployment allows you to use DataRobot MLOps for accuracy monitoring, prediction intervals, challenger models, and retraining.

Combined model deployment challengers and retraining

  • Retraining for time series segmented modeling deployments only supports Autopilot retraining (full or quick).

  • Retraining can be triggered by accuracy drift in a combined model; however, it doesn't support monitoring accuracy in individual segments or retraining individual segments.

  • Combined model deployments can include standard model challengers.

When segmented modeling completes, you can deploy the resulting combined model:

  1. Once Autopilot has finished, the Model tab contains one model. This model is the completed combined model.

  2. Click the Combined Model, and then click Predict > Deploy.

  3. On the Deploy tab, click Deploy model.

    Note

    You can also click Add to Model Registry and then deploy the combined model from there.

  4. Add deployment information and create the deployment.

  5. Monitor, manage, and govern the deployed model in DataRobot MLOps.

Modify and clone a deployed combined model

After deploying a combined model, you can change the segment champion for a segment by cloning the deployed combined model and modifying the cloned model. This process is automatic and occurs when you attempt to change a segment's champion within a deployed combined model. The cloned model you can modify becomes the Active Combined Model. This process ensures stability in the deployed model while allowing you to test changes within the same segmented project.

Note

Only one combined model on a project's Leaderboard can be the Active Combined Model (marked with a badge).
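The clone-on-modify behavior resembles a copy-on-write pattern, sketched below. The class and field names are hypothetical, and the sketch covers only the logic described above: a deployed combined model is left untouched, a clone receives the change, and exactly one combined model is active at a time.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeployableCombinedModel:
    champions: Dict[str, str]
    deployed: bool = False
    active: bool = True


@dataclass
class SegmentedProjectSketch:
    combined_models: List[DeployableCombinedModel] = field(default_factory=list)

    @property
    def active_model(self) -> DeployableCombinedModel:
        return next(m for m in self.combined_models if m.active)

    def set_champion(self, segment: str, model_name: str) -> DeployableCombinedModel:
        current = self.active_model
        if current.deployed:
            # The deployed model stays stable; a clone takes the change
            # and becomes the new Active Combined Model.
            clone = DeployableCombinedModel(champions=dict(current.champions))
            current.active = False
            self.combined_models.append(clone)
            current = clone
        current.champions[segment] = model_name
        return current


live_model = DeployableCombinedModel(
    champions={"north": "model_a", "south": "model_c"}, deployed=True
)
project = SegmentedProjectSketch(combined_models=[live_model])
edited = project.set_champion("north", "model_b")
```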

To modify and clone a deployed combined model, take the following steps:

  1. Once a combined model is deployed, it is labeled Prediction API Enabled.

  2. Click the active and deployed Combined Model, and then in the Segments tab, click the segment you want to modify.

  3. Reassign the segment champion.

  4. In the dialog box that appears, click Yes, create new combined model.

  5. On the project's Leaderboard, you can access and modify the Active Combined Model.

    Tip

    For a short time, in the Combined Model updated notification, you can click Go to Combined Model to return to the segment's combined models in the Leaderboard.


Updated September 27, 2022