Segmented modeling FAQ
What is an example of a segment vs. a series?
Imagine you sell avocados. Your target is "avocado_sales" and the series ID identifies each store selling avocados. The segment ID is the region of the country. Think of a segment as a group of series. Take the Northwest segment: because avocado sales in Alaskan stores don't resemble sales in California stores, assigning a segment ID and building around it is like building a business rule cluster. Instead of predicting avocado sales overall, you are predicting avocado sales in the Northwest region. See also the visual quickstart.
Is DataRobot building one model per series?
DataRobot builds multiple models per segment (a group of series); every segment has its own Leaderboard. DataRobot then selects and prepares a champion model from each segment Leaderboard.
Does DataRobot pick the segment champion?
DataRobot recommends one model from the segment Leaderboard, prepares it for deployment, and marks it as segment champion. That model then represents the segment in the Combined Model. You can, however, reset the champion to any model on the segment's Leaderboard.
What are your dataset file size constraints?
Regular time series dataset file size constraints apply. Segmented modeling supports up to 100 segments, but those segment sizes cannot exceed the total training set size limit. Think carefully about how many segments you create, because each segment effectively runs its own Autopilot. In other words, if you aren't prepared to run 100 instances of Autopilot, don't start a segmented modeling project with 100 segments.
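Before uploading, it can help to count how many segments a candidate segment ID would produce. The following is a minimal sketch in plain Python, not a DataRobot API call. The sample rows, the column names (store_id, region), and the helper summarize_segments are hypothetical; the only value taken from this FAQ is the 100-segment limit.

```python
from collections import Counter

MAX_SEGMENTS = 100  # segment limit stated in this FAQ


def summarize_segments(rows, segment_col):
    """Return a Counter mapping each segment value to its row count."""
    return Counter(row[segment_col] for row in rows)


# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"date": "2024-01-01", "store_id": "S1", "region": "Northwest", "avocado_sales": 120},
    {"date": "2024-01-01", "store_id": "S2", "region": "Northwest", "avocado_sales": 95},
    {"date": "2024-01-01", "store_id": "S3", "region": "Southwest", "avocado_sales": 310},
]

segment_counts = summarize_segments(rows, "region")
n_segments = len(segment_counts)
within_limit = n_segments <= MAX_SEGMENTS
print(f"{n_segments} segments, within limit: {within_limit}")
```

Each distinct value in the segment column corresponds to one segment, and therefore to one Autopilot run.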
Do segmented projects use the same feature engineering?
The internal time series feature engineering process generates a different set of features for each segment's project, based on what it finds useful in that segment. There is likely some overlap between segments, but the full list of generated features per segment will differ.
Where do I set the Forecast Window and Feature Derivation Window?
The flow is the same as setting a series ID, except that now you set a segment ID before configuring windows. You can also go back and edit your segment ID if you need to.
Can the same column be used for the series ID and the segment ID?
No, they must use different columns. If you want to have one series per segment (a single-series segmented project), duplicate the series ID column, giving it a new name. Set the segment ID to that column name. DataRobot will generate the segments using the series ID.
If you don't want to create a new column, you can often work within the original data to extrapolate. For example, if you previously set customer_unique_id as your series ID to predict sales for different product IDs, try using customer_unique_id as your segment ID and use product_id as your series ID.
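The column duplication for a single-series segmented project could be done during data preparation, before upload. This is a minimal sketch in plain Python; the helper add_segment_column, the sample rows, and the new column name store_segment are hypothetical and not part of DataRobot.

```python
def add_segment_column(rows, series_col, segment_col):
    """Return new rows where `segment_col` holds a copy of `series_col`."""
    if series_col == segment_col:
        # The segment ID and the series ID must be different columns.
        raise ValueError("segment ID and series ID must use different columns")
    return [{**row, segment_col: row[series_col]} for row in rows]


# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"date": "2024-01-01", "store_id": "S1", "avocado_sales": 120},
    {"date": "2024-01-01", "store_id": "S2", "avocado_sales": 95},
]

# One series per segment: the segment column mirrors the series ID.
segmented_rows = add_segment_column(rows, "store_id", "store_segment")
print(segmented_rows[0])
```

The original rows are left unchanged, so the same data can still be used for a non-segmented project.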
What are some ways to think about creating segments?
Here are some favorites—segment by:
- Customer size
- SKUs, grouped by sales velocity
- Areas, by temperature
- Series by size (small, medium, large)
- Target distribution
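As an illustration of the "series by size" idea, the sketch below derives a segment label for each series from its total target value. It is only one possible approach: the helper size_segments, the equal-count bucketing rule, and the sample data are assumptions made for the example, not something DataRobot prescribes.

```python
from collections import defaultdict


def size_segments(rows, series_col, target_col, labels=("small", "medium", "large")):
    """Map each series ID to a size label based on its total target value.

    Series are ranked by total and split into len(labels) buckets of
    roughly equal count (an assumed rule, chosen for simplicity).
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[series_col]] += row[target_col]
    ranked = sorted(totals, key=totals.get)  # smallest total first
    n = len(ranked)
    return {sid: labels[i * len(labels) // n] for i, sid in enumerate(ranked)}


# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"store_id": "S1", "avocado_sales": 10},
    {"store_id": "S1", "avocado_sales": 20},
    {"store_id": "S2", "avocado_sales": 100},
    {"store_id": "S2", "avocado_sales": 200},
    {"store_id": "S3", "avocado_sales": 1000},
    {"store_id": "S3", "avocado_sales": 2000},
]

segments = size_segments(rows, "store_id", "avocado_sales")
# Write the label back onto every row as a candidate segment ID column.
labeled_rows = [{**row, "size_segment": segments[row["store_id"]]} for row in rows]
print(segments)
```

The resulting size_segment column could then be selected as the segment ID, with store_id remaining the series ID.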
What kind of partitioning does segmented modeling use?
Segmented modeling uses automated partitioning based on the size of the data, running different partitioning for each project. This ensures that backtests are neither too long nor so short that they contain no data.
How are segmented models treated as a deployment?
The Combined Model created with segmented modeling is treated as one deployment.
How are metric scores computed?
DataRobot runs Autopilot (full or Quick) on each segment independently, so each segment gets models fit to its own data. When modeling is complete for all child projects, metrics become available for the Combined Model and are displayed on the Leaderboard. Each Combined Model metric is computed as a weighted sum of the corresponding metric from each segment's champion model. When you change a champion model, scores are recomputed. Available metrics are MAD, MAE, MAPE, MASE, RMSE, RMSLE, SMAPE, and Theil's U.
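To make the weighted aggregation concrete, here is a small sketch. This FAQ does not say which weights DataRobot uses, so the example assumes weights proportional to each segment's row count; that choice, the helper combined_score, and all numbers are hypothetical.

```python
def combined_score(champion_scores, weights):
    """Weighted sum of per-segment champion scores, with weights normalized to 1."""
    total_weight = sum(weights[segment] for segment in champion_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(
        champion_scores[segment] * weights[segment] / total_weight
        for segment in champion_scores
    )


# Hypothetical champion MAE per segment and rows per segment (assumed weights).
champion_mae = {"Northwest": 10.0, "Southwest": 20.0}
segment_rows = {"Northwest": 300, "Southwest": 100}

combined_mae = combined_score(champion_mae, segment_rows)
print(combined_mae)  # (10 * 300 + 20 * 100) / 400 = 12.5

# Choosing a different champion for one segment changes the combined score.
champion_mae["Southwest"] = 16.0
recomputed_mae = combined_score(champion_mae, segment_rows)
print(recomputed_mae)  # (10 * 300 + 16 * 100) / 400 = 11.5
```

Under this assumption, larger segments influence the combined score more than smaller ones.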