Perform segmented modeling

After clustering your data, you can use segmented modeling to proceed with the demand forecasting workflow. Accurate demand forecasting typically requires deep statistical expertise and lengthy development projects built around big data architectures. DataRobot's segmented modeling automates this work by creating a separate project for each segment behind the scenes. Once the segments are identified and built, they are merged into a single object, the Combined Model. This can improve model performance and shorten time to deployment.

Configuration

import datarobot as dr  # the code below refers to the client as `dr`
from datarobot import SegmentationTask
from datarobot import DatetimePartitioningSpecification
from datarobot import enums
from datarobot import Project
from datarobot import CombinedModel
SPLIT_COL = 'Cluster'

PROJECT_NAME = 'ACME'

VERSION = '1'
MODE    = 'Q'

# CLUSTERS = [0,1]
CLUSTERS = data.Cluster.unique().tolist()
CLUSTERS.sort()


FDWS = [(-35,0)]  
FDS  = [(1,7)]  

BASE   = PROJECT_NAME + '_V:'

# MONTHS is not defined in this section; it is assumed to be a string set earlier in the notebook
PREFIX = BASE + VERSION + '_Mnths:' + MONTHS + '_Mode:' + MODE
DATASET_FILENAME = 'Months_' + MONTHS
MAX_WAIT = 14400
READ_TIMEOUT = 14400


# Defaults
HOLDOUT_START_DATE  = None # pd.to_datetime('2021-03-29')
VALIDATION_DURATION = None # dr.helpers.partitioning_methods.construct_duration_string(years=0, months=0, days=28) 
HOLDOUT_DURATION    = None # dr.helpers.partitioning_methods.construct_duration_string(years=0, months=0, days=28)   
NUMBER_BACKTESTS    = None # 1 
GAP_DURATION        = dr.helpers.partitioning_methods.construct_duration_string(years=0, months=0, days=7) 


# Known In Advance columns
# Creates lags, AND uses the current value at prediction time
KIA_VARS = ['Advertised', 'FrontPg', 'OnDisplay', 'WeekNbr']

# Do Not Derive columns
# Only creates 1 lag
# DND_VARS = []

# KIA & Do Not Derive Columns
# Only uses the actual value at prediction time, NO lags at all
KIA_DND = ['Cat', 'Store', 'Month']


FEATURE_SETTINGS = []

for column in KIA_VARS:
    FEATURE_SETTINGS.append(dr.FeatureSettings(column, known_in_advance=True, do_not_derive=False))

# for column in DND_VARS:
#     FEATURE_SETTINGS.append(dr.FeatureSettings(column, known_in_advance=False, do_not_derive=True))

for column in KIA_DND:
    FEATURE_SETTINGS.append(dr.FeatureSettings(column, known_in_advance=True, do_not_derive=True))


# Add Calendar file
CAL = dr.CalendarFile.create('Acme_Calendar.csv',
                              calendar_name = 'US_Calendar')

CAL_ID = CAL.id

print(FEATURE_SETTINGS)
print(' ')
print(CAL_ID)
# Take sample record to get KIA and DND variables for above
data.head(1)
SEGMENT_ID = 'Cluster'

Create a project

project = Project.create('Acme_Train.csv', 
                         project_name = 'Acme Segmented Modeling Demo',
                         dataset_filename = 'Acme_Train.csv'
                        )

Set the date/time partitioning specification

time_partition = dr.DatetimePartitioningSpecification(
    datetime_partition_column = DATE_COL,
    forecast_window_start     = FDS[0][0], 
    forecast_window_end       = FDS[0][1],
    feature_derivation_window_start = FDWS[0][0],
    feature_derivation_window_end   = FDWS[0][1],
    disable_holdout           = False,
    holdout_start_date        = HOLDOUT_START_DATE ,
    validation_duration       = VALIDATION_DURATION,  
    holdout_duration          = HOLDOUT_DURATION,
    number_of_backtests       = NUMBER_BACKTESTS, 
    feature_settings          = FEATURE_SETTINGS,
    use_time_series           = True,
    multiseries_id_columns    = [SERIES_ID],
    calendar_id               = CAL_ID,
    windows_basis_unit        = None, # Row Based = 'ROW' ; Time Based = None
    model_splits              = 5,
    allow_partial_history_time_series_predictions = False
  )
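
To make the window settings concrete, the sketch below converts the FDWS and FDS offsets used above into calendar dates for a single forecast point. It assumes the window unit is days, and both the helper `window_dates` and the forecast point are hypothetical values chosen for illustration; nothing here calls DataRobot.

```python
from datetime import date, timedelta


def window_dates(forecast_point, start_offset, end_offset):
    """Return (start, end) dates for offsets given in days relative to the forecast point."""
    return (
        forecast_point + timedelta(days=start_offset),
        forecast_point + timedelta(days=end_offset),
    )


forecast_point = date(2021, 3, 29)  # illustrative value only

# Feature derivation window: FDWS = [(-35, 0)]
fdw = window_dates(forecast_point, -35, 0)
# Forecast window: FDS = [(1, 7)]
fw = window_dates(forecast_point, 1, 7)

print("Feature derivation window:", fdw[0], "to", fdw[1])
print("Forecast window:", fw[0], "to", fw[1])
```

Under these assumptions, features are derived from the 35 days up to and including the forecast point, and predictions cover the 7 days that follow it.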

Create a segmentation task

A segmentation task is an entity that defines how the input dataset is partitioned. Currently, only user-defined segmentation is supported: the dataset must contain a separate column that identifies the segment, and you must select that column. All records within a series must have the same segment identifier.
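
Before creating the task, it may help to confirm that every series maps to exactly one segment. The following is a minimal sketch of such a check in plain Python; the helper `find_inconsistent_series`, the column name `series`, and the sample rows are hypothetical and not part of the DataRobot client.

```python
from collections import defaultdict


def find_inconsistent_series(rows, series_col, segment_col):
    """Return {series_id: set_of_segments} for series that span more than one segment."""
    segments_by_series = defaultdict(set)
    for row in rows:
        segments_by_series[row[series_col]].add(row[segment_col])
    return {s: segs for s, segs in segments_by_series.items() if len(segs) > 1}


# Made-up rows: series "B" appears in two different clusters
rows = [
    {"series": "A", "Cluster": 0},
    {"series": "A", "Cluster": 0},
    {"series": "B", "Cluster": 1},
    {"series": "B", "Cluster": 0},
]

bad = find_inconsistent_series(rows, "series", "Cluster")
print(bad)
```

An empty result means the segment column satisfies the one-segment-per-series requirement; any series listed would need to be reassigned before modeling.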

segmentation_task_results = SegmentationTask.create(
    project_id = project.id,
    target = TARGET,
    use_time_series = True,
    datetime_partition_column = DATE_COL,
    multiseries_id_columns = [SERIES_ID],
    user_defined_segment_id_columns = [SEGMENT_ID],
)

segmentation_task = segmentation_task_results['completedJobs'][0]

Initiate modeling

project.set_target(
    target = TARGET,
    partitioning_method = time_partition,
    mode = enums.AUTOPILOT_MODE.QUICK,
    worker_count = -1,
    segmentation_task_id = segmentation_task.id,
)

Retrieve the combined model

A combined model in a segmented modeling project can be thought of as a meta-model: a model made of references to the best model within each segment. Although a combined model is created quite differently from a standard DataRobot model, it is used in much the same way once it is complete (for example, for deploying or making predictions).
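
The meta-model idea can be pictured with a toy example. The sketch below is purely conceptual and says nothing about how DataRobot implements combined models internally: `ToyCombinedModel`, the two scoring functions, and the field `last_week_sales` are all hypothetical names invented for illustration.

```python
class ToyCombinedModel:
    """Toy stand-in for a combined model: one champion per segment."""

    def __init__(self, champions):
        # champions maps a segment name to a callable that scores one row
        self.champions = dict(champions)

    def set_segment_champion(self, segment, model):
        # Swap the champion for one segment without touching the others
        self.champions[segment] = model

    def predict(self, row, segment_col="Cluster"):
        # Route the row to the champion of the segment it belongs to
        return self.champions[row[segment_col]](row)


def low_volume_model(row):
    return row["last_week_sales"]


def high_volume_model(row):
    return row["last_week_sales"] * 2


combined = ToyCombinedModel({0: low_volume_model, 1: high_volume_model})
p0 = combined.predict({"Cluster": 0, "last_week_sales": 10})
p1 = combined.predict({"Cluster": 1, "last_week_sales": 10})
print(p0, p1)
```

The point of the sketch is only that the combined object holds references to per-segment models and dispatches each record by its segment identifier.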

The following examples illustrate how to set up, run, and manage a segmented modeling project using the Python public API client. For details, refer to the Segmented Modeling API Reference.

combined_models = project.get_combined_models()
current_combined_model = combined_models[0]

Use the snippet below to get information about each segment (the child projects of the combined model).

segments_info = current_combined_model.get_segments_info()
display(segments_info)

# Get information about the segments as a DataFrame
# segments_df = current_combined_model.get_segments_as_dataframe()
# display(segments_df)

# Get list of all models associated with segments from the project
# We can decide whether keys should be `segment names` or `project_ids`
segments_and_child_models = project.get_segments_models(current_combined_model.id)
display(segments_and_child_models)

Set a segment champion

You can set new champions for segments using the code below.

new_champions = {}
for child_project_model_data in segments_and_child_models:
    # We can retrieve information about each child project
    child_project_id = child_project_model_data['project_id']
    parent_project_id = child_project_model_data['parent_project_id']
    combined_model_id = child_project_model_data['combined_model_id']

    # And retrieve its models
    child_project_models = child_project_model_data['models']

    # Let's pick the third best model
    new_champion = child_project_models[2]

    # Now we can set the new champions by using the `project_id` of the child project
    # and its associated model ID
    CombinedModel.set_segment_champion(project_id=child_project_id, model_id=new_champion.id)
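
Note that indexing with `[2]` raises an `IndexError` when a child project has fewer than three models. One way to guard against that is sketched below; `pick_by_rank` is a hypothetical helper, and it assumes the model list is ordered best first, as the "third best" comment in the loop implies.

```python
def pick_by_rank(models, rank=2):
    """Return the model at the given zero-based rank, or the last one if the list is shorter."""
    if not models:
        return None  # nothing to choose from; the caller should skip this segment
    return models[min(rank, len(models) - 1)]


# Strings stand in for model objects in this illustration
third_best = pick_by_rank(["m1", "m2", "m3", "m4"])
fallback = pick_by_rank(["m1"])
nothing = pick_by_rank([])
print(third_best, fallback, nothing)
```

In the loop, the result of such a helper could replace `child_project_models[2]`, with the champion update skipped whenever it returns `None`.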

Next steps

After setting your segment champions, you can generate model insights.


Updated May 2, 2022