Forecast sales with multiseries modeling
The use case in this notebook forecasts future sales for multiple stores using multiseries modeling. Multiseries modeling lets you model a dataset that contains multiple time series sharing a common set of input features. Such a dataset can be thought of as several individual time series datasets stacked together, with one column of labels indicating which series each row belongs to. This column is known as the series ID column.
Multiseries modeling is useful for large chain businesses that need a forecast to order the right amount of inventory and to staff each store for its predicted volume. In this use case, an analyst managing the stores uses DataRobot to build time series models that predict daily sales.
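To make the series ID column concrete, the following minimal sketch builds a toy long-format dataset and splits it back into individual series. The store names and sales values are made up for illustration and are not taken from the demo dataset.

```python
import pandas as pd

# Hypothetical toy data: two stores, three days each, stacked in "long" format.
dates = pd.date_range("2012-07-01", periods=3, freq="D")
toy = pd.DataFrame(
    {
        "Store": ["Store A"] * 3 + ["Store B"] * 3,  # series ID column
        "Date": list(dates) * 2,                     # same dates repeat per series
        "Sales": [100, 120, 90, 200, 210, 190],      # made-up target values
    }
)

# Each distinct value of the series ID column selects one ordinary time series.
series = {
    store: group.set_index("Date")["Sales"]
    for store, group in toy.groupby("Store")
}

print(toy["Store"].nunique())  # number of series in the dataset
print(series["Store B"])       # one series, indexed by date
```

All series share the same feature columns, which is what allows one modeling project to cover every store.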
Import libraries
import datetime as dt
from datetime import datetime
from importlib import reload
import os
import re
import datarobot as dr
from datarobot import Deployment, Project
import dateutil.parser
import numpy as np
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
Connect to DataRobot
Read more about different options for connecting to DataRobot from the client.
# If the config file is not in the default location described in the
# API Quickstart guide ('~/.config/datarobot/drconfig.yaml'), call:
# dr.Client(config_path='path-to-drconfig.yaml')
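As a rough sketch of the connection options, the helper below chooses the arguments to pass to `dr.Client`. The helper names `client_kwargs` and `connect`, and the default endpoint URL, are assumptions made for illustration. The sketch also assumes credentials may be supplied through the `DATAROBOT_API_TOKEN` and `DATAROBOT_ENDPOINT` environment variables. Adjust it to match your own installation.

```python
import os

# Assumed default: the managed cloud endpoint. Replace it for other installations.
DEFAULT_ENDPOINT = "https://app.datarobot.com/api/v2"


def client_kwargs(config_path=None, env=None):
    """Pick keyword arguments for dr.Client (hypothetical helper)."""
    env = os.environ if env is None else env
    if config_path is not None:
        # An explicit config file takes priority over environment variables.
        return {"config_path": config_path}
    token = env.get("DATAROBOT_API_TOKEN")
    if token:
        return {
            "token": token,
            "endpoint": env.get("DATAROBOT_ENDPOINT", DEFAULT_ENDPOINT),
        }
    # No arguments: the client falls back to its default configuration lookup.
    return {}


def connect(config_path=None):
    """Create the client; datarobot is imported lazily so the helper above has no dependency."""
    import datarobot as dr

    return dr.Client(**client_kwargs(config_path))
```

Calling `connect()` with no arguments leaves the lookup to the client, while `connect("path-to-drconfig.yaml")` mirrors the commented call above.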
data_path = "https://docs.datarobot.com/en/docs/api/guide/common-case/DR_Demo_Sales_Multiseries_training.csv"
df = pd.read_csv(data_path, parse_dates=["Date"], engine="c")  # infer_datetime_format is deprecated in pandas 2.0
df.head(5)
| | Store | Date | Sales | Store_Size | Num_Employees | Num_Customers | Returns_Pct | Pct_On_Sale | Pct_Promotional | Marketing | TouristEvent | Econ_ChangeGDP | EconJobsChange | AnnualizedCPI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Louisville | 2012-07-01 | 109673 | 20100 | 68 | 531 | 1.03 | 9.96 | 0.000047 | July In Store Credit Card Signup Discount; In ... | No | 0.5 | NaN | 0.02 |
| 1 | Louisville | 2012-07-02 | 131791 | 20100 | 34 | 476 | 0.41 | 8.65 | 0.000047 | July In Store Credit Card Signup Discount; In ... | No | NaN | NaN | NaN |
| 2 | Louisville | 2012-07-03 | 134711 | 20100 | 42 | 578 | 0.31 | 8.96 | 0.000047 | July In Store Credit Card Signup Discount; In ... | No | NaN | NaN | NaN |
| 3 | Louisville | 2012-07-04 | 97640 | 20100 | 54 | 569 | 0.83 | 10.08 | 0.000047 | July In Store Credit Card Signup Discount; In ... | No | NaN | NaN | NaN |
| 4 | Louisville | 2012-07-05 | 129538 | 20100 | 62 | 486 | 0.51 | 9.80 | 0.000047 | July In Store Credit Card Signup Discount; ID5... | No | NaN | NaN | NaN |
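Before modeling, it can help to check how well each series covers its date range. The sketch below is one possible check, not part of the original notebook: the helper name `summarize_series` is hypothetical, and the small `demo` frame uses made-up values so the block runs on its own. With the loaded data you would call `summarize_series(df)` instead.

```python
import pandas as pd


def summarize_series(frame, series_col="Store", date_col="Date"):
    """Return row count, date range, and missing calendar days per series."""
    dates = frame.groupby(series_col)[date_col]
    summary = dates.agg(rows="count", start="min", end="max")
    # Days a fully populated daily series would contain between start and end.
    expected_days = (summary["end"] - summary["start"]).dt.days + 1
    summary["missing_days"] = expected_days - dates.nunique()
    return summary


# Made-up example: store "B" has no row for 2012-07-02.
demo = pd.DataFrame(
    {
        "Store": ["A", "A", "A", "B", "B"],
        "Date": pd.to_datetime(
            ["2012-07-01", "2012-07-02", "2012-07-03", "2012-07-01", "2012-07-03"]
        ),
        "Sales": [100, 120, 90, 200, 190],
    }
)

coverage = summarize_series(demo)
print(coverage)
```

A nonzero `missing_days` value flags a series with gaps, which is worth knowing before choosing a time step for the project.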
Plot the sales of each store
df.pivot(index="Date", columns="Store", values="Sales").plot(figsize=(18, 8));
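The `pivot` call above reshapes the long-format data into one column per store, and it raises a `ValueError` if any store has more than one row for the same date. The sketch below shows one way to guard against that. The helper name `to_wide` is hypothetical, the `demo` data is made up, and summing duplicate rows is an assumption that may not suit every dataset.

```python
import pandas as pd


def to_wide(frame, date_col="Date", series_col="Store", value_col="Sales"):
    """Reshape long-format data to one column per series (hypothetical helper)."""
    keys = [date_col, series_col]
    if frame.duplicated(subset=keys).any():
        # Assumption: duplicate rows for one store and day are partial totals,
        # so they are summed. Choose another aggregation if that is not true.
        return frame.pivot_table(
            index=date_col, columns=series_col, values=value_col, aggfunc="sum"
        )
    return frame.pivot(index=date_col, columns=series_col, values=value_col)


# Made-up example: store "B" has no row for 2012-07-02.
demo = pd.DataFrame(
    {
        "Store": ["A", "A", "A", "B", "B"],
        "Date": pd.to_datetime(
            ["2012-07-01", "2012-07-02", "2012-07-03", "2012-07-01", "2012-07-03"]
        ),
        "Sales": [100, 120, 90, 200, 190],
    }
)

wide = to_wide(demo)
print(wide)  # dates with no row for a store appear as NaN in that store's column
# wide.plot(figsize=(18, 8)) would then draw one line per store (requires matplotlib).
```

Dates missing from a series show up as `NaN` in the wide table, so gaps appear as breaks in the plotted line for that store.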