Create a multiseries project¶
This notebook outlines how to create a multiseries time series project and begin modeling with DataRobot's REST API.
Requirements¶
- DataRobot recommends Python version 3.7 or later. However, this workflow is compatible with earlier versions.
- DataRobot API version 2.28.0
Small adjustments may be required depending on the Python version and DataRobot API version you are using.
This notebook does not include a dataset; the column names used below (Date, Store, and Sales) come from a sample store sales dataset. Substitute your own datetime partition column, series identifier, and target.
You can also reference documentation for the DataRobot REST API.
Import libraries¶
import datetime
import json
import time
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import requests
import yaml
Set credentials¶
FILE_CREDENTIALS = "path-to-drconfig.yaml"

# Read the endpoint and API token from the credentials file.
with open(FILE_CREDENTIALS) as f:
    parsed_file = yaml.safe_load(f)

DR_ENDPOINT = parsed_file["endpoint"]
API_TOKEN = parsed_file["token"]
AUTH_HEADERS = {"Authorization": "token %s" % API_TOKEN}
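For reference, the credentials file only needs the two keys read above. A minimal drconfig.yaml might look like the following (both values are placeholders; use your own endpoint and API token):

endpoint: https://app.datarobot.com/api/v2
token: YOUR_API_TOKEN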
Define functions¶
The functions below handle API responses, polling the returned status URL until asynchronous jobs resolve.
def wait_for_async_resolution(status_url):
    """Poll a status URL until the asynchronous job is no longer running."""
    while True:
        resp = requests.get(status_url, headers=AUTH_HEADERS)
        r = json.loads(resp.content)
        try:
            statusjob = r["status"].upper()
        except (KeyError, TypeError, AttributeError):
            # The final response may not include a "status" field.
            statusjob = ""
        if resp.status_code == 200 and statusjob not in ("RUNNING", "INITIALIZED"):
            print("Finished: " + str(datetime.datetime.now()))
            return resp
        print("Waiting: " + str(datetime.datetime.now()))
        time.sleep(10)  # Poll every 10 seconds.
def wait_for_result(response):
    """Return the JSON payload of a request, resolving async jobs if needed."""
    assert response.status_code in (200, 201, 202), response.content
    if response.status_code == 200:
        # Synchronous request: the payload is returned directly.
        data = response.json()
    elif response.status_code == 201:
        # Resource created: fetch it from the Location header.
        status_url = response.headers["Location"]
        resp = requests.get(status_url, headers=AUTH_HEADERS)
        assert resp.status_code == 200, resp.content
        data = resp.json()
    elif response.status_code == 202:
        # Asynchronous job: poll the status URL until it resolves.
        status_url = response.headers["Location"]
        resp = wait_for_async_resolution(status_url)
        data = resp.json()
    return data
Create the project¶
Endpoint: POST /api/v2/projects/
FILE_DATASET = "/Volumes/GoogleDrive/My Drive/Datasets/Store Sales/STORE_SALES-TRAIN-2022-04-25.csv"

payload = {
    # The first tuple element sets the file name DataRobot records for the upload.
    # Open the dataset in binary mode for the multipart upload.
    "file": ("Test_REST_TimeSeries_12", open(FILE_DATASET, "rb"))
}
response = requests.post(
"%s/projects/" % (DR_ENDPOINT), headers=AUTH_HEADERS, files=payload, timeout=180
)
response
<Response [202]>
# Wait for the async task to complete
print("Uploading dataset and creating project...")
project_creation_response = wait_for_result(response)
project_id = project_creation_response["id"]
print("\nProject ID: " + project_id)
Uploading dataset and creating project...
Waiting: 2022-07-29 17:55:32.507696
Waiting: 2022-07-29 17:55:43.092965
Waiting: 2022-07-29 17:55:53.670669
Waiting: 2022-07-29 17:56:04.252294
Waiting: 2022-07-29 17:56:14.841809
Finished: 2022-07-29 17:56:25.650896

Project ID: 62e402f1ce8ba47b224fcea3
Update the project¶
Endpoint: PATCH /api/v2/projects/(projectId)/
payload = {"workerCount": 16}
response = requests.patch(
"%s/projects/%s/" % (DR_ENDPOINT, project_id), headers=AUTH_HEADERS, json=payload, timeout=180
)
response
<Response [200]>
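To confirm that the update took effect, you can fetch the project record with GET /api/v2/projects/(projectId)/. A minimal check, reusing the helper defined above and assuming the project record includes a workerCount field:

# Retrieve the project record and confirm the new worker count.
response = requests.get(
    "%s/projects/%s/" % (DR_ENDPOINT, project_id), headers=AUTH_HEADERS, timeout=180
)
print(wait_for_result(response)["workerCount"])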
Run a detection job¶
For a multiseries project, you must run a detection job to analyze the relationship between the partition and multiseries ID columns.
Endpoint: POST /api/v2/projects/(projectId)/multiseriesProperties/
payload = {"datetimePartitionColumn": "Date", "multiseriesIdColumns": ["Store"]}
response = requests.post(
"%s/projects/%s/multiseriesProperties/" % (DR_ENDPOINT, project_id),
headers=AUTH_HEADERS,
json=payload,
timeout=180,
)
response
<Response [202]>
print("Analyzing multiseries partitions...")
multiseries_response = wait_for_result(response)
Analyzing multiseries partitions...
Waiting: 2022-07-29 17:56:27.571064
Waiting: 2022-07-29 17:56:38.156104
Finished: 2022-07-29 17:56:48.932686
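Once detection finishes, you can review what DataRobot detected for the partition column. The sketch below assumes the GET /api/v2/projects/(projectId)/features/(featureName)/multiseriesProperties/ endpoint, which reports the detected series properties (such as time step and unit):

# Inspect the detected multiseries properties for the "Date" partition column.
response = requests.get(
    "%s/projects/%s/features/Date/multiseriesProperties/" % (DR_ENDPOINT, project_id),
    headers=AUTH_HEADERS,
    timeout=180,
)
print(response.json())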
Initiate modeling¶
Endpoint: PATCH /api/v2/projects/(projectId)/aim/
payload = {
    "target": "Sales",  # Column to predict.
    "mode": "quick",  # Quick Autopilot mode.
    "datetimePartitionColumn": "Date",
    "featureDerivationWindowStart": -25,  # Derive features from up to 25 time units before each forecast point.
    "featureDerivationWindowEnd": 0,
    "forecastWindowStart": 1,  # Forecast 1 through 12 time units past each forecast point.
    "forecastWindowEnd": 12,
    "numberOfBacktests": 2,
    "useTimeSeries": True,
    "cvMethod": "datetime",
    "multiseriesIdColumns": ["Store"],
    "blendBestModels": False,
}
response = requests.patch(
"%s/projects/%s/aim/" % (DR_ENDPOINT, project_id),
headers=AUTH_HEADERS,
json=payload,
timeout=180,
)
response
<Response [202]>
print("Waiting for tasks previous to training to complete...")
autopilot_response = wait_for_result(response)
Waiting for pre-training tasks to complete...
Waiting: 2022-07-29 17:56:51.024036
Waiting: 2022-07-29 17:57:01.746376
Waiting: 2022-07-29 17:57:12.329879
Waiting: 2022-07-29 17:57:22.904449
Waiting: 2022-07-29 17:57:33.679282
Waiting: 2022-07-29 17:57:44.262096
Waiting: 2022-07-29 17:57:54.845494
Waiting: 2022-07-29 17:58:05.427372
Waiting: 2022-07-29 17:58:15.995107
Waiting: 2022-07-29 17:58:26.605621
Waiting: 2022-07-29 17:58:37.188681
Waiting: 2022-07-29 17:58:47.762809
Waiting: 2022-07-29 17:58:58.348806
Waiting: 2022-07-29 17:59:08.925445
Waiting: 2022-07-29 17:59:19.505174
Waiting: 2022-07-29 17:59:30.093026
Waiting: 2022-07-29 17:59:40.670835
Waiting: 2022-07-29 17:59:51.239278
Waiting: 2022-07-29 18:00:01.818356
Waiting: 2022-07-29 18:00:12.395658
Waiting: 2022-07-29 18:00:22.993393
Waiting: 2022-07-29 18:00:33.576738
Waiting: 2022-07-29 18:00:44.166028
Waiting: 2022-07-29 18:00:54.768693
Waiting: 2022-07-29 18:01:05.372862
Waiting: 2022-07-29 18:01:15.981022
Waiting: 2022-07-29 18:01:26.571205
Waiting: 2022-07-29 18:01:37.160074
Waiting: 2022-07-29 18:01:47.741388
Waiting: 2022-07-29 18:01:58.326862
Waiting: 2022-07-29 18:02:08.912622
Finished: 2022-07-29 18:02:19.739789
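With Autopilot underway, you can list the models built so far with GET /api/v2/projects/(projectId)/models/, which returns one record per model. A minimal sketch that also puts the json_normalize import to use (assumes at least one model has finished training):

# List trained models and flatten the records into a DataFrame.
response = requests.get(
    "%s/projects/%s/models/" % (DR_ENDPOINT, project_id), headers=AUTH_HEADERS, timeout=180
)
models = json_normalize(response.json())
print(models[["id", "modelType"]].head())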