Create a multiseries project¶
This notebook outlines how to create a DataRobot project and begin modeling for a multiseries time series project with DataRobot's REST API.
Requirements¶
- DataRobot recommends Python version 3.7 or later. However, this workflow is compatible with earlier versions.
- DataRobot API version 2.28.0
Small adjustments may be required depending on the Python version and DataRobot API version you are using.
This notebook does not include a dataset and references columns specific to its example data; you can substitute your own date, target, and series identifier columns.
You can also reference documentation for the DataRobot REST API.
Import libraries¶
import requests
import json
import time
import datetime

import yaml
from pandas import json_normalize  # moved out of pandas.io.json in pandas 1.0
Set credentials¶
FILE_CREDENTIALS = 'path-to-drconfig.yaml'
parsed_file = yaml.safe_load(open(FILE_CREDENTIALS))
DR_ENDPOINT = parsed_file['endpoint']
API_TOKEN = parsed_file['token']
AUTH_HEADERS = {'Authorization': 'token %s' % API_TOKEN}
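The credential loading above can also be wrapped in a small helper so it is easy to reuse and test. A minimal sketch (`load_credentials` is a hypothetical helper; it assumes the YAML file has top-level `endpoint` and `token` keys, as read above):

```python
import yaml

def load_credentials(path):
    """Read a drconfig.yaml file and return (endpoint, auth_headers).

    Assumes top-level 'endpoint' and 'token' keys, matching the config
    file used elsewhere in this notebook.
    """
    with open(path) as f:
        parsed = yaml.safe_load(f)
    headers = {'Authorization': 'token %s' % parsed['token']}
    return parsed['endpoint'], headers
```

`yaml.safe_load` is preferred over `yaml.load` for hand-edited configuration files, since it only constructs plain Python objects.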
Define functions¶
The helper functions below resolve API responses, polling the status URL returned by asynchronous (202) calls until the job completes.
def wait_for_async_resolution(status_url):
    """Poll a status URL until the asynchronous job resolves."""
    while True:
        resp = requests.get(status_url, headers=AUTH_HEADERS)
        r = json.loads(resp.content)
        try:
            job_status = r['status'].upper()
        except (KeyError, TypeError):
            job_status = ''
        if resp.status_code == 200 and job_status not in ('RUNNING', 'INITIALIZED'):
            print("Finished: " + str(datetime.datetime.now()))
            return resp
        print("Waiting: " + str(datetime.datetime.now()))
        time.sleep(10)  # Poll every 10 seconds.
def wait_for_result(response):
    """Return the JSON payload of a response, resolving async jobs if needed."""
    assert response.status_code in (200, 201, 202), response.content
    if response.status_code == 200:
        # Synchronous request: the payload is in the response body.
        data = response.json()
    elif response.status_code == 201:
        # Resource created: retrieve it from the Location header.
        resp = requests.get(response.headers['Location'], headers=AUTH_HEADERS)
        assert resp.status_code == 200, resp.content
        data = resp.json()
    elif response.status_code == 202:
        # Asynchronous job: poll the status URL until it resolves.
        resp = wait_for_async_resolution(response.headers['Location'])
        data = resp.json()
    return data
Create the project¶
Endpoint: POST /api/v2/projects/
FILE_DATASET = "/Volumes/GoogleDrive/My Drive/Datasets/Store Sales/STORE_SALES-TRAIN-2022-04-25.csv"
payload = {
    # The first element of the tuple is used as the project name.
    'file': ('Test_REST_TimeSeries_12', open(FILE_DATASET, 'rb'))
}
response = requests.post("%s/projects/" % DR_ENDPOINT,
                         headers=AUTH_HEADERS,
                         files=payload,
                         timeout=180)
response
<Response [202]>
# Wait for the asynchronous project creation to complete
print("Uploading dataset and creating Project...")
project_creation_response = wait_for_result(response)
project_id = project_creation_response['id']
print("\nProject ID: " + project_id)
Uploading dataset and creating Project...
Waiting: 2022-07-29 17:55:32.507696
Waiting: 2022-07-29 17:55:43.092965
Waiting: 2022-07-29 17:55:53.670669
Waiting: 2022-07-29 17:56:04.252294
Waiting: 2022-07-29 17:56:14.841809
Finished: 2022-07-29 17:56:25.650896

Project ID: 62e402f1ce8ba47b224fcea3
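To confirm the project was created, you can fetch its details with GET /api/v2/projects/(projectId)/. A minimal sketch (`project_url` and `get_project` are hypothetical helpers; the returned payload includes fields such as the project name and stage):

```python
import requests

def project_url(endpoint, project_id):
    """Build the URL for the project detail endpoint."""
    return "%s/projects/%s/" % (endpoint, project_id)

def get_project(endpoint, project_id, headers):
    """Fetch project details, e.g. to confirm creation succeeded."""
    resp = requests.get(project_url(endpoint, project_id),
                        headers=headers, timeout=60)
    resp.raise_for_status()
    return resp.json()
```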
Update the project¶
Endpoint: PATCH /api/v2/projects/(projectId)/
payload = {
    'workerCount': 16
}
response = requests.patch("%s/projects/%s/" % (DR_ENDPOINT, project_id),
headers=AUTH_HEADERS,
json = payload,
timeout=180)
response
<Response [200]>
Run a detection job¶
For a multiseries project, you must run a detection job to analyze the relationship between the partition and multiseries ID columns.
Endpoint: POST /api/v2/projects/(projectId)/multiseriesProperties/
payload = {
    'datetimePartitionColumn': 'Date',
    'multiseriesIdColumns': ['Store']
}
response = requests.post("%s/projects/%s/multiseriesProperties/" % (DR_ENDPOINT, project_id),
headers=AUTH_HEADERS,
json = payload,
timeout=180)
response
<Response [202]>
print("Analyzing multiseries partitions...")
multiseries_response = wait_for_result(response)
Analyzing multiseries partitions...
Waiting: 2022-07-29 17:56:27.571064
Waiting: 2022-07-29 17:56:38.156104
Finished: 2022-07-29 17:56:48.932686
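After the detection job finishes, you can inspect what was detected for the datetime partition column. A hedged sketch, assuming the GET variant of the multiseriesProperties endpoint (GET /api/v2/projects/(projectId)/features/(featureName)/multiseriesProperties/) is available for your API version; `multiseries_props_url` and `get_multiseries_properties` are hypothetical helpers, and feature names may need URL encoding:

```python
import requests
from urllib.parse import quote

def multiseries_props_url(endpoint, project_id, datetime_column):
    """Build the per-feature multiseriesProperties URL.

    Feature names can contain spaces or slashes, so URL-encode them.
    """
    return "%s/projects/%s/features/%s/multiseriesProperties/" % (
        endpoint, project_id, quote(datetime_column, safe=''))

def get_multiseries_properties(endpoint, project_id, datetime_column, headers):
    """Fetch the detected multiseries properties for the partition column."""
    resp = requests.get(
        multiseries_props_url(endpoint, project_id, datetime_column),
        headers=headers, timeout=60)
    resp.raise_for_status()
    return resp.json()
```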
Initiate modeling¶
Endpoint: PATCH /api/v2/projects/(projectId)/aim/
payload = {
    "target": "Sales",
    "mode": "quick",
    "datetimePartitionColumn": "Date",
    "featureDerivationWindowStart": -25,
    "featureDerivationWindowEnd": 0,
    "forecastWindowStart": 1,
    "forecastWindowEnd": 12,
    "numberOfBacktests": 2,
    "useTimeSeries": True,
    "cvMethod": "datetime",
    "multiseriesIdColumns": ["Store"],
    "blendBestModels": False
}
response = requests.patch("%s/projects/%s/aim/" % (DR_ENDPOINT, project_id),
headers=AUTH_HEADERS,
json = payload,
timeout=180)
response
<Response [202]>
print("Waiting for tasks previous to training to complete...")
autopilot_response = wait_for_result(response)
Waiting for tasks previous to training to complete...
Waiting: 2022-07-29 17:56:51.024036
Waiting: 2022-07-29 17:57:01.746376
...
Waiting: 2022-07-29 18:02:08.912622
Finished: 2022-07-29 18:02:19.739789
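Once Autopilot is underway, trained models can be listed with GET /api/v2/projects/(projectId)/models/ and compared by score. A minimal sketch (`best_model` is a hypothetical helper; it assumes each model payload carries a `metrics` map of metric name to partition scores, and that lower scores are better, as for error metrics such as RMSE):

```python
import requests

def list_models(endpoint, project_id, headers):
    """Retrieve the models trained so far for a project."""
    resp = requests.get("%s/projects/%s/models/" % (endpoint, project_id),
                        headers=headers, timeout=60)
    resp.raise_for_status()
    return resp.json()

def best_model(models, metric, partition='validation'):
    """Pick the model with the lowest score on the given partition.

    Skips models that have no score for that partition (e.g. still training).
    """
    scored = [m for m in models
              if m.get('metrics', {}).get(metric, {}).get(partition) is not None]
    return min(scored, key=lambda m: m['metrics'][metric][partition])
```

For a multiseries project you may prefer backtesting scores over a single validation score; check the `metrics` payload returned by your API version before relying on a specific partition name.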