Create a multiseries project¶
This notebook outlines how to create a DataRobot project and begin modeling for a multiseries time series project with DataRobot's REST API.
Requirements¶
- DataRobot recommends Python version 3.7 or later; however, this workflow is also compatible with earlier versions.
- DataRobot API version 2.28.0
Slight adjustments may be required depending on the Python version and DataRobot API version you are using.
This notebook does not include a dataset; the column names it references (Date, Store, and Sales) come from an example store-sales dataset, so substitute your own datetime, series identifier, and target columns as needed.
You can also reference documentation for the DataRobot REST API.
Import libraries¶
import datetime
import json
import time
from pandas import json_normalize  # json_normalize moved to the top-level namespace in pandas 1.0
import requests
import yaml
Set credentials¶
FILE_CREDENTIALS = "path-to-drconfig.yaml"
# Load the endpoint and API token from the YAML credentials file
with open(FILE_CREDENTIALS) as f:
    parsed_file = yaml.safe_load(f)
DR_ENDPOINT = parsed_file["endpoint"]
API_TOKEN = parsed_file["token"]
AUTH_HEADERS = {"Authorization": "token %s" % API_TOKEN}
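For reference, the cell above expects drconfig.yaml to contain endpoint and token keys. A minimal sketch of such a file, with placeholder values, might look like the following (for the managed DataRobot cloud, the endpoint is typically https://app.datarobot.com/api/v2):
# drconfig.yaml (placeholder values; substitute your own endpoint and API key)
endpoint: https://app.datarobot.com/api/v2
token: your-api-token-here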
Define functions¶
The helper functions below handle API responses, polling asynchronous jobs until they resolve.
def wait_for_async_resolution(status_url):
    # Poll the status URL until the job leaves the RUNNING/INITIALIZED states
    while True:
        resp = requests.get(status_url, headers=AUTH_HEADERS)
        r = resp.json()
        job_status = str(r.get("status", "")).upper() if isinstance(r, dict) else ""
        if resp.status_code == 200 and job_status not in ("RUNNING", "INITIALIZED"):
            print("Finished: " + str(datetime.datetime.now()))
            return resp
        print("Waiting: " + str(datetime.datetime.now()))
        time.sleep(10)  # Delay for 10 seconds between polls
def wait_for_result(response):
    # Accept only successful status codes
    assert response.status_code in (200, 201, 202), response.content
    if response.status_code == 200:
        # Synchronous request: the payload is already in the response
        data = response.json()
    elif response.status_code == 201:
        # Resource created: fetch it from the Location header
        status_url = response.headers["Location"]
        resp = requests.get(status_url, headers=AUTH_HEADERS)
        assert resp.status_code == 200, resp.content
        data = resp.json()
    elif response.status_code == 202:
        # Asynchronous job: poll the status URL until it resolves
        status_url = response.headers["Location"]
        resp = wait_for_async_resolution(status_url)
        data = resp.json()
    return data
Create the project¶
Endpoint: POST /api/v2/projects/
FILE_DATASET = "/Volumes/GoogleDrive/My Drive/Datasets/Store Sales/STORE_SALES-TRAIN-2022-04-25.csv"
payload = {
    # "projectName": "TestRESTTimeSeries_1",
    # Multipart upload: (filename, file object); open the file in binary mode
    "file": ("Test_REST_TimeSeries_12", open(FILE_DATASET, "rb"))
}
response = requests.post(
    "%s/projects/" % (DR_ENDPOINT), headers=AUTH_HEADERS, files=payload, timeout=180
)
response
<Response [202]>
# Wait for async task to complete
print("Uploading dataset and creating Project...")
projectCreation_response = wait_for_result(response)
project_id = projectCreation_response["id"]
print("\nProject ID: " + project_id)
Uploading dataset and creating Project...
Waiting: 2022-07-29 17:55:32.507696
Waiting: 2022-07-29 17:55:43.092965
Waiting: 2022-07-29 17:55:53.670669
Waiting: 2022-07-29 17:56:04.252294
Waiting: 2022-07-29 17:56:14.841809
Finished: 2022-07-29 17:56:25.650896

Project ID: 62e402f1ce8ba47b224fcea3
Update the project¶
Endpoint: PATCH /api/v2/projects/(projectId)/
payload = {"workerCount": 16}
response = requests.patch(
    "%s/projects/%s/" % (DR_ENDPOINT, project_id), headers=AUTH_HEADERS, json=payload, timeout=180
)
response
<Response [200]>
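To confirm the update took effect, one option is to read the project back. The sketch below assumes the project retrieval endpoint (GET /api/v2/projects/(projectId)/) returns a workerCount field matching the key we just patched:
# Optional check: read the project back and verify the worker count
project = requests.get(
    "%s/projects/%s/" % (DR_ENDPOINT, project_id), headers=AUTH_HEADERS, timeout=180
).json()
print(project.get("workerCount"))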
Run a detection job¶
For a multiseries project, you must run a detection job that analyzes the relationship between the datetime partition column and the multiseries ID column(s).
Endpoint: POST /api/v2/projects/(projectId)/multiseriesProperties/
payload = {"datetimePartitionColumn": "Date", "multiseriesIdColumns": ["Store"]}
response = requests.post(
    "%s/projects/%s/multiseriesProperties/" % (DR_ENDPOINT, project_id),
    headers=AUTH_HEADERS,
    json=payload,
    timeout=180,
)
response
<Response [202]>
print("Analyzing multiseries partitions...")
multiseries_response = wait_for_result(response)
Analyzing multiseries partitions...
Waiting: 2022-07-29 17:56:27.571064
Waiting: 2022-07-29 17:56:38.156104
Finished: 2022-07-29 17:56:48.932686
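Once detection finishes, you can inspect what was found for the partition column. The sketch below assumes the retrieval counterpart of the detection endpoint used above (GET /api/v2/projects/(projectId)/features/(featureName)/multiseriesProperties/); see the REST API documentation for the exact response schema:
# Optional check: retrieve the detected multiseries properties for the Date column
response = requests.get(
    "%s/projects/%s/features/%s/multiseriesProperties/" % (DR_ENDPOINT, project_id, "Date"),
    headers=AUTH_HEADERS,
    timeout=180,
)
print(json.dumps(response.json(), indent=2))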
Initiate modeling¶
Endpoint: PATCH /api/v2/projects/(projectId)/aim/
payload = {
    "target": "Sales",
    "mode": "quick",
    "datetimePartitionColumn": "Date",
    # Derive features from up to 25 time steps before each forecast point
    "featureDerivationWindowStart": -25,
    "featureDerivationWindowEnd": 0,
    # Forecast from 1 to 12 time steps after each forecast point
    "forecastWindowStart": 1,
    "forecastWindowEnd": 12,
    "numberOfBacktests": 2,
    "useTimeSeries": True,
    "cvMethod": "datetime",
    "multiseriesIdColumns": ["Store"],
    "blendBestModels": False,
}
response = requests.patch(
    "%s/projects/%s/aim/" % (DR_ENDPOINT, project_id),
    headers=AUTH_HEADERS,
    json=payload,
    timeout=180,
)
response
<Response [202]>
print("Waiting for tasks previous to training to complete...")
autopilot_response = wait_for_result(response)
Waiting for pre-training tasks to complete...
Waiting: 2022-07-29 17:56:51.024036
Waiting: 2022-07-29 17:57:01.746376
Waiting: 2022-07-29 17:57:12.329879
Waiting: 2022-07-29 17:57:22.904449
Waiting: 2022-07-29 17:57:33.679282
Waiting: 2022-07-29 17:57:44.262096
Waiting: 2022-07-29 17:57:54.845494
Waiting: 2022-07-29 17:58:05.427372
Waiting: 2022-07-29 17:58:15.995107
Waiting: 2022-07-29 17:58:26.605621
Waiting: 2022-07-29 17:58:37.188681
Waiting: 2022-07-29 17:58:47.762809
Waiting: 2022-07-29 17:58:58.348806
Waiting: 2022-07-29 17:59:08.925445
Waiting: 2022-07-29 17:59:19.505174
Waiting: 2022-07-29 17:59:30.093026
Waiting: 2022-07-29 17:59:40.670835
Waiting: 2022-07-29 17:59:51.239278
Waiting: 2022-07-29 18:00:01.818356
Waiting: 2022-07-29 18:00:12.395658
Waiting: 2022-07-29 18:00:22.993393
Waiting: 2022-07-29 18:00:33.576738
Waiting: 2022-07-29 18:00:44.166028
Waiting: 2022-07-29 18:00:54.768693
Waiting: 2022-07-29 18:01:05.372862
Waiting: 2022-07-29 18:01:15.981022
Waiting: 2022-07-29 18:01:26.571205
Waiting: 2022-07-29 18:01:37.160074
Waiting: 2022-07-29 18:01:47.741388
Waiting: 2022-07-29 18:01:58.326862
Waiting: 2022-07-29 18:02:08.912622
Finished: 2022-07-29 18:02:19.739789
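With Autopilot underway, you can poll the leaderboard to watch models arrive. The sketch below assumes the standard models listing endpoint (GET /api/v2/projects/(projectId)/models/), whose response includes id and modelType fields, and puts the json_normalize import from earlier to use:
# Optional: tabulate the models trained so far
models = requests.get(
    "%s/projects/%s/models/" % (DR_ENDPOINT, project_id), headers=AUTH_HEADERS, timeout=180
).json()
if models:
    print(json_normalize(models)[["id", "modelType"]])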