Create a multiseries project¶
This notebook outlines how to create a DataRobot project and begin modeling for a multiseries time series project with DataRobot's REST API.
Requirements¶
- DataRobot recommends Python version 3.7 or later; however, this workflow is also compatible with earlier versions.
- DataRobot API version 2.28.0
Slight adjustments may be required depending on the Python version and DataRobot API version you are using.
This notebook does not include a dataset; the column names it references (Date, Store, and Sales) come from an example store-sales dataset, so substitute your own datetime, series identifier, and target columns as needed.
You can also reference documentation for the DataRobot REST API.
Import libraries¶
import datetime
import json
import time
from pandas import json_normalize  # json_normalize moved to the top-level namespace in pandas 1.0
import requests
import yaml
Set credentials¶
FILE_CREDENTIALS = "path-to-drconfig.yaml"
# Load the endpoint and API token from the YAML credentials file
with open(FILE_CREDENTIALS) as f:
    parsed_file = yaml.safe_load(f)
DR_ENDPOINT = parsed_file["endpoint"]
API_TOKEN = parsed_file["token"]
AUTH_HEADERS = {"Authorization": "token %s" % API_TOKEN}
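For reference, the cell above expects drconfig.yaml to contain endpoint and token keys. A minimal sketch of such a file, with placeholder values, might look like the following (for the managed DataRobot cloud, the endpoint is typically https://app.datarobot.com/api/v2):
# drconfig.yaml (placeholder values; substitute your own endpoint and API key)
endpoint: https://app.datarobot.com/api/v2
token: your-api-token-here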
Define functions¶
The helper functions below handle API responses, polling asynchronous jobs until they resolve.
def wait_for_async_resolution(status_url):
    # Poll the status URL until the job leaves the RUNNING/INITIALIZED states
    while True:
        resp = requests.get(status_url, headers=AUTH_HEADERS)
        r = resp.json()
        job_status = str(r.get("status", "")).upper() if isinstance(r, dict) else ""
        if resp.status_code == 200 and job_status not in ("RUNNING", "INITIALIZED"):
            print("Finished: " + str(datetime.datetime.now()))
            return resp
        print("Waiting: " + str(datetime.datetime.now()))
        time.sleep(10)  # Delay for 10 seconds between polls
def wait_for_result(response):
    # Accept only successful status codes
    assert response.status_code in (200, 201, 202), response.content
    if response.status_code == 200:
        # Synchronous request: the payload is already in the response
        data = response.json()
    elif response.status_code == 201:
        # Resource created: fetch it from the Location header
        status_url = response.headers["Location"]
        resp = requests.get(status_url, headers=AUTH_HEADERS)
        assert resp.status_code == 200, resp.content
        data = resp.json()
    elif response.status_code == 202:
        # Asynchronous job: poll the status URL until it resolves
        status_url = response.headers["Location"]
        resp = wait_for_async_resolution(status_url)
        data = resp.json()
    return data
Create the project¶
Endpoint: POST /api/v2/projects/
FILE_DATASET = "/Volumes/GoogleDrive/My Drive/Datasets/Store Sales/STORE_SALES-TRAIN-2022-04-25.csv"
payload = {
    # "projectName": "TestRESTTimeSeries_1",
    # Multipart upload: (filename, file object); open the file in binary mode
    "file": ("Test_REST_TimeSeries_12", open(FILE_DATASET, "rb"))
}
response = requests.post(
    "%s/projects/" % (DR_ENDPOINT), headers=AUTH_HEADERS, files=payload, timeout=180
)
response
<Response [202]>
# Wait for async task to complete
print("Uploading dataset and creating Project...")
projectCreation_response = wait_for_result(response)
project_id = projectCreation_response["id"]
print("\nProject ID: " + project_id)
Uploading dataset and creating Project...
Waiting: 2022-07-29 17:55:32.507696
Waiting: 2022-07-29 17:55:43.092965
Waiting: 2022-07-29 17:55:53.670669
Waiting: 2022-07-29 17:56:04.252294
Waiting: 2022-07-29 17:56:14.841809
Finished: 2022-07-29 17:56:25.650896

Project ID: 62e402f1ce8ba47b224fcea3
Update the project¶
Endpoint: PATCH /api/v2/projects/(projectId)/
payload = {"workerCount": 16}
response = requests.patch(
    "%s/projects/%s/" % (DR_ENDPOINT, project_id), headers=AUTH_HEADERS, json=payload, timeout=180
)
response
<Response [200]>
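To confirm the update took effect, one option is to read the project back. The sketch below assumes the project retrieval endpoint (GET /api/v2/projects/(projectId)/) returns a workerCount field matching the key we just patched:
# Optional check: read the project back and verify the worker count
project = requests.get(
    "%s/projects/%s/" % (DR_ENDPOINT, project_id), headers=AUTH_HEADERS, timeout=180
).json()
print(project.get("workerCount"))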
Run a detection job¶
For a multiseries project, you must run a detection job that analyzes the relationship between the datetime partition column and the multiseries ID column(s).
Endpoint: POST /api/v2/projects/(projectId)/multiseriesProperties/
payload = {"datetimePartitionColumn": "Date", "multiseriesIdColumns": ["Store"]}
response = requests.post(
    "%s/projects/%s/multiseriesProperties/" % (DR_ENDPOINT, project_id),
    headers=AUTH_HEADERS,
    json=payload,
    timeout=180,
)
response
<Response [202]>
print("Analyzing multiseries partitions...")
multiseries_response = wait_for_result(response)
Analyzing multiseries partitions...
Waiting: 2022-07-29 17:56:27.571064
Waiting: 2022-07-29 17:56:38.156104
Finished: 2022-07-29 17:56:48.932686
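Once detection finishes, you can inspect what was found for the partition column. The sketch below assumes the retrieval counterpart of the detection endpoint used above (GET /api/v2/projects/(projectId)/features/(featureName)/multiseriesProperties/); see the REST API documentation for the exact response schema:
# Optional check: retrieve the detected multiseries properties for the Date column
response = requests.get(
    "%s/projects/%s/features/%s/multiseriesProperties/" % (DR_ENDPOINT, project_id, "Date"),
    headers=AUTH_HEADERS,
    timeout=180,
)
print(json.dumps(response.json(), indent=2))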
Initiate modeling¶
Endpoint: PATCH /api/v2/projects/(projectId)/aim/
payload = {
    "target": "Sales",
    "mode": "quick",
    "datetimePartitionColumn": "Date",
    # Derive features from up to 25 time steps before each forecast point
    "featureDerivationWindowStart": -25,
    "featureDerivationWindowEnd": 0,
    # Forecast from 1 to 12 time steps after each forecast point
    "forecastWindowStart": 1,
    "forecastWindowEnd": 12,
    "numberOfBacktests": 2,
    "useTimeSeries": True,
    "cvMethod": "datetime",
    "multiseriesIdColumns": ["Store"],
    "blendBestModels": False,
}
response = requests.patch(
    "%s/projects/%s/aim/" % (DR_ENDPOINT, project_id),
    headers=AUTH_HEADERS,
    json=payload,
    timeout=180,
)
response
<Response [202]>
print("Waiting for tasks previous to training to complete...")
autopilot_response = wait_for_result(response)
Waiting for pre-training tasks to complete...
Waiting: 2022-07-29 17:56:51.024036
Waiting: 2022-07-29 17:57:01.746376
Waiting: 2022-07-29 17:57:12.329879
Waiting: 2022-07-29 17:57:22.904449
Waiting: 2022-07-29 17:57:33.679282
Waiting: 2022-07-29 17:57:44.262096
Waiting: 2022-07-29 17:57:54.845494
Waiting: 2022-07-29 17:58:05.427372
Waiting: 2022-07-29 17:58:15.995107
Waiting: 2022-07-29 17:58:26.605621
Waiting: 2022-07-29 17:58:37.188681
Waiting: 2022-07-29 17:58:47.762809
Waiting: 2022-07-29 17:58:58.348806
Waiting: 2022-07-29 17:59:08.925445
Waiting: 2022-07-29 17:59:19.505174
Waiting: 2022-07-29 17:59:30.093026
Waiting: 2022-07-29 17:59:40.670835
Waiting: 2022-07-29 17:59:51.239278
Waiting: 2022-07-29 18:00:01.818356
Waiting: 2022-07-29 18:00:12.395658
Waiting: 2022-07-29 18:00:22.993393
Waiting: 2022-07-29 18:00:33.576738
Waiting: 2022-07-29 18:00:44.166028
Waiting: 2022-07-29 18:00:54.768693
Waiting: 2022-07-29 18:01:05.372862
Waiting: 2022-07-29 18:01:15.981022
Waiting: 2022-07-29 18:01:26.571205
Waiting: 2022-07-29 18:01:37.160074
Waiting: 2022-07-29 18:01:47.741388
Waiting: 2022-07-29 18:01:58.326862
Waiting: 2022-07-29 18:02:08.912622
Finished: 2022-07-29 18:02:19.739789
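With Autopilot underway, you can poll the leaderboard to watch models arrive. The sketch below assumes the standard models listing endpoint (GET /api/v2/projects/(projectId)/models/), whose response includes id and modelType fields, and puts the json_normalize import from earlier to use:
# Optional: tabulate the models trained so far
models = requests.get(
    "%s/projects/%s/models/" % (DR_ENDPOINT, project_id), headers=AUTH_HEADERS, timeout=180
).json()
if models:
    print(json_normalize(models)[["id", "modelType"]])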