Time series
Batch predictions for time series models work without any additional configuration. However, in most cases you need to either modify the default configuration or prepare the prediction dataset.
Time series batch prediction settings
The default configuration can be overridden using the timeseriesSettings job configuration property:
Parameter | Example | Description |
---|---|---|
type | forecast | Must be either forecast (default) or historical. |
forecastPoint | 2019-02-04T00:00:00Z | (Optional) By default, DataRobot infers the forecast point from the dataset. To configure, type must be set to forecast. |
predictionsStartDate | 2019-01-04T00:00:00Z | (Optional) By default, DataRobot infers the start date from the dataset. To configure, type must be set to historical. |
predictionsEndDate | 2019-02-04T00:00:00Z | (Optional) By default, DataRobot infers the end date from the dataset. To configure, type must be set to historical. |
relaxKnownInAdvanceFeaturesCheck | false | (Optional) If activated, missing values in the known in advance features are allowed in the forecast window at prediction time. If omitted or false, missing values are not allowed. Default: false. |
Here is a complete example job:
{
  "deploymentId": "5f22ba7ade0f435ba7217bcf",
  "intakeSettings": {"type": "localFile"},
  "outputSettings": {"type": "localFile"},
  "timeseriesSettings": {
    "type": "historical",
    "predictionsStartDate": "2020-01-01",
    "predictionsEndDate": "2020-03-31"
  }
}
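The rules in the parameter table can be checked before a job is submitted. The following is a minimal sketch, not part of the DataRobot SDK: `build_timeseries_settings` and its argument names are hypothetical, and it only encodes the constraints stated in the table (`forecastPoint` requires `forecast`; the start and end dates require `historical`). It assembles a forecast-mode payload using the example values from the table.

```python
import json

# Hypothetical helper (an assumption for illustration, not a DataRobot API):
# builds the timeseriesSettings object and enforces the rules from the table.
def build_timeseries_settings(
    prediction_type="forecast",
    forecast_point=None,
    predictions_start_date=None,
    predictions_end_date=None,
    relax_known_in_advance_features_check=False,
):
    if prediction_type not in ("forecast", "historical"):
        raise ValueError("type must be 'forecast' or 'historical'")
    if forecast_point is not None and prediction_type != "forecast":
        raise ValueError("forecastPoint requires type 'forecast'")
    has_range = (
        predictions_start_date is not None or predictions_end_date is not None
    )
    if has_range and prediction_type != "historical":
        raise ValueError("start/end dates require type 'historical'")

    settings = {"type": prediction_type}
    # Optional keys are only included when set, so DataRobot can infer the rest.
    if forecast_point is not None:
        settings["forecastPoint"] = forecast_point
    if predictions_start_date is not None:
        settings["predictionsStartDate"] = predictions_start_date
    if predictions_end_date is not None:
        settings["predictionsEndDate"] = predictions_end_date
    if relax_known_in_advance_features_check:
        settings["relaxKnownInAdvanceFeaturesCheck"] = True
    return settings


job = {
    "deploymentId": "5f22ba7ade0f435ba7217bcf",
    "intakeSettings": {"type": "localFile"},
    "outputSettings": {"type": "localFile"},
    "timeseriesSettings": build_timeseries_settings(
        "forecast", forecast_point="2019-02-04T00:00:00Z"
    ),
}
print(json.dumps(job, indent=2))
```

Failing early on a mismatched combination, such as a forecast point passed together with `historical`, keeps the error close to the code that built the payload.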
An example using the Python SDK:
import datarobot as dr

# Configure the client with your API endpoint and token.
dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="...",
)

deployment_id = "..."
input_file = "to_predict.csv"
output_file = "predicted.csv"

# Score the local file in historical mode over the given date range.
job = dr.BatchPredictionJob.score_to_file(
    deployment_id,
    input_file,
    output_file,
    timeseries_settings={
        "type": "historical",
        "predictions_start_date": "2020-01-01",
        "predictions_end_date": "2020-03-31",
    },
)

print("started scoring...", job)
# Block until the job finishes.
job.wait_for_completion()
Prediction type
In forecast mode, DataRobot makes predictions using forecastPoint or the rows in the dataset without a target. In historical mode, DataRobot enables bulk predictions, which calculate predictions for all possible forecast points and forecast distances within the predictionsStartDate to predictionsEndDate range.
Requirements for the scoring dataset
To ensure the Batch Prediction API can process your time series dataset, note the following requirements and limits:
- Sort prediction rows by their timestamps, with the earliest row first.
- If using multiseries, sort the prediction rows by series ID, then by timestamp.
- There is no limit on the number of series DataRobot supports. The only limit is the job timeout, as mentioned in Limits.
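The ordering requirement can be applied to a file before upload. The sketch below uses only the Python standard library; the function name `sort_for_scoring`, the column names `date` and `series`, and the `%Y-%m-%d` date format are assumptions chosen to match the example tables on this page, not names defined by DataRobot.

```python
import csv
import io
from datetime import datetime

# Hypothetical helper: orders rows by series ID (if given), then timestamp.
def sort_for_scoring(rows, date_col="date", series_col=None,
                     date_format="%Y-%m-%d"):
    def sort_key(row):
        timestamp = datetime.strptime(row[date_col], date_format)
        if series_col is None:
            return (timestamp,)
        return (row[series_col], timestamp)

    return sorted(rows, key=sort_key)


# A small unsorted multiseries sample, inlined so the sketch is self-contained.
raw = """date,series,y
2020-01-02,B,7210.29
2020-01-01,A,9342.85
2020-01-01,B,8477.22
2020-01-02,A,4951.33
"""
rows = list(csv.DictReader(io.StringIO(raw)))
ordered = sort_for_scoring(rows, series_col="series")
for row in ordered:
    print(row["series"], row["date"], row["y"])
```

Parsing the timestamp instead of comparing strings keeps the order correct for date formats that do not sort lexicographically.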
Single series forecast dataset example
The following is an example forecast dataset for a single series:
date | y |
---|---|
2020-01-01 | 9342.85 |
2020-01-02 | 4951.33 |
24 more historical rows | |
2020-01-27 | 4180.92 |
2020-01-28 | 5943.11 |
2020-01-29 | |
2020-01-30 | |
2020-01-31 | |
2020-02-01 | |
2020-02-02 | |
2020-02-03 | |
2020-02-04 | |
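A dataset shaped like the table above, with historical rows followed by rows that have an empty target, could be produced along these lines. This is a sketch under stated assumptions: `append_forecast_rows` is a hypothetical helper, the series is assumed to have a daily time step, and the seven-row horizon simply mirrors the example rather than any model setting.

```python
from datetime import date, timedelta

# Hypothetical helper: appends `horizon` future rows with an empty target
# after the last historical row. Assumes a regular time step (daily here).
def append_forecast_rows(history, horizon, step=timedelta(days=1)):
    last_timestamp = history[-1][0]
    future = [(last_timestamp + step * i, None) for i in range(1, horizon + 1)]
    return list(history) + future


# The last two historical rows from the example table.
history = [
    (date(2020, 1, 27), 4180.92),
    (date(2020, 1, 28), 5943.11),
]
dataset = append_forecast_rows(history, horizon=7)

print("date,y")
for timestamp, value in dataset:
    # Rows to be predicted are written with an empty target column.
    print(f"{timestamp.isoformat()},{'' if value is None else value}")
```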
Multiseries forecast dataset example
If scoring multiple series, the data must be ordered by series and timestamp:
date | series | y |
---|---|---|
2020-01-01 | A | 9342.85 |
2020-01-02 | A | 4951.33 |
24 more historical rows | | |
2020-01-27 | A | 4180.92 |
2020-01-28 | A | 5943.11 |
2020-01-29 | A | |
2020-01-30 | A | |
2020-01-31 | A | |
2020-02-01 | A | |
2020-02-02 | A | |
2020-02-03 | A | |
2020-02-04 | A | |
2020-01-01 | B | 8477.22 |
2020-01-02 | B | 7210.29 |
24 more historical rows | | |
2020-01-27 | B | 7400.21 |
2020-01-28 | B | 8844.71 |
2020-01-29 | B | |
2020-01-30 | B | |
2020-01-31 | B | |
2020-02-01 | B | |
2020-02-02 | B | |
2020-02-03 | B | |
2020-02-04 | B | |
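Before submitting a multiseries file, its ordering can be verified with a quick pass over the rows. The sketch below is illustrative only: `first_out_of_order` is a hypothetical helper, and it treats "sorted by series ID" as plain string ordering of the `series` column, which is an assumption.

```python
from datetime import datetime

# Hypothetical check: returns the index of the first row that breaks the
# (series ID, timestamp) ordering, or None if the rows are in order.
def first_out_of_order(rows, date_col="date", series_col="series",
                       date_format="%Y-%m-%d"):
    previous = None
    for index, row in enumerate(rows):
        current = (
            row[series_col],
            datetime.strptime(row[date_col], date_format),
        )
        if previous is not None and current < previous:
            return index
        previous = current
    return None


good = [
    {"date": "2020-01-28", "series": "A"},
    {"date": "2020-01-29", "series": "A"},
    {"date": "2020-01-01", "series": "B"},
]
# Same rows with the first two swapped, so series A is no longer in time order.
bad = [good[1], good[0], good[2]]

print(first_out_of_order(good))
print(first_out_of_order(bad))
```

Returning the offending index, instead of only a boolean, makes it easier to locate the problem row in a large file.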