Schedule Batch Prediction jobs¶
After creating a job definition, you can have DataRobot execute it on a schedule instead of triggering it manually through the /batchPredictions/fromJobDefinition endpoint.
A Scheduled Batch Prediction job works just like a regular Batch Prediction job, except DataRobot handles the execution of the job.
To schedule a Batch Prediction job, you must first create a job definition.
For more information about Batch Prediction REST API routes, view the DataRobot REST API reference documentation.
Schedule a job definition¶
The API accepts the enabled keyword and a schedule object, as shown below:
POST https://app.datarobot.com/api/v2/batchPredictionJobDefinitions
{
  "deploymentId": "<deployment_id>",
  "intakeSettings": {
    "type": "dataset",
    "datasetId": "<dataset_id>"
  },
  "outputSettings": {
    "type": "jdbc",
    "statementType": "insert",
    "credentialId": "<credential_id>",
    "dataStoreId": "<data_store_id>",
    "schema": "public",
    "table": "example_table",
    "createTableIfNotExists": false
  },
  "includeProbabilities": true,
  "includePredictionStatus": true,
  "passthroughColumnsSet": "all",
  "enabled": false,
  "schedule": {
    "minute": [0],
    "hour": [1],
    "month": ["*"],
    "dayOfWeek": ["*"],
    "dayOfMonth": ["*"]
  }
}
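As a rough illustration, a request like the one above could be assembled in Python with the standard library. This is a minimal sketch, not an official client: the build_schedule_request helper is hypothetical, the Authorization: Bearer header is an assumption about how the API is authenticated, and every placeholder must be replaced with a real value. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://app.datarobot.com/api/v2/batchPredictionJobDefinitions"


def build_schedule_request(api_token, job_spec, schedule, enabled=False):
    """Build (but do not send) a POST request for a scheduled job definition.

    Hypothetical helper. `job_spec` holds the regular job settings
    (deploymentId, intakeSettings, outputSettings, ...).
    """
    payload = {**job_spec, "enabled": enabled, "schedule": schedule}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the API token is sent as a bearer token.
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


job_spec = {
    "deploymentId": "<deployment_id>",
    "intakeSettings": {"type": "dataset", "datasetId": "<dataset_id>"},
}
# Run every day at 01:00 UTC.
daily_at_one = {
    "minute": [0],
    "hour": [1],
    "month": ["*"],
    "dayOfWeek": ["*"],
    "dayOfMonth": ["*"],
}
request = build_schedule_request("<api_token>", job_spec, daily_at_one)
# To send it: urllib.request.urlopen(request)
```

Keeping the job settings in a separate dictionary makes it easy to reuse the same settings for a one-off test run before a schedule is attached.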
Schedule payload¶
The schedule payload defines when the job runs. Its fields can be combined to build complex schedules if needed. Each field accepts either ["*"], meaning every value of that time unit, or an array of integers (for example, [1, 2, 3]) listing the specific values at which the job should run.
Key | Possible values | Example | Description |
---|---|---|---|
minute | ["*"] or [0 ... 59] | [15, 30, 45] | The job will run at these minute values for every hour of the day. |
hour | ["*"] or [0 ... 23] | [12, 23] | The hour(s) of the day that the job will run. |
month | ["*"] or [1 ... 12] | ["jan"] | Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., "jan" or "october"). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month": ["feb"]}. |
dayOfWeek | ["*"] or [0 ... 6] where (Sunday=0) | ["sun"] | The day(s) of the week that the job will run. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., "sunday", "Sunday", "sun", or "Sun" all map to [0]). NOTE: This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field. |
dayOfMonth | ["*"] or [1 ... 31] | [1, 25] | The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. NOTE: This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored. |
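The value rules in the table can be checked locally before a payload is sent. The following is a minimal sketch of such a pre-check; normalize_field and normalize_schedule are hypothetical helpers, not part of any DataRobot client, and the server may apply additional rules that are not covered here.

```python
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"]

# key -> (lowest value, highest value, names in order or None)
FIELDS = {
    "minute": (0, 59, None),
    "hour": (0, 23, None),
    "month": (1, 12, MONTHS),
    "dayOfWeek": (0, 6, DAYS),  # Sunday=0
    "dayOfMonth": (1, 31, None),
}


def normalize_field(key, values):
    """Return ["*"] or a sorted list of integers for one schedule field."""
    low, high, names = FIELDS[key]
    if values == ["*"]:
        return ["*"]
    if not values:
        raise ValueError(f"{key}: at least one value is required")
    result = set()
    for value in values:
        if isinstance(value, str) and names:
            text = value.lower()
            # Accept the full name or its 3-letter abbreviation, in any case.
            matches = [i for i, name in enumerate(names) if text in (name, name[:3])]
            if not matches:
                raise ValueError(f"{key}: unknown name {value!r}")
            value = matches[0] + low
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{key}: {value!r} is outside {low}..{high}")
        result.add(value)
    return sorted(result)


def normalize_schedule(schedule):
    """Validate every field of a schedule object and map names to integers."""
    return {key: normalize_field(key, schedule[key]) for key in FIELDS}


normalized = normalize_schedule({
    "minute": [0],
    "hour": [12, 23],
    "month": ["jan", "October"],
    "dayOfWeek": ["Sun"],
    "dayOfMonth": ["*"],
})
```

In this example, "jan" and "October" become 1 and 10, and "Sun" becomes 0, matching the mapping described in the table.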
Note
When specifying a time of day to run jobs, you must use UTC in the schedule payload; local time zones are not supported.
Because the schedule does not follow your local clock, update it yourself whenever daylight saving time (DST) starts or ends in your time zone.
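The conversion from a local wall-clock time to the UTC values used in the schedule can be sketched as follows. The to_utc_fields helper is hypothetical, and the two fixed offsets stand in for a zone that is one hour ahead of UTC in winter and two hours ahead in summer; a zoneinfo.ZoneInfo instance could be passed instead so that the offset follows the date automatically.

```python
from datetime import date, datetime, timedelta, timezone


def to_utc_fields(hour, minute, local_tz, on_date):
    """Convert a local wall-clock time on a given date to UTC schedule values.

    Returns (hour, minute, day_shift). day_shift is -1, 0, or +1 when the
    conversion crosses midnight, which also shifts dayOfWeek and dayOfMonth.
    """
    local = datetime(on_date.year, on_date.month, on_date.day, hour, minute, tzinfo=local_tz)
    utc = local.astimezone(timezone.utc)
    day_shift = (utc.date() - local.date()).days
    return utc.hour, utc.minute, day_shift


winter = timezone(timedelta(hours=1))  # assumed local offset outside DST
summer = timezone(timedelta(hours=2))  # assumed local offset during DST

winter_fields = to_utc_fields(9, 0, winter, date(2024, 1, 15))  # (8, 0, 0)
summer_fields = to_utc_fields(9, 0, summer, date(2024, 7, 15))  # (7, 0, 0)
```

A job meant to run at 09:00 local time therefore needs "hour": [8] in winter and "hour": [7] in summer in this assumed zone.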
Examples¶
Interval | Example | Description |
---|---|---|
Run every 5 minutes | "schedule": { "minute": [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55], "hour": ["*"], "month": ["*"], "dayOfWeek": ["*"], "dayOfMonth": ["*"] } | Executes every time the minute dial of a clock reaches the number(s) defined in minute, since all other fields are set to asterisks. |
Run every full hour | "schedule": { "minute": [0], "hour": ["*"], "month": ["*"], "dayOfWeek": ["*"], "dayOfMonth": ["*"] } | Executes every time the clock reaches the minute(s) defined in minute. This example executes every day at 1:00 AM, 2:00 AM, 3:00 AM, and so forth. |
Run right before noon every day | "schedule": { "minute": [59], "hour": [11], "month": ["*"], "dayOfWeek": ["*"], "dayOfMonth": ["*"] } | Executes when the minute dial of a clock reaches the minute(s) defined in minute and the hour dial reaches the number(s) defined in hour. This example executes every day at 11:59 AM. |
Run every full hour, but only in January and June | "schedule": { "minute": [0], "hour": ["*"], "month": [1, 6], "dayOfWeek": ["*"], "dayOfMonth": ["*"] } | Executes every time the minute dial of a clock reaches the minute(s) defined in minute, and only when the month is January (1) or June (6). |
Run every full hour, only in January and June, and only on Mondays and Sundays | "schedule": { "minute": [0], "hour": ["*"], "month": [1, 6], "dayOfWeek": ["mon", "sun"], "dayOfMonth": ["*"] } | Same as above, but with dayOfWeek specified, the job runs only on the days listed. |
Run every full hour, only in January and June, on Mondays and Sundays, but also on the 1st and 10th of the month | "schedule": { "minute": [0], "hour": ["*"], "month": [1, 6], "dayOfWeek": ["mon", "sun"], "dayOfMonth": [1, 10] } | Same as above, but with both dayOfWeek and dayOfMonth specified. The two fields are additive rather than exclusive: the job runs on every date that matches either dayOfWeek or dayOfMonth, and not, as might be expected, only on dates where the 1st or 10th falls on a Monday or Sunday. |
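The additive behavior of dayOfWeek and dayOfMonth in the last example can be made concrete with a small matcher. This is a minimal sketch for reasoning about schedules, not the scheduler's actual implementation: schedule_matches is a hypothetical function, it accepts only integers and ["*"] (names such as "mon" would have to be converted to integers first), and it assumes that a ["*"] in dayOfWeek is treated the same way the documentation describes for a ["*"] in dayOfMonth.

```python
from datetime import datetime


def _allows(field, value):
    """True if a schedule field is ["*"] or lists the given value."""
    return field == ["*"] or value in field


def schedule_matches(schedule, moment):
    """Return True if a UTC datetime falls on the schedule."""
    if not (_allows(schedule["minute"], moment.minute)
            and _allows(schedule["hour"], moment.hour)
            and _allows(schedule["month"], moment.month)):
        return False
    days_of_week = schedule["dayOfWeek"]
    days_of_month = schedule["dayOfMonth"]
    # Python counts Monday as 0; the schedule counts Sunday as 0.
    weekday = (moment.weekday() + 1) % 7
    if days_of_week == ["*"] and days_of_month == ["*"]:
        return True
    if days_of_month == ["*"]:
        return weekday in days_of_week
    if days_of_week == ["*"]:
        return moment.day in days_of_month  # assumption: symmetric to the case above
    # Both fields are restricted: they add to each other.
    return weekday in days_of_week or moment.day in days_of_month


schedule = {
    "minute": [0],
    "hour": ["*"],
    "month": [1, 6],
    "dayOfWeek": [1, 0],      # Monday and Sunday
    "dayOfMonth": [1, 10],
}
on_the_10th = schedule_matches(schedule, datetime(2024, 1, 10, 14, 0))  # Wednesday the 10th
on_a_sunday = schedule_matches(schedule, datetime(2024, 1, 7, 3, 0))    # Sunday the 7th
on_a_tuesday = schedule_matches(schedule, datetime(2024, 1, 9, 3, 0))   # Tuesday the 9th
```

Here the first two checks match (one through dayOfMonth, one through dayOfWeek), while Tuesday the 9th matches neither field and is skipped.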
Disable a scheduled job¶
Job definitions are only executed by the scheduler if enabled is set to true.
To stop a job definition that was previously running on a schedule, PATCH the endpoint with enabled set to false.
A job that is already running when the definition is disabled still runs to completion.
PATCH https://app.datarobot.com/api/v2/batchPredictionJobDefinitions/<job_definition_id>
{
"enabled": false
}
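In Python, the same PATCH request might be built as shown in this sketch. The build_enabled_patch helper is hypothetical and the Authorization: Bearer header is an assumption about authentication; the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://app.datarobot.com/api/v2/batchPredictionJobDefinitions"


def build_enabled_patch(api_token, job_definition_id, enabled):
    """Build (but do not send) a PATCH request that toggles the enabled flag."""
    return urllib.request.Request(
        f"{BASE_URL}/{job_definition_id}",
        data=json.dumps({"enabled": enabled}).encode("utf-8"),
        headers={
            # Assumption: the API token is sent as a bearer token.
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


disable_request = build_enabled_patch("<api_token>", "<job_definition_id>", False)
# To send it: urllib.request.urlopen(disable_request)
```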
Limitations¶
The scheduler limits how often a job can run and how many jobs can run at once.
Total runs per day¶
Each organization is limited to a number of job executions per day.
On cloud, this limit is 1000 by default.
If you are a Self-Managed AI Platform user, you can change the limit with the environment variable BATCH_PREDICTIONS_JOB_SCHEDULER_MAX_NUMBER_OF_RUNS_PER_DAY_PER_ORGANIZATION.
Note that the limit applies across all scheduled jobs in an organization, so if one scheduled job already runs 1000 times per day, that organization cannot activate any more scheduled jobs.
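To see how quickly schedules add up against this limit, the number of runs per day can be estimated from the minute and hour fields. This is a minimal sketch with hypothetical helper names; it gives an upper bound for days on which the schedule is active and says nothing about how the scheduler itself counts runs.

```python
def max_runs_per_day(schedule):
    """Upper bound on daily runs: matching minutes times matching hours."""
    minutes = 60 if schedule["minute"] == ["*"] else len(set(schedule["minute"]))
    hours = 24 if schedule["hour"] == ["*"] else len(set(schedule["hour"]))
    return minutes * hours


def fits_daily_limit(schedules, limit=1000):
    """True if the combined daily runs of all schedules stay within the limit."""
    return sum(max_runs_per_day(s) for s in schedules) <= limit


every_five_minutes = {"minute": list(range(0, 60, 5)), "hour": ["*"]}
runs = max_runs_per_day(every_five_minutes)               # 12 * 24 = 288
three_jobs_ok = fits_daily_limit([every_five_minutes] * 3)  # 864 <= 1000
four_jobs_ok = fits_daily_limit([every_five_minutes] * 4)   # 1152 > 1000
```

With the default limit of 1000, three jobs that each run every 5 minutes fit within the daily budget, while a fourth would exceed it.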
Schedules are best-effort¶
Depending on how many definitions are running at the same time across the organization, the scheduler cannot guarantee that every job executes at the exact second of its schedule. However, in most cases, the scheduler has the resources to trigger the job within 5 seconds of the scheduled time.
Running the same definition simultaneously¶
A job definition cannot have more than one scheduled run in progress at a time. If a scheduled job takes so long that the next interval triggers before the current run has finished, the new run is rejected and aborted. This continues to happen until the running job finishes.
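The effect of a long-running job on later triggers can be illustrated with a small simulation. This is a simplified model, not the scheduler's actual logic: simulate_runs is a hypothetical function, times are in minutes, and a trigger that falls exactly at the moment the previous run ends is assumed to start normally.

```python
def simulate_runs(trigger_minutes, duration_minutes):
    """Split trigger times into started and rejected runs for one job definition."""
    started, rejected = [], []
    busy_until = None
    for trigger in sorted(trigger_minutes):
        if busy_until is not None and trigger < busy_until:
            # The previous run is still in progress, so this trigger is rejected.
            rejected.append(trigger)
        else:
            started.append(trigger)
            busy_until = trigger + duration_minutes
    return started, rejected


# A job scheduled every 5 minutes that takes 12 minutes to finish.
started, rejected = simulate_runs(range(0, 60, 5), 12)
# started  -> [0, 15, 30, 45]
# rejected -> [5, 10, 20, 25, 35, 40, 50, 55]
```

In this example only every third trigger results in a run, which suggests spacing the schedule more widely than the expected run time.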
Automatic disablement of failing jobs¶
If a job definition cannot execute due to misconfiguration and is aborted, the definition is automatically disabled (enabled is set to false) after 5 consecutive failures.
It is therefore recommended that you first test the payload with the existing /batchPredictions endpoint, and then POST the identical, confirmed working payload to /batchPredictionJobDefinitions.
For Self-Managed AI Platform customers, the number of consecutive failures before disablement can be adjusted by changing the BATCH_PREDICTIONS_JOB_SCHEDULER_FAILURES_BEFORE_ABORT environment variable.