Schedule Batch Prediction jobs

After creating a job definition, you can have DataRobot execute it on a schedule instead of running it manually through the /batchPredictions/fromJobDefinition endpoint.

A Scheduled Batch Prediction job works just like a regular Batch Prediction job, except DataRobot handles the execution of the job.

To schedule a Batch Prediction job, you must first create a job definition, as described here.

For more information about Batch Prediction REST API routes, view the DataRobot REST API reference documentation.

Schedule a job definition

The API accepts the enabled keyword and a schedule object, for example:

POST https://app.datarobot.com/api/v2/batchPredictionJobDefinitions

{
    "deploymentId": "<deployment_id>",
    "intakeSettings": {
        "type": "dataset",
        "datasetId": "<dataset_id>"
    },
    "outputSettings": {
        "type": "jdbc",
        "statementType": "insert",
        "credentialId": "<credential_id>",
        "dataStoreId": "<data_store_id>",
        "schema": "public",
        "table": "example_table",
        "createTableIfNotExists": false
    },
    "includeProbabilities": true,
    "includePredictionStatus": true,
    "passthroughColumnsSet": "all",
    "enabled": false,
    "schedule": {
        "minute": [0],
        "hour": [1],
        "month": ["*"],
        "dayOfWeek": ["*"],
        "dayOfMonth": ["*"]
    }
}
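
As a rough illustration of how such a request could be assembled from a script, the following Python sketch builds the POST request with the standard library and stops short of sending it. The endpoint URL is taken from the example above; the Bearer token header, the helper name build_definition_request, and the placeholder values are assumptions for illustration, so check them against the REST API reference for your installation.

```python
import json
import urllib.request

# Base URL taken from the example above; adjust for your own installation.
API_BASE = "https://app.datarobot.com/api/v2"


def build_definition_request(payload: dict, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a job definition.

    The "Authorization: Bearer" header format is an assumption of this sketch.
    """
    return urllib.request.Request(
        url=f"{API_BASE}/batchPredictionJobDefinitions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same shape as the JSON payload above; all IDs are placeholders.
payload = {
    "deploymentId": "<deployment_id>",
    "intakeSettings": {"type": "dataset", "datasetId": "<dataset_id>"},
    "outputSettings": {
        "type": "jdbc",
        "statementType": "insert",
        "credentialId": "<credential_id>",
        "dataStoreId": "<data_store_id>",
        "schema": "public",
        "table": "example_table",
        "createTableIfNotExists": False,
    },
    "enabled": False,
    "schedule": {
        "minute": [0],
        "hour": [1],
        "month": ["*"],
        "dayOfWeek": ["*"],
        "dayOfMonth": ["*"],
    },
}

request = build_definition_request(payload, api_token="<api_token>")
# To submit the request, pass it to urllib.request.urlopen(request).
```

Building the request separately from sending it makes it easy to inspect the serialized body before anything reaches the API.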

Schedule payload

The schedule payload defines when the job runs. Its fields can be combined to express complex schedules. Each field accepts either ["*"], meaning "every" value of that time unit, or an array of integers (for example, [1, 2, 3]) listing the specific values at which the job should run.

| Key | Possible values | Example | Description |
| --- | --- | --- | --- |
| minute | ["*"] or [0 ... 59] | [15, 30, 45] | The job will run at these minute values for every hour of the day. |
| hour | ["*"] or [0 ... 23] | [12, 23] | The hour(s) of the day that the job will run. |
| month | ["*"] or [1 ... 12] | ["jan"] | Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., "jan" or "october"). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month": ["feb"]}. |
| dayOfWeek | ["*"] or [0 ... 6], where Sunday = 0 | ["sun"] | The day(s) of the week that the job will run. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., "sunday", "Sunday", "sun", or "Sun" all map to [0]). NOTE: This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field. |
| dayOfMonth | ["*"] or [1 ... 31] | [1, 25] | The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. NOTE: This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored. |
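
The value rules above can be checked on the client side before a payload is submitted. The following Python sketch is one hypothetical way to do that; the function normalize_field and its lookup tables are illustrative helpers, not part of the DataRobot API. It assumes month names are matched case-insensitively in the same way the table describes for day names.

```python
# Allowed integer range per schedule key, as listed in the table above.
RANGES = {
    "minute": (0, 59),
    "hour": (0, 23),
    "month": (1, 12),
    "dayOfWeek": (0, 6),
    "dayOfMonth": (1, 31),
}

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday",
        "saturday"]  # Sunday = 0


def _lookup(names, offset):
    """Map both full names and 3-letter abbreviations to their numbers."""
    table = {}
    for index, full_name in enumerate(names):
        table[full_name] = index + offset
        table[full_name[:3]] = index + offset
    return table


NAMES = {"month": _lookup(MONTHS, 1), "dayOfWeek": _lookup(DAYS, 0)}


def normalize_field(key, values):
    """Return None for ["*"], else a sorted list of integers.

    Raises ValueError for unknown keys, unknown names, or out-of-range values.
    """
    if key not in RANGES:
        raise ValueError(f"unknown schedule key: {key!r}")
    if values == ["*"]:
        return None
    low, high = RANGES[key]
    result = set()
    for value in values:
        if isinstance(value, str):
            number = NAMES.get(key, {}).get(value.lower())
            if number is None:
                raise ValueError(f"{key}: unrecognized name {value!r}")
            value = number
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{key}: {value!r} is not an integer")
        if not low <= value <= high:
            raise ValueError(f"{key}: {value} is outside {low}..{high}")
        result.add(value)
    return sorted(result)
```

For example, normalize_field("dayOfWeek", ["Sun", "mon"]) yields [0, 1], while normalize_field("minute", [60]) raises a ValueError.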

Note

When specifying a time of day to run jobs, you must use UTC in the schedule payload; local time zones are not supported. To account for daylight saving time (DST), update the schedule whenever your local UTC offset changes.
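
As a small worked example of the conversion, the following Python sketch turns a local wall-clock time into the UTC hour and minute values for the schedule payload. The helper local_time_to_utc_fields is hypothetical, and the UTC offset is passed in explicitly, so you would call it again with the new offset when DST begins or ends.

```python
from datetime import datetime, timedelta, timezone


def local_time_to_utc_fields(hour, minute, utc_offset_hours):
    """Convert a local time of day to UTC schedule fields.

    Returns a tuple (fields, day_shift). day_shift is -1, 0, or 1 and tells
    you whether the UTC time falls on the previous, same, or next calendar
    day, in which case dayOfWeek or dayOfMonth may need shifting as well.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day and the offset matter here.
    local_moment = datetime(2023, 1, 15, hour, minute, tzinfo=local_zone)
    utc_moment = local_moment.astimezone(timezone.utc)
    day_shift = (utc_moment.date() - local_moment.date()).days
    return {"hour": [utc_moment.hour], "minute": [utc_moment.minute]}, day_shift


# 09:00 local time at UTC-5 (for example, US Eastern standard time).
winter_fields, winter_shift = local_time_to_utc_fields(9, 0, -5)
# 09:00 local time at UTC-4 (the same zone during daylight saving time).
summer_fields, summer_shift = local_time_to_utc_fields(9, 0, -4)
```

Here the job that should run at 09:00 local time needs "hour": [14] while the offset is UTC-5 and "hour": [13] while it is UTC-4.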

Examples

| Interval | Example | Description |
| --- | --- | --- |
| Run every 5 minutes | "schedule": { "minute": [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55], "hour": ["*"], "month": ["*"], "dayOfWeek": ["*"], "dayOfMonth": ["*"] } | Executes every time the minute hand of a clock reaches one of the numbers defined in minute, since all other fields are set to asterisks. |
| Run every full hour | "schedule": { "minute": [0], "hour": ["*"], "month": ["*"], "dayOfWeek": ["*"], "dayOfMonth": ["*"] } | Executes every time the clock reaches the minute(s) defined in minute. This example executes every day at 1:00 AM, 2:00 AM, 3:00 AM, and so forth. |
| Run right before noon every day | "schedule": { "minute": [59], "hour": [11], "month": ["*"], "dayOfWeek": ["*"], "dayOfMonth": ["*"] } | Executes when the clock reaches both the minute(s) defined in minute and the hour(s) defined in hour. This example executes every day at 11:59 AM. |
| Run every full hour in January and June | "schedule": { "minute": [0], "hour": ["*"], "month": [1, 6], "dayOfWeek": ["*"], "dayOfMonth": ["*"] } | Executes every time the clock reaches the minute(s) defined in minute, but only when the month is January (1) or June (6). |
| Run every full hour in January and June, only on Mondays and Sundays | "schedule": { "minute": [0], "hour": ["*"], "month": [1, 6], "dayOfWeek": ["mon", "sun"], "dayOfMonth": ["*"] } | Same as above, but with dayOfWeek specified, the job only executes on the days listed. |
| Run every full hour in January and June on Mondays and Sundays, and also on the 1st and 10th of the month | "schedule": { "minute": [0], "hour": ["*"], "month": [1, 6], "dayOfWeek": ["mon", "sun"], "dayOfMonth": [1, 10] } | Same as above, but with both dayOfWeek and dayOfMonth specified. These values add to each other rather than restricting each other. This example executes on the days defined in dayOfWeek as well as the dates defined in dayOfMonth, and not, as might be expected, only when the 1st or 10th falls on a Monday or Sunday. |
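
To make the additive behavior of dayOfWeek and dayOfMonth concrete, the following Python sketch checks whether a given UTC datetime matches a schedule. It is an illustrative model of the rules described on this page, not the scheduler's actual implementation. The function schedule_matches is hypothetical, it only handles integer values (no day or month names), and it assumes that a wildcard dayOfWeek combined with specific dayOfMonth values restricts the job to those dates.

```python
from datetime import datetime


def _field_matches(values, actual):
    """A field matches if it is the wildcard or lists the actual value."""
    return values == ["*"] or actual in values


def schedule_matches(schedule, moment):
    """Return True if the UTC datetime `moment` satisfies `schedule`."""
    if not (
        _field_matches(schedule["minute"], moment.minute)
        and _field_matches(schedule["hour"], moment.hour)
        and _field_matches(schedule["month"], moment.month)
    ):
        return False

    # Python counts Monday as 0; the schedule payload counts Sunday as 0.
    day_of_week = (moment.weekday() + 1) % 7
    week_days = schedule["dayOfWeek"]
    month_days = schedule["dayOfMonth"]

    if week_days == ["*"]:
        return _field_matches(month_days, moment.day)
    if month_days == ["*"]:
        return day_of_week in week_days
    # Both fields are specific: they are additive, so either one may match.
    return day_of_week in week_days or moment.day in month_days


# Last example from the table: Mondays (1) and Sundays (0), plus the 1st and 10th.
schedule = {
    "minute": [0],
    "hour": ["*"],
    "month": [1, 6],
    "dayOfWeek": [1, 0],
    "dayOfMonth": [1, 10],
}

on_the_tenth = schedule_matches(schedule, datetime(2023, 1, 10, 5, 0))  # a Tuesday
on_a_monday = schedule_matches(schedule, datetime(2023, 1, 2, 5, 0))
on_other_day = schedule_matches(schedule, datetime(2023, 1, 3, 5, 0))   # Tuesday the 3rd
```

Tuesday, January 10 matches through dayOfMonth and Monday, January 2 matches through dayOfWeek, while Tuesday, January 3 matches neither field.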

Disable a scheduled job

Job definitions are only executed by the scheduler if enabled is set to true. To stop a job definition that has been running on a schedule, PATCH the endpoint with enabled set to false. A job that is already running when the definition is disabled still runs to completion.

PATCH https://app.datarobot.com/api/v2/batchPredictionJobDefinitions/<job_definition_id>

    {
        "enabled": false
    }
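
A minimal Python sketch of the same PATCH call is shown here, built with the standard library and not sent. The URL pattern comes from the example above; the Bearer token header and the helper name build_disable_request are assumptions for illustration.

```python
import json
import urllib.request

# Base URL taken from the example above; adjust for your own installation.
API_BASE = "https://app.datarobot.com/api/v2"


def build_disable_request(job_definition_id: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets enabled to false.

    The "Authorization: Bearer" header format is an assumption of this sketch.
    """
    return urllib.request.Request(
        url=f"{API_BASE}/batchPredictionJobDefinitions/{job_definition_id}",
        data=json.dumps({"enabled": False}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


disable_request = build_disable_request("<job_definition_id>", "<api_token>")
# To submit the request, pass it to urllib.request.urlopen(disable_request).
```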

Limitations

The scheduler limits how often a job can run and how many jobs can run at once.

Total runs per day

Each organization is limited to a number of job executions per day. If you are a Self-Managed AI Platform user, you can adjust this limit by setting the environment variable BATCH_PREDICTIONS_JOB_SCHEDULER_MAX_NUMBER_OF_RUNS_PER_DAY_PER_ORGANIZATION. On cloud, this limit is 1000 by default.

Note that the limit applies across all scheduled jobs in an organization, so if one scheduled job runs 1000 times per day, that organization cannot activate any other scheduled jobs.
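
To estimate how much of the daily budget a schedule consumes, you can count the minute and hour combinations it selects. The following Python sketch does this for days on which the schedule is active. The helper max_runs_per_day is hypothetical, and the value 1000 is the cloud default quoted above.

```python
DAILY_LIMIT = 1000  # cloud default per organization, as stated above


def max_runs_per_day(schedule):
    """Upper bound on runs per active day: selected minutes times selected hours."""
    minutes = 60 if schedule["minute"] == ["*"] else len(set(schedule["minute"]))
    hours = 24 if schedule["hour"] == ["*"] else len(set(schedule["hour"]))
    return minutes * hours


every_five_minutes = {"minute": list(range(0, 60, 5)), "hour": ["*"]}
every_minute = {"minute": ["*"], "hour": ["*"]}
daily_at_one = {"minute": [0], "hour": [1]}

# Sum the estimates for all enabled definitions in the organization.
total_runs = max_runs_per_day(every_five_minutes) + max_runs_per_day(daily_at_one)
within_limit = total_runs <= DAILY_LIMIT
```

A job that runs every 5 minutes accounts for 12 × 24 = 288 runs per day, while a job that runs every minute would need 1440 runs and exceed the default limit on its own.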

Schedules are best-effort

Depending on how many definitions are running at the same time across the organization, the scheduler cannot guarantee that every job starts at the exact second of its schedule. In most cases, however, the scheduler has the resources to trigger the job within 5 seconds of the scheduled time.

Running the same definition simultaneously

A job definition cannot have more than one scheduled run in progress at a time. If a scheduled job takes so long to execute that the next interval triggers before the first run has finished, the new run is rejected and aborted. This continues to happen until the running job finishes.

Automatic disablement of failing jobs

If a job definition cannot execute due to misconfiguration and is aborted, the definition is automatically disabled (enabled is set to false) after 5 consecutive failures. It is therefore recommended that you first test the payload with the existing /batchPredictions endpoint, and then POST the identical, confirmed working payload to /batchPredictionJobDefinitions. For Self-Managed AI Platform customers, the number of consecutive failures allowed can be adjusted by changing the BATCH_PREDICTIONS_JOB_SCHEDULER_FAILURES_BEFORE_ABORT environment variable.


Updated February 24, 2023