# Batch Prediction API

> The Batch Prediction API provides flexible options for scoring large datasets
> using the prediction servers you have already deployed.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-06T05:15:47.507628+00:00` (UTC).

## Primary page

- [Batch Prediction API](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html): Full documentation for this topic (HTML).

## Sections on this page

- [Limits](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#limits): In-page section heading.
- [Concurrent jobs](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#concurrent-jobs): In-page section heading.
- [Data pipeline](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#data-pipeline): In-page section heading.
- [Data sources supported for batch predictions](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#data-sources-supported-for-batch-predictions): In-page section heading.
- [Concurrent scoring](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#concurrent-scoring): In-page section heading.
- [Job states](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#job-states): In-page section heading.
- [Store credentials securely](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#store-credentials-securely): In-page section heading.
- [CSV format](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#csv-format): In-page section heading.
- [Model monitoring](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#model-monitoring): In-page section heading.
- [Override the default prediction instance](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#override-the-default-prediction-instance): In-page section heading.
- [Consistent scoring with updated model](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#consistent-scoring-with-updated-model): In-page section heading.
- [Template variables](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#template-variables): In-page section heading.
- [API Reference](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#api-reference): In-page section heading.
- [The Public API](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#the-public-api): In-page section heading.
- [The Python API Client](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#the-python-api-client): In-page section heading.

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.
- [API reference](https://docs.datarobot.com/en/docs/api/reference/index.html): Linked from this page.
- [DataRobot REST API reference documentation](https://docs.datarobot.com/en/docs/api/reference/public-api/index.html): Linked from this page.
- [AI Catalog](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog.html): Linked from this page.
- [external data sources](https://docs.datarobot.com/en/docs/classic-ui/data/connect-data/data-conn.html#add-data-sources): Linked from this page.
- [files greater than 1GB via the API](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/large-preds-api.html): Linked from this page.
- [reference the time series documentation](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/batch-pred-ts.html): Linked from this page.
- [feature considerations](https://docs.datarobot.com/en/docs/reference/data-ref/data-sources/wb-maxcompute.html#feature-considerations): Linked from this page.
- [Databricks JDBC driver](https://docs.datarobot.com/en/docs/reference/data-ref/data-sources/dc-databricks.html): Linked from this page.
- [intake options](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/intake-options.html): Linked from this page.
- [output options](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/output-options.html): Linked from this page.
- [Output format](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/output-format.html): Linked from this page.
- [data credentials](https://docs.datarobot.com/en/docs/platform/acct-settings/stored-creds.html): Linked from this page.
- [complete example](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/pred-examples.html#end-to-end-scoring-of-csv-files-from-local-files): Linked from this page.
- [Deployments > Predictions > Prediction API](https://docs.datarobot.com/en/docs/classic-ui/predictions/realtime/code-py.html): Linked from this page.
- [Job Definitions](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/job-definitions.html): Linked from this page.
- [DataRobot REST API](https://docs.datarobot.com/en/docs/api/reference/public-api/batch_predictions.html): Linked from this page.

## Documentation content

# Batch Prediction API

The Batch Prediction API provides flexible options for intake and output when scoring large datasets using the prediction servers you have already deployed. The API is exposed through the DataRobot Public API and can be consumed using any REST-enabled client or the [DataRobot Python Public API bindings](https://datarobot-public-api-client.readthedocs-hosted.com/page/).

For more information about Batch Prediction REST API routes, view the [DataRobot REST API reference documentation](https://docs.datarobot.com/en/docs/api/reference/public-api/index.html).

The main features of the API are:

- Flexible options for intake and output.
- Protection against prediction server overload with a concurrency control level option.
- Inclusion of Prediction Explanations (with an option to add thresholds).
- Support for passthrough columns to correlate scored data with source data.
- Addition of prediction warnings in the output.
- The ability to make predictions on [files greater than 1GB via the API](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/large-preds-api.html).

For more information about making batch prediction settings for time series, [reference the time series documentation](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/batch-pred-ts.html).

## Limits

| Item | AI Platform (SaaS) | Self-managed AI Platform (VPC or on-prem) |
| --- | --- | --- |
| Job runtime limit | 4 hours* | Unlimited |
| Local file intake size | Unlimited | Unlimited |
| Local file write size | Unlimited | Unlimited |
| S3 intake size | Unlimited | Unlimited |
| S3 write size | 100GB | 100GB (configurable) |
| Azure intake size | 4.75TB | 4.75TB |
| Azure write size | 195GB | 195GB |
| GCP intake size | 5TB | 5TB |
| GCP write size | 5TB | 5TB |
| JDBC intake size | Unlimited | Unlimited |
| JDBC output size | Unlimited | Unlimited |
| Concurrent jobs | 1 per prediction instance | 1 per installation |
| Stored data retention time for local file adapters | 48 hours | 48 hours (configurable) |

* Feature Discovery projects have a job runtime limit of 6 hours.

## Concurrent jobs

To ensure that the prediction server does not get overloaded, DataRobot will only run one job per prediction instance.
Further jobs are queued and started as soon as previous jobs complete.

## Data pipeline

A Batch Prediction job is a data pipeline consisting of:

> Data Intake > Concurrent Scoring > Data Output

On creation, the job's `intakeSettings` and `outputSettings` define the data intake and data output part of the pipeline.
You can configure any combination of intake and output options.
For both, the defaults are local file intake and output, meaning you will have to issue a separate `PUT` request with the data to score and subsequently download the scored data.
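
With the local-file defaults, the flow is: create the job, `PUT` the data to score, then download the results. The sketch below builds the job payload described above; the `POST /api/v2/batchPredictions/` route, the API base URL, and the follow-up HTTP steps in the comments are assumptions to verify against your installation.

```python
import json

API_BASE = "https://app.datarobot.com/api/v2"  # assumption: SaaS endpoint


def build_local_file_job(deployment_id: str) -> dict:
    """Build a job payload using the default local-file intake and output."""
    return {
        "deploymentId": deployment_id,
        "intakeSettings": {"type": "localFile"},
        "outputSettings": {"type": "localFile"},
    }


# Sketch of the three-step flow (requires the `requests` package and a token):
# 1. resp = requests.post(f"{API_BASE}/batchPredictions/",
#                         headers={"Authorization": f"Bearer {token}"},
#                         json=build_local_file_job(deployment_id))
# 2. PUT the CSV data to the upload link returned for the job to start scoring.
# 3. GET the job's download link once the job reaches the COMPLETED state.

payload = build_local_file_job("abc123")
print(json.dumps(payload, sort_keys=True))
```
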

### Data sources supported for batch predictions

The following table shows the data source support for batch predictions.

| Name | Driver version | Intake support | Output support | DataRobot version validated |
| --- | --- | --- | --- | --- |
| AWS Athena 2.0 | 2.0.35 | yes | no | 7.3 |
| AWS S3 | 2022.1.1670354484 | yes | yes | - |
| Alibaba Cloud MaxCompute¹ | 3.6.0 | yes | yes | 11.1 |
| Databricks² | 2.6.40 | yes | yes | 9.2 |
| Exasol | 7.0.14 | yes | yes | 8.0 |
| Google BigQuery | 1.2.4 | yes | yes | 7.3 |
| InterSystems | 3.2.0 | yes | no | 7.3 |
| kdb+ | - | yes | yes | 7.3 |
| Microsoft SQL Server | 12.2.0 | yes | yes | 6.0 |
| MySQL | 8.0.32 | yes | yes | 6.0 |
| Oracle | 11.2.0 | yes | yes | 7.3 |
| PostgreSQL | 42.5.1 | yes | yes | 6.0 |
| Presto³ | 0.216 | yes | yes | 8.0 |
| Redshift | 2.1.0.14 | yes | yes | 6.0 |
| SAP HANA | 2.20.17 | yes | yes | 7.3 (intake only); 10.1 (intake and output) |
| Snowflake | 3.15.1 | yes | yes | 6.2 |
| Synapse | 12.4.1 | yes | yes | 7.3 |
| Teradata⁴ | 17.10.00.23 | yes | yes | 7.3 |
| TreasureData | 0.5.10 | yes | no | 7.3 |

¹ Only the "insert" write strategy is supported. Data table and column names cannot contain special characters. These names can contain letters, digits, and underscores (_); however, they must start with a letter and cannot exceed 128 bytes in length. For more information, see the [feature considerations](https://docs.datarobot.com/en/docs/reference/data-ref/data-sources/wb-maxcompute.html#feature-considerations).

² Only the [Databricks JDBC driver](https://docs.datarobot.com/en/docs/reference/data-ref/data-sources/dc-databricks.html) supports batch predictions.

³ Presto requires the use of `auto commit: true` for many of the underlying connectors, which can delay writes.

⁴ For output to Teradata, DataRobot only supports ANSI mode.

For further information, see:

- [Supported intake options](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/intake-options.html)
- [Supported output options](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/output-options.html)
- [Output format schema](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/output-format.html)

## Concurrent scoring

When scoring, the data you supply is split into chunks and scored concurrently on the prediction instance specified by the deployment.
To control the level of concurrency, modify the `numConcurrent` parameter at job creation.
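
As a sketch, `numConcurrent` can be set on the job payload at creation time; the helper function and its validation below are illustrative, only the parameter name comes from the documentation.

```python
def with_concurrency(job: dict, num_concurrent: int) -> dict:
    """Return a copy of the job payload with an explicit chunk concurrency."""
    if num_concurrent < 1:
        raise ValueError("numConcurrent must be at least 1")
    updated = dict(job)
    updated["numConcurrent"] = num_concurrent
    return updated


job = {
    "deploymentId": "abc123",
    "intakeSettings": {"type": "localFile"},
    "outputSettings": {"type": "localFile"},
}
print(with_concurrency(job, 4)["numConcurrent"])  # 4
```
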

## Job states

When working with batch predictions, each prediction job can be in one of four states:

- INITIALIZING: The job has been successfully created and is waiting for scoring to start.
- RUNNING: Scoring the dataset on the prediction servers has started.
- ABORTED: The job was stopped before scoring completed.
- COMPLETED: The dataset has been scored and the results are ready for output.
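
A caller typically polls the job until it leaves the non-terminal states. This small helper encodes the four documented states; the polling pattern itself is an illustrative assumption, not an official client API.

```python
TERMINAL_STATES = {"ABORTED", "COMPLETED"}


def is_terminal(status: str) -> bool:
    """True once a job can no longer change state."""
    return status.upper() in TERMINAL_STATES


def next_action(status: str) -> str:
    """Map a job state to what a caller should do next."""
    state = status.upper()
    if state in ("INITIALIZING", "RUNNING"):
        return "wait"
    if state == "COMPLETED":
        return "download"
    if state == "ABORTED":
        return "inspect-error"
    raise ValueError(f"unknown job state: {status}")


print(next_action("RUNNING"))    # wait
print(next_action("COMPLETED"))  # download
```
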

## Store credentials securely

Some sources or targets for scoring may require DataRobot to authenticate on your behalf (for example, if your database requires that you pass a username and password for login). To ensure proper storage of these credentials, you must have [data credentials](https://docs.datarobot.com/en/docs/platform/acct-settings/stored-creds.html) enabled.

DataRobot uses the following credential types and properties:

| Adapter | Credential Type | Property |
| --- | --- | --- |
| S3 intake / output | s3 | awsAccessKeyId, awsSecretAccessKey, awsSessionToken (optional) |
| JDBC intake / output | basic | username, password |

To use a stored credential, you must pass the associated `credentialId` in either `intakeSettings` or `outputSettings` as described below for each of the adapters.
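
For example, a JDBC intake block can reference a stored credential by its `credentialId`; the field names below follow the JDBC adapter described above, but the ID values and the query are placeholders for illustration.

```python
def jdbc_intake(data_store_id: str, credential_id: str, query: str) -> dict:
    """Build a JDBC intakeSettings block that uses a stored credential."""
    return {
        "type": "jdbc",
        "dataStoreId": data_store_id,
        # Resolved server-side to the stored username/password pair:
        "credentialId": credential_id,
        "query": query,
    }


settings = jdbc_intake("ds-1", "cred-1", "SELECT * FROM scoring_input")
print(settings["credentialId"])  # cred-1
```
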

## CSV format

For any intake or output options that deal with reading or writing CSV files, you can use a custom format by specifying the following in `csvSettings`:

| Parameter | Example | Description |
| --- | --- | --- |
| delimiter | , | (Optional) The delimiter character to use. Default: , (comma). To specify TAB as a delimiter, use the string tab. |
| quotechar | " | (Optional) The character to use for quoting fields containing the delimiter. Default: ". |
| encoding | utf-8 | (Optional) Encoding of the CSV file, for example shift_jis, latin_1, or mskanji. Default: utf-8. Any Python-supported encoding can be used. |

The same format will be used for both intake and output. See a [complete example](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/pred-examples.html#end-to-end-scoring-of-csv-files-from-local-files).
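
Putting the three parameters together, a tab-delimited Shift-JIS job might look like the sketch below; the helper function is illustrative, while the parameter names and the `tab` spelling come from the table above.

```python
def csv_settings(delimiter: str = ",", quotechar: str = '"',
                 encoding: str = "utf-8") -> dict:
    """Build a csvSettings block; 'tab' is the documented spelling for TAB."""
    return {
        "delimiter": delimiter,
        "quotechar": quotechar,
        "encoding": encoding,
    }


job = {
    "deploymentId": "abc123",
    "intakeSettings": {"type": "localFile"},
    "outputSettings": {"type": "localFile"},
    # Applies to both the uploaded data and the scored output:
    "csvSettings": csv_settings(delimiter="tab", encoding="shift_jis"),
}
print(job["csvSettings"]["delimiter"])  # tab
```
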

## Model monitoring

The Batch Prediction API integrates well with DataRobot's model monitoring capabilities:

- If you have enabled data drift tracking for your deployment, any predictions run through the Batch Prediction API will be tracked.
- If you have enabled target drift tracking for your deployment, the output will contain the desired association ID to be used for reporting actuals.

Should you need to run a non-production dataset against your deployment, you can turn off drift and accuracy tracking for a single job by providing the following parameter:

| Parameter | Example | Description |
| --- | --- | --- |
| skipDriftTracking | true | (Optional) Skip data drift, target drift, and accuracy tracking for this job. Default: false. |
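
Opting a single job out of monitoring is then a one-flag change to the payload; the helper below is an illustrative sketch around the documented `skipDriftTracking` parameter.

```python
def non_production_job(job: dict) -> dict:
    """Return a copy of the job payload that skips drift/accuracy tracking."""
    updated = dict(job)
    updated["skipDriftTracking"] = True  # default is False
    return updated


job = {"deploymentId": "abc123", "intakeSettings": {"type": "localFile"}}
print(non_production_job(job)["skipDriftTracking"])  # True
```
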

## Override the default prediction instance

Under normal circumstances, the prediction server used for scoring is the default prediction server your model was deployed to. If you have access to multiple prediction servers, you can override this behavior by using the following properties in the `predictionInstance` option:

| Parameter | Example | Description |
| --- | --- | --- |
| hostName | 192.0.2.4 | Sets the hostname to use instead of the default hostname from the prediction server the model was deployed to. |
| sslEnabled | false | (Optional) Use SSL (HTTPS) to access the prediction server. Default: true. |
| apiKey | NWU...IBn2w | (Optional) Use an API key different from the job creator's key to authenticate against the new prediction server. |
| datarobotKey | 154a8abb-cbde-4e73-ab3b-a46c389c337b | (Optional) If running in a managed AI Platform environment, specify the per-organization DataRobot key for the prediction server. Find the key on the Deployments > Predictions > Prediction API tab or by contacting your DataRobot representative. |

Here's a complete example:

```python
job_details = {
    'deploymentId': deployment_id,
    'intakeSettings': {'type': 'localFile'},
    'outputSettings': {'type': 'localFile'},
    'predictionInstance': {
        'hostName': '192.0.2.4',
        'sslEnabled': False,
        'apiKey': 'NWUQ9w21UhGgerBtOC4ahN0aqjbjZ0NMhL1e5cSt4ZHIBn2w',
        'datarobotKey': '154a8abb-cbde-4e73-ab3b-a46c389c337b',
    },
}
```

## Consistent scoring with updated model

If you deploy a new model after a job has been queued, DataRobot will still use the model that was deployed at the time of job creation for the entire job. Every row will be scored with the same model.

## Template variables

Sometimes it can be useful to specify dynamic parameters in your batch jobs, such as in [Job Definitions](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/job-definitions.html). You can use [jinja's variable syntax](https://jinja.palletsprojects.com/en/3.0.x/templates/#variables) (double curly braces) to print the value of the following parameters:

| Variable | Description |
| --- | --- |
| current_run_time | datetime object for current UTC time (datetime.utcnow()) |
| current_run_timestamp | Milliseconds from Unix epoch (integer) |
| last_scheduled_run_time | datetime object for the start of last job instantiated from the same job definition |
| next_scheduled_run_time | datetime object for the next scheduled start of job from the same job definition |
| last_completed_run_time | datetime object for when the previously scheduled job finished scoring |

The above variables can be used in the following fields:

| Field | Condition |
| --- | --- |
| intake_settings.query | For JDBC, Synapse, and Snowflake adapters |
| output_settings.table | For JDBC, Synapse, Snowflake, and BigQuery adapters, when the statement type is create_table, or when create_table_if_not_exists is set to true |
| output_settings.url | For S3, GCP, and Azure adapters |

For example, specify the URL as: `gs://bucket/output-{{ current_run_timestamp }}.csv`.

> [!NOTE]
> To ensure that most databases understand the replacements mentioned above, DataRobot strips microseconds off the ISO-8601 format timestamps.
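
The substitution itself happens server-side; this local stand-in (an assumption-free sketch using only the standard library) just shows the shape of a templated output URL built from the variables in the table above, with microseconds stripped as noted.

```python
from datetime import datetime, timezone


def render_url(template: str) -> str:
    """Tiny local stand-in for the server-side variable substitution."""
    now = datetime.now(timezone.utc).replace(microsecond=0)  # microseconds stripped
    values = {
        "current_run_time": now.isoformat(),
        "current_run_timestamp": str(int(now.timestamp() * 1000)),
    }
    out = template
    for name, value in values.items():
        out = out.replace("{{ %s }}" % name, value)
    return out


url = render_url("gs://bucket/output-{{ current_run_timestamp }}.csv")
print(url.startswith("gs://bucket/output-"))  # True
```
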

## API Reference

### The Public API

The Batch Prediction API is part of the [DataRobot REST API](https://docs.datarobot.com/en/docs/api/reference/public-api/batch_predictions.html). Reference this documentation for more information about how to work with batch predictions.

### The Python API Client

You can use the [Python Public API Client](https://datarobot-public-api-client.readthedocs-hosted.com/) to interface with the Batch Prediction API.
