# Predictions on large datasets

> Predictions on large datasets - Walk through an example of making predictions on a large dataset
> using the Batch Prediction API.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-06T18:17:09.613482+00:00` (UTC).

## Primary page

- [Predictions on large datasets](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/large-preds-api.html): Full documentation for this topic (HTML).

## Sections on this page

- [1. Create a Batch Prediction job](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/large-preds-api.html#1-create-a-batch-prediction-job): In-page section heading.
- [2. Check the status of the batch prediction job](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/large-preds-api.html#2-check-the-status-of-the-batch-prediction-job): In-page section heading.
- [3. Download the results of the batch prediction job](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/large-preds-api.html#3-download-the-results-of-the-batch-prediction-job): In-page section heading.

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.
- [API reference](https://docs.datarobot.com/en/docs/api/reference/index.html): Linked from this page.
- [Batch Prediction API](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html): Linked from this page.
- [File size limits](https://docs.datarobot.com/en/docs/classic-ui/predictions/pred-file-limits.html): Linked from this page.
- [Prediction API](https://docs.datarobot.com/en/docs/api/reference/predapi/legacy-predapi/dr-predapi.html): Linked from this page.
- [other locations](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/intake-options.html): Linked from this page.

## Documentation content

[File size limits](https://docs.datarobot.com/en/docs/classic-ui/predictions/pred-file-limits.html) vary depending on the prediction method. For predictions on large datasets, use the Batch Prediction API or the real-time Prediction API.

The following example shows how to make predictions on a large dataset using the Batch Prediction API. See the [Prediction API](https://docs.datarobot.com/en/docs/api/reference/predapi/legacy-predapi/dr-predapi.html) for real-time predictions.

In this example, the prediction dataset is stored in the AI Catalog. The Batch Prediction API also supports predicting on data sourced from [other locations](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/intake-options.html). Note that a dataset from the AI Catalog must be snapshotted before it can be used for predictions.

In addition to the API key sent in the header of all API requests, you need the following to use the Batch Prediction API:

1. `<deployment_id>`: The ID of the deployment used to make predictions.
2. `<dataset_id>`: The ID of the snapshotted AI Catalog dataset to be scored by the deployment `<deployment_id>`.
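
As a minimal sketch, the prerequisites above could be collected in Python as follows. The `Bearer` authorization scheme, the `auth_headers` helper, and the `DATAROBOT_API_TOKEN` environment variable name are assumptions made for illustration; they are not stated on this page.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the API key.

    Assumption: the key is sent as a Bearer token in the Authorization header.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Hypothetical environment variable name; use whatever your setup provides.
API_KEY = os.environ.get("DATAROBOT_API_TOKEN", "<api_key>")
DEPLOYMENT_ID = "<deployment_id>"  # deployment used to make predictions
DATASET_ID = "<dataset_id>"        # snapshotted AI Catalog dataset
```

Reading the key from the environment rather than hard-coding it keeps the credential out of source control.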

The following steps show how to work with files larger than 100 MB using the `batchPredictions` API endpoint. In summary, you will:

1. Create a Batch Prediction job indicating the deployed model and dataset to use.
2. Check the status of that Batch Prediction job until it is complete.
3. Download the results.

### 1. Create a Batch Prediction job

`POST https://app.datarobot.com/api/v2/batchPredictions`

Sample request:

```
{
    "deploymentId": "<deployment_id>",
    "intakeSettings": {
        "type": "dataset",
        "datasetId": "<dataset_id>"
    }
}
```
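
The sample request above could be sent from Python roughly as sketched below, using only the standard library. The function names and the `Bearer` authorization header are assumptions for illustration; only the endpoint URL and the JSON body come from this page.

```python
import json
import urllib.request

BATCH_PREDICTIONS_URL = "https://app.datarobot.com/api/v2/batchPredictions"


def build_create_job_request(api_key, deployment_id, dataset_id):
    """Build (but do not send) the POST request that creates a job."""
    payload = {
        "deploymentId": deployment_id,
        "intakeSettings": {"type": "dataset", "datasetId": dataset_id},
    }
    return urllib.request.Request(
        BATCH_PREDICTIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the API key is passed as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(api_key, deployment_id, dataset_id):
    """Send the request and return the parsed JSON response as a dict."""
    request = build_create_job_request(api_key, deployment_id, dataset_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending makes it possible to inspect or log the request before any network call is made.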

Sample time series request (requires enabling the time series product and the Batch Predictions for time series preview flag):

```
{
    "deploymentId": "<deployment_id>",
    "intakeSettings": {
        "type": "dataset",
        "datasetId": "<dataset_id>"
    },
    "timeseriesSettings": {
        "type": "forecast"
    }
}
```
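
The two request bodies differ only in the `timeseriesSettings` block, so a single helper can produce both. This is an illustrative sketch; `batch_job_payload` and its `forecast` flag are hypothetical names, not part of the API.

```python
import json


def batch_job_payload(deployment_id, dataset_id, forecast=False):
    """Return the request body for a Batch Prediction job.

    With forecast=True, the time series settings shown above are added.
    """
    payload = {
        "deploymentId": deployment_id,
        "intakeSettings": {"type": "dataset", "datasetId": dataset_id},
    }
    if forecast:
        payload["timeseriesSettings"] = {"type": "forecast"}
    return payload


# Print the time series variant of the request body.
print(json.dumps(batch_job_payload("<deployment_id>", "<dataset_id>", forecast=True), indent=4))
```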

Sample response:

The `links.self` property of the response contains the URL used for the next two steps.

```
{
    "status": "INITIALIZING",
    "skippedRows": 0,
    "failedRows": 0,
    "elapsedTimeSec": 0,
    "logs": [
        "Job created by user@example.com from 10.1.2.1 at 2020-02-19 22:41:00.865000"
    ],
    "links": {
        "download": null,
        "self": "https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/"
    },
    "jobIntakeSize": null,
    "scoredRows": 0,
    "jobOutputSize": null,
    "jobSpec": {
        "includeProbabilitiesClasses": [],
        "maxExplanations": 0,
        "predictionWarningEnabled": null,
        "numConcurrent": 4,
        "thresholdHigh": null,
        "passthroughColumnsSet": null,
        "csvSettings": {
            "quotechar": "\"",
            "delimiter": ",",
            "encoding": "utf-8"
        },
        "thresholdLow": null,
        "outputSettings": {
            "type": "localFile"
        },
        "includeProbabilities": true,
        "columnNamesRemapping": {},
        "deploymentId": "<deployment_id>",
        "abortOnError": true,
        "intakeSettings": {
            "type": "dataset",
            "datasetId": "<dataset_id>"
        },
        "includePredictionStatus": false,
        "skipDriftTracking": false,
        "passthroughColumns": null
    },
    "statusDetails": "Job created by user@example.com from 10.1.2.1 at 2020-02-19 22:41:00.865000",
    "percentageCompleted": 0.0
}
```

The value of the `links.self` property, `https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/`, is used as `<batch_prediction_job_status_url>` in the Step 2 GET call, below.
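
A small sketch of pulling that URL out of the response body in Python follows. The helper name `status_url_from_response` is hypothetical; the `links.self` field is taken from the sample response above.

```python
import json


def status_url_from_response(response_body):
    """Extract links.self from the JSON text returned when creating a job."""
    job = json.loads(response_body)
    status_url = (job.get("links") or {}).get("self")
    if not status_url:
        raise ValueError("response does not contain a links.self URL")
    return status_url


# Trimmed version of the sample response above.
sample = """
{
    "status": "INITIALIZING",
    "links": {
        "download": null,
        "self": "https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/"
    }
}
"""
print(status_url_from_response(sample))
```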

### 2. Check the status of the batch prediction job

`GET <batch_prediction_job_status_url>`

Sample response:

```
{
    "status": "INITIALIZING",
    "skippedRows": 0,
    "failedRows": 0,
    "elapsedTimeSec": 352,
    "logs": [
        "Job created by user@example.com from 10.1.2.1 at 2020-02-19 22:41:00.865000",
        "Job started processing at 2020-02-19 22:41:16.192000"
    ],
    "links": {
        "download": "https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/download/",
        "self": "https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/"
    },
    "jobIntakeSize": null,
    "scoredRows": 1982300,
    "jobOutputSize": null,
    "jobSpec": {
        "includeProbabilitiesClasses": [],
        "maxExplanations": 0,
        "predictionWarningEnabled": null,
        "numConcurrent": 4,
        "thresholdHigh": null,
        "passthroughColumnsSet": null,
        "csvSettings": {
            "quotechar": "\"",
            "delimiter": ",",
            "encoding": "utf-8"
        },
        "thresholdLow": null,
        "outputSettings": {
            "type": "localFile"
        },
        "includeProbabilities": true,
        "columnNamesRemapping": {},
        "deploymentId": "<deployment_id>",
        "abortOnError": true,
        "intakeSettings": {
            "type": "dataset",
            "datasetId": "<dataset_id>"
        },
        "includePredictionStatus": false,
        "skipDriftTracking": false,
        "passthroughColumns": null
    },
    "statusDetails": "Job started processing at 2020-02-19 22:41:16.192000",
    "percentageCompleted": 0.0
}
```

The value of the `links.download` property, `https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/download/`, is used as `<batch_prediction_job_download_url>` in the Step 3 GET call, below.
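
Checking the status repeatedly can be sketched as a simple polling loop. The version below takes a caller-supplied `fetch_status` function (hypothetical) that performs the GET and returns the parsed JSON, so the loop itself stays independent of any HTTP library. The `ABORTED` and `FAILED` status names, the poll interval, and the timeout are assumptions; this page only shows `INITIALIZING` and `COMPLETED`.

```python
import time

# Assumption: statuses that mean the job will never complete.
FAILURE_STATUSES = {"ABORTED", "FAILED"}


def wait_for_completion(fetch_status, poll_seconds=10.0,
                        timeout_seconds=3600.0, sleep=time.sleep):
    """Poll until the job reports COMPLETED, then return the job dict.

    fetch_status: zero-argument callable returning the status response as a dict.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "COMPLETED":
            return job
        if status in FAILURE_STATUSES:
            details = job.get("statusDetails")
            raise RuntimeError(f"job ended with status {status}: {details}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

The returned dict carries `links.download` for the next step, and fields such as `failedRows` can be inspected before downloading.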

### 3. Download the results of the batch prediction job

Continue polling the status URL from Step 2 until the job status is `COMPLETED` and the job reports no errors. At that point, the predictions can be downloaded.

`GET <batch_prediction_job_download_url>`
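
Because results for large datasets can themselves be large, one reasonable approach is to stream the download to disk instead of holding it in memory. The sketch below assumes the `Bearer` authorization scheme; the `download_results` name and the `opener` parameter (which exists only so the network call can be swapped out) are hypothetical.

```python
import shutil
import urllib.request


def download_results(download_url, api_key, dest_path,
                     opener=urllib.request.urlopen):
    """Stream the prediction results from download_url into dest_path."""
    request = urllib.request.Request(
        download_url,
        # Assumption: the API key is passed as a Bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with opener(request) as response, open(dest_path, "wb") as out_file:
        # Copy in chunks so the full result set never sits in memory.
        shutil.copyfileobj(response, out_file)
    return dest_path
```

A call might look like `download_results("<batch_prediction_job_download_url>", "<api_key>", "predictions.csv")`, where the output file name is an arbitrary choice.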
