Predictions on large datasets

File size limits vary depending on the prediction method. For predictions on large datasets, use the Batch Prediction API or the real-time Prediction API.

The following example shows how to make predictions on a large dataset using the Batch Prediction API. See the Prediction API for real-time predictions.

In this example, the prediction dataset is stored in the AI Catalog. The Batch Prediction API also supports data sourced from other locations. Note that to predict with a dataset from the AI Catalog, the dataset must be snapshotted.

In addition to the API key sent in the header of all API requests, you need the following to use the Batch Prediction API:

  1. <deployment_id>: The ID of the deployment (the deployed model) used to make predictions.
  2. <dataset_id>: The ID of the snapshotted AI Catalog dataset to score with deployment <deployment_id>.

The following steps show how to work with files greater than 100 MB using the batchPredictions API endpoint. In summary, you will:

  1. Create a BatchPrediction job indicating the deployed model and dataset to use.
  2. Check the status of that BatchPrediction job until it is complete.
  3. Download the results.
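The Python sketches accompanying each step below are illustrative only. They assume the requests library, an API key exported in a (hypothetical) DATAROBOT_API_TOKEN environment variable, a Bearer-style Authorization header, and the managed cloud endpoint used in this example; adjust the endpoint for your installation.

import os

import requests

API_URL = "https://app.datarobot.com/api/v2"   # adjust for your DataRobot installation
API_TOKEN = os.environ["DATAROBOT_API_TOKEN"]  # hypothetical variable holding your API key
DEPLOYMENT_ID = "<deployment_id>"              # deployment to make predictions against
DATASET_ID = "<dataset_id>"                    # snapshotted AI Catalog dataset to score

# Reuse one session so the API key is sent with every request.
session = requests.Session()
session.headers.update({"Authorization": f"Bearer {API_TOKEN}"})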

1. Create a Batch Prediction job

POST https://app.datarobot.com/api/v2/batchPredictions

Sample request:

{
    "deploymentId": "<deployment_id>",
    "intakeSettings": {
        "type": "dataset",
        "datasetId": "<dataset_id>"
    }
}

Sample time series request (requires enabling the time series product and the Batch Predictions for time series preview flag):

{
    "deploymentId": "<deployment_id>",
    "intakeSettings": {
        "type": "dataset",
        "datasetId": "<dataset_id>"
    },
    "timeseriesSettings": {
        "type": "forecast"
    }
}
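
Continuing the Python sketch above, one way to submit either request is a single POST to the batchPredictions endpoint; the response (shown below) includes the job status URL used in Step 2.

payload = {
    "deploymentId": DEPLOYMENT_ID,
    "intakeSettings": {
        "type": "dataset",
        "datasetId": DATASET_ID,
    },
    # For a time series deployment, also include:
    # "timeseriesSettings": {"type": "forecast"},
}

response = session.post(f"{API_URL}/batchPredictions", json=payload)
response.raise_for_status()
job = response.json()

# links.self is <batch_prediction_job_status_url> in Step 2.
batch_prediction_job_status_url = job["links"]["self"]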

Sample response:

The links.self property of the response contains the URL used for the next two steps.

{
 "status": "INITIALIZING",
    "skippedRows": 0,
    "failedRows": 0,
    "elapsedTimeSec": 0,
    "logs": [
        "Job created by user@example.com from 10.1.2.1 at 2020-02-19 22:41:00.865000"
    ],
    "links": {
        "download": null,
        "self": "https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/"
    },
    "jobIntakeSize": null,
    "scoredRows": 0,
    "jobOutputSize": null,
    "jobSpec": {
        "includeProbabilitiesClasses": [],
        "maxExplanations": 0,
        "predictionWarningEnabled": null,
        "numConcurrent": 4,
        "thresholdHigh": null,
        "passthroughColumnsSet": null,
        "csvSettings": {
            "quotechar": "\"",
            "delimiter": ",",
            "encoding": "utf-8"
        },
        "thresholdLow": null,
        "outputSettings": {
            "type": "localFile"
        },
        "includeProbabilities": true,
        "columnNamesRemapping": {},
        "deploymentId": "<deployment_id>",
        "abortOnError": true,
        "intakeSettings": {
            "type": "dataset",
            "datasetId": "<dataset_id>"
        },
        "includePredictionStatus": false,
        "skipDriftTracking": false,
        "passthroughColumns": null
    },
    "statusDetails": "Job created by user@example.com from 10.1.2.1 at 2020-02-19   22:41:00.865000",
    "percentageCompleted": 0.0
}

The links.self property (https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/ in this example) is used as <batch_prediction_job_status_url> in the Step 2 GET call below.

2. Check the status of the batch prediction job

GET <batch_prediction_job_status_url>

Sample response:

{
    "status": "INITIALIZING",
    "skippedRows": 0,
    "failedRows": 0,
    "elapsedTimeSec": 352,
    "logs": [
        "Job created by user@example.com from 10.1.2.1 at 2020-02-19 22:41:00.865000",
        "Job started processing at 2020-02-19 22:41:16.192000"
    ],
    "links": {
        "download": "https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/download/",
        "self": "https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/"
    },
    "jobIntakeSize": null,
    "scoredRows": 1982300,
    "jobOutputSize": null,
    "jobSpec": {
        "includeProbabilitiesClasses": [],
        "maxExplanations": 0,
        "predictionWarningEnabled": null,
        "numConcurrent": 4,
        "thresholdHigh": null,
        "passthroughColumnsSet": null,
        "csvSettings": {
            "quotechar": "\"",
            "delimiter": ",",
            "encoding": "utf-8"
        },
        "thresholdLow": null,
        "outputSettings": {
            "type": "localFile"
        },
        "includeProbabilities": true,
        "columnNamesRemapping": {},
        "deploymentId": "<deployment_id>",
        "abortOnError": true,
        "intakeSettings": {
            "type": "dataset",
            "datasetId": "<dataset_id>"
        },
        "includePredictionStatus": false,
        "skipDriftTracking": false,
        "passthroughColumns": null
    },
    "statusDetails": "Job started processing at 2020-02-19 22:41:16.192000",
    "percentageCompleted": 0.0
}

The links.download property (https://app.datarobot.com/api/v2/batchPredictions/a1b2c3d4x5y6z7/download/ in this example) is used as <batch_prediction_job_download_url> in the Step 3 GET call below.
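
As a sketch, a single status check with the session from above might look like this (links.download can be null while the job is still initializing):

response = session.get(batch_prediction_job_status_url)
response.raise_for_status()
job = response.json()

print(job["status"], job["percentageCompleted"])

# Populated once results are available; used as <batch_prediction_job_download_url> in Step 3.
batch_prediction_job_download_url = job["links"]["download"]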

3. Download the results of the batch prediction job

Continue polling the status URL above until the job status is COMPLETED and error-free. At that point, predictions can be downloaded.

GET <batch_prediction_job_download_url>
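
A minimal poll-and-download sketch, continuing from the session above. The terminal failure statuses checked here (ABORTED, FAILED) and the output file name are assumptions; by default the results are returned as a CSV stream.

import time

# Poll the status URL until the job reaches a terminal state.
while True:
    job = session.get(batch_prediction_job_status_url).json()
    status = job["status"]
    if status == "COMPLETED":
        break
    if status in ("ABORTED", "FAILED"):  # assumed failure states
        raise RuntimeError(f"Job ended with status {status}: {job['statusDetails']}")
    time.sleep(30)  # wait before checking again

# Stream the scored results to a local CSV file.
with session.get(job["links"]["download"], stream=True) as response:
    response.raise_for_status()
    with open("predictions.csv", "wb") as out_file:  # hypothetical output path
        for chunk in response.iter_content(chunk_size=1 << 20):
            out_file.write(chunk)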


Updated March 26, 2024