Multipart upload for the batch prediction API

Availability information

Multipart upload for batch predictions is off by default. Contact your DataRobot representative or administrator for information on enabling this feature.

Feature flag: Enable multipart upload for batch predictions

The batch prediction API's local file intake process requires that you upload scoring data for a job using a PUT request to the URL specified in the csvUpload parameter. By default, a single PUT request starts the job (or queues it for processing if the prediction instance is occupied). Multipart upload for batch predictions allows you to override the default behavior to upload scoring data through multiple files. This upload process requires multiple PUT requests followed by a single POST request (finalizeMultipart) to finalize the multipart upload manually. This feature can be helpful when you want to upload large datasets over a slow connection.

Note

For more information on the batch prediction API and local file intake, see Batch Prediction API and Prediction intake options.

Multipart upload endpoints

This feature adds the following multipart upload endpoints to the batch prediction API:

| Endpoint | Description |
| --- | --- |
| PUT /api/v2/batchPredictions/:id/csvUpload/part/0/ | Upload scoring data in multiple parts to the URL specified by csvUpload. Start at part 0 and increment the part number by 1 for each subsequent part of the upload. |
| POST /api/v2/batchPredictions/:id/csvUpload/finalizeMultipart/ | Finalize the multipart upload process. Make sure each part of the upload has finished before finalizing. |

Local file intake settings

The intake settings for the local file adapter include two new properties to support multipart upload for the batch prediction API:

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| intakeSettings.multipart | boolean | false | true: Requires you to submit multiple files via PUT requests and finalize the process manually via a POST request (finalizeMultipart). false: Finalizes intake after one file is submitted via a PUT request. |
| intakeSettings.async | boolean | true | true: Starts the scoring job when the initial PUT request for file intake is made. false: Postpones the scoring job until the PUT request resolves or the POST request for finalizeMultipart resolves. |

Multipart intake setting

To enable the new multipart upload workflow, configure the intakeSettings for the localFile adapter as shown in the following sample request:

{
    "intakeSettings": {
        "type": "localFile",
        "multipart": true
    }
}

Setting multipart to true alters the local file upload workflow, requiring you to:

  • Upload any number of sequentially numbered files.

  • Finalize the upload to indicate that all required files uploaded successfully.
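
As a minimal sketch, the sample request body above could be assembled in Python as follows. Only the intakeSettings keys come from this page; the deploymentId field, the placeholder ID, and the helper name build_job_payload are assumptions made for illustration.

```python
import json


def build_job_payload(deployment_id, multipart=True, async_intake=None):
    """Build a job payload that uses local file intake.

    `async` is a reserved word in Python, so the argument is named
    `async_intake` and mapped to the "async" key only when it is set.
    """
    intake = {"type": "localFile", "multipart": multipart}
    if async_intake is not None:
        intake["async"] = async_intake
    # "deploymentId" is an assumed field; it is not described on this page.
    return {"deploymentId": deployment_id, "intakeSettings": intake}


payload = build_job_payload("YOUR_DEPLOYMENT_ID")
print(json.dumps(payload, indent=4))
```

Leaving async_intake unset omits the "async" key, so the documented default (true) applies.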

Async intake setting

To enable the new multipart upload workflow with async intake disabled, configure the intakeSettings for the localFile adapter as shown in the following sample request:

Note

You can also use the async intake setting independently of the multipart setting.

{
    "intakeSettings": {
        "type": "localFile",
        "multipart": true,
        "async": false
    }
}

A defining feature of batch predictions is that the scoring job starts on the initial file upload, and only one batch prediction job at a time can run for any given prediction instance. This functionality may cause issues when uploading large datasets over a slow connection. In these cases, the client's upload speed could create a bottleneck and block the processing of other jobs. To avoid this potential bottleneck, you can set async to false, as shown in the example above. This configuration postpones submitting the batch prediction job to the queue.

When "async": false, the point at which a job enters the batch prediction queue depends on the multipart setting:

  • If "multipart": true, the job is submitted to the queue after the POST request for finalizeMultipart resolves.

  • If "multipart": false, the job is submitted to the queue after the initial file intake PUT request resolves.
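
The interaction between the two settings can be summarized in a small helper. This is an illustrative sketch of the rules described above, not part of the API; the function name queue_trigger and the returned strings are hypothetical.

```python
def queue_trigger(multipart=False, async_intake=True):
    """Describe the event after which a job enters the queue.

    Defaults mirror the documented defaults: multipart is false and
    async is true.
    """
    if async_intake:
        # Default behavior: the job starts on the initial upload request.
        return "initial PUT request is made"
    if multipart:
        return "POST finalizeMultipart request resolves"
    return "initial PUT request resolves"


for multipart in (False, True):
    for async_intake in (True, False):
        print(multipart, async_intake, "->", queue_trigger(multipart, async_intake))
```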

Example multipart upload requests

The batch prediction API requests required to upload a three-part multipart batch prediction job are:

PUT /api/v2/batchPredictions/:id/csvUpload/part/0/

PUT /api/v2/batchPredictions/:id/csvUpload/part/1/

PUT /api/v2/batchPredictions/:id/csvUpload/part/2/

POST /api/v2/batchPredictions/:id/csvUpload/finalizeMultipart/

Each uploaded part is a complete CSV file with a header.
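
Because each part must be a complete CSV file with its own header, a large dataset has to be split with the header repeated in every part. The following sketch shows one way to do that and to list the matching requests. The helper names split_csv and multipart_requests, the sample data, and the job ID are hypothetical; the sketch builds the request list only and does not send anything.

```python
import csv
import io


def split_csv(text, rows_per_part):
    """Split CSV text into parts, repeating the header row in each part."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    parts = []
    for start in range(0, len(rows), rows_per_part):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_part])
        parts.append(buf.getvalue())
    return parts


def multipart_requests(job_id, n_parts):
    """List the (method, path) pairs for an n-part upload, in order."""
    base = f"/api/v2/batchPredictions/{job_id}/csvUpload"
    requests = [("PUT", f"{base}/part/{i}/") for i in range(n_parts)]
    requests.append(("POST", f"{base}/finalizeMultipart/"))
    return requests


data = "feature_a,feature_b\n1,2\n3,4\n5,6\n"
parts = split_csv(data, rows_per_part=1)
for method, path in multipart_requests("JOB_ID", len(parts)):
    print(method, path)
```

Part numbers start at 0 and increase by 1, and the finalize request comes last, after every part has finished uploading.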

Abort a multipart upload

If you start a multipart upload that you don't want to finalize, you can use a DELETE request to the existing batchPredictions abort route:

DELETE /api/v2/batchPredictions/:id/
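
A minimal sketch of building the abort request with the Python standard library is shown below. The host name, job ID, token, and the Bearer authorization header are placeholders and assumptions that do not come from this page; the request is constructed but not sent.

```python
from urllib.request import Request


def build_abort_request(host, job_id, api_token):
    """Build (but do not send) a DELETE request that aborts a job."""
    url = f"{host.rstrip('/')}/api/v2/batchPredictions/{job_id}/"
    # The Authorization header format is an assumption; adjust as needed.
    headers = {"Authorization": f"Bearer {api_token}"}
    return Request(url, method="DELETE", headers=headers)


req = build_abort_request("https://datarobot.example.com", "JOB_ID", "API_TOKEN")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it.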

Updated March 21, 2023