Multipart upload for the batch prediction API
Availability information
Multipart upload for batch predictions is off by default. Contact your DataRobot representative or administrator for information on enabling this feature.
Feature flag: Enable multipart upload for batch predictions
The batch prediction API's local file intake process requires that you upload scoring data for a job using a `PUT` request to the URL specified in the `csvUpload` parameter. By default, a single `PUT` request starts the job (or queues it for processing if the prediction instance is occupied). Multipart upload for batch predictions allows you to override the default behavior and upload scoring data in multiple files. This upload process requires multiple `PUT` requests followed by a single `POST` request (`finalizeMultipart`) to finalize the multipart upload manually. This feature can be helpful when you want to upload large datasets over a slow connection.
Note
For more information on the batch prediction API and local file intake, see Batch Prediction API and Prediction intake options.
Multipart upload endpoints
This feature adds the following multipart upload endpoints to the batch prediction API:
Endpoint | Description |
---|---|
`PUT /api/v2/batchPredictions/:id/csvUpload/part/0/` | Upload scoring data in multiple parts to the URL specified by `csvUpload`. Increment `0` by 1, in sequential order, for each part of the upload. |
`POST /api/v2/batchPredictions/:id/csvUpload/finalizeMultipart/` | Finalize the multipart upload process. Make sure each part of the upload has finished before finalizing. |
Local file intake settings
The intake settings for the local file adapter add two new properties to support multipart upload for the batch prediction API:

Property | Type | Default | Description |
---|---|---|---|
`intakeSettings.multipart` | boolean | `false` | Set to `true` to enable the multipart upload workflow: upload scoring data in multiple, sequentially numbered parts and finalize the upload manually with `finalizeMultipart`. |
`intakeSettings.async` | boolean | `true` | Set to `false` to postpone submitting the batch prediction job to the queue until file intake completes. |
Multipart intake setting
To enable the new multipart upload workflow, configure the `intakeSettings` for the `localFile` adapter as shown in the following sample request:
{
"intakeSettings": {
"type": "localFile",
"multipart": true
}
}
With this configuration, you can then:

- Upload any number of sequentially numbered files.
- Finalize the upload to indicate that all required files uploaded successfully.
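The two steps above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the host, job ID, and API token are placeholders you would replace with your own values.

```python
# Sketch of the multipart upload workflow: PUT each part in sequence,
# then POST finalizeMultipart. Host, job ID, and token are placeholders.
import urllib.request

BASE = "https://app.datarobot.com/api/v2"  # assumed host; use your own

def part_url(job_id: str, part: int) -> str:
    """Build the URL for one sequentially numbered upload part."""
    return f"{BASE}/batchPredictions/{job_id}/csvUpload/part/{part}/"

def finalize_url(job_id: str) -> str:
    """Build the URL that finalizes the multipart upload."""
    return f"{BASE}/batchPredictions/{job_id}/csvUpload/finalizeMultipart/"

def upload_parts(job_id: str, token: str, paths: list) -> None:
    """Upload each file as a numbered part, then finalize the upload."""
    auth = {"Authorization": f"Bearer {token}"}
    for i, path in enumerate(paths):
        with open(path, "rb") as f:
            req = urllib.request.Request(
                part_url(job_id, i),
                data=f.read(),
                headers={**auth, "Content-Type": "text/csv"},
                method="PUT",
            )
            urllib.request.urlopen(req)
    # Finalize only after every part has uploaded successfully.
    req = urllib.request.Request(finalize_url(job_id), headers=auth, method="POST")
    urllib.request.urlopen(req)
```

The URL helpers mirror the endpoints listed above; `upload_parts` increments the part number by 1 for each file, in order.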
Async intake setting
To use the multipart upload workflow with asynchronous job submission disabled, configure the `intakeSettings` for the `localFile` adapter as shown in the following sample request:

Note

You can also use the `async` intake setting independently of the `multipart` setting.
{
"intakeSettings": {
"type": "localFile",
"multipart": true,
"async": false
}
}
A defining feature of batch predictions is that the scoring job starts on the initial file upload, and only one batch prediction job at a time can run for any given prediction instance. This functionality may cause issues when uploading large datasets over a slow connection. In these cases, the client's upload speed could create a bottleneck and block the processing of other jobs. To avoid this potential bottleneck, you can set `async` to `false`, as shown in the example above. This configuration postpones submitting the batch prediction job to the queue.
When `"async": false`, the point at which a job enters the batch prediction queue depends on the `multipart` setting:

- If `"multipart": true`, the job is submitted to the queue after the `POST` request for `finalizeMultipart` resolves.
- If `"multipart": false`, the job is submitted to the queue after the initial file intake `PUT` request resolves.
Example multipart upload requests
The batch prediction API requests required to upload a three-part multipart batch prediction job are:
PUT /api/v2/batchPredictions/:id/csvUpload/part/0/
PUT /api/v2/batchPredictions/:id/csvUpload/part/1/
PUT /api/v2/batchPredictions/:id/csvUpload/part/2/
POST /api/v2/batchPredictions/:id/csvUpload/finalizeMultipart/
Each uploaded part is a complete CSV file with a header.
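Because each part must be a complete CSV with its own header, the scoring data has to be split accordingly before upload. The following sketch shows one way to do this in Python, assuming the input fits in memory; the function name and signature are illustrative, not part of the DataRobot API.

```python
# Split CSV text into up to n_parts chunks, repeating the header row
# in each chunk so every part is a complete, self-describing CSV file.
import csv
import io

def split_csv(text: str, n_parts: int) -> list:
    """Return a list of CSV strings, each starting with the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Ceiling division, with a floor of 1 so empty bodies don't break range().
    size = max(1, -(-len(body) // n_parts))
    parts = []
    for i in range(0, len(body), size):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)                 # every part gets the header
        writer.writerows(body[i:i + size])      # this part's slice of rows
        parts.append(buf.getvalue())
    return parts
```

Each resulting string could then be uploaded as one numbered part via the `PUT` endpoint described above.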
Abort a multipart upload
If you start a multipart upload that you don't want to finalize, you can send a `DELETE` request to the existing `batchPredictions` abort route:
DELETE /api/v2/batchPredictions/:id/
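As a sketch, the abort request can be built like this with the standard library; the host and token are placeholders, and the request is constructed but not sent so you can inspect it first.

```python
# Build (but do not send) the DELETE request that aborts a batch
# prediction job, and with it any in-progress multipart upload.
import urllib.request

BASE = "https://app.datarobot.com/api/v2"  # assumed host; use your own

def abort_request(job_id: str, token: str) -> urllib.request.Request:
    """Return a DELETE request for the batchPredictions abort route."""
    return urllib.request.Request(
        f"{BASE}/batchPredictions/{job_id}/",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )
```

Passing the result to `urllib.request.urlopen` would perform the abort.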