Multipart upload for the batch prediction API
Availability information
Multipart upload for batch predictions is off by default. Contact your DataRobot representative or administrator for information on enabling this feature.
Feature flag: Enable multipart upload for batch predictions
The batch prediction API's local file intake process requires that you upload scoring data for a job using a `PUT` request to the URL specified in the `csvUpload` parameter. By default, a single `PUT` request starts the job (or queues it for processing if the prediction instance is occupied). Multipart upload for batch predictions allows you to override the default behavior and upload scoring data in multiple files. This upload process requires multiple `PUT` requests followed by a single `POST` request (`finalizeMultipart`) to finalize the multipart upload manually. This feature can be helpful when you want to upload large datasets over a slow connection.
Note
For more information on the batch prediction API and local file intake, see Batch Prediction API and Prediction intake options.
Multipart upload endpoints
This feature adds the following multipart upload endpoints to the batch prediction API:
Endpoint | Description |
---|---|
`PUT /api/v2/batchPredictions/:id/csvUpload/part/0/` | Upload scoring data in multiple parts to the URL specified by `csvUpload`. Increment `0` by 1, in sequential order, for each part of the upload. |
`POST /api/v2/batchPredictions/:id/csvUpload/finalizeMultipart/` | Finalize the multipart upload process. Make sure each part of the upload has finished before finalizing. |
Local file intake settings
The intake settings for the local file adapter add two new properties to support multipart upload for the batch prediction API:

Property | Type | Default | Description |
---|---|---|---|
`intakeSettings.multipart` | boolean | `false` | Set to `true` to enable the multipart upload workflow: upload scoring data in multiple, sequentially numbered parts and finalize the upload manually with `finalizeMultipart`. |
`intakeSettings.async` | boolean | `true` | Set to `false` to postpone submitting the batch prediction job to the queue until file intake completes. |
Multipart intake setting
To enable the new multipart upload workflow, configure the `intakeSettings` for the `localFile` adapter as shown in the following sample request:
{
"intakeSettings": {
"type": "localFile",
"multipart": true
}
}
With this configuration, you can then:

- Upload any number of sequentially numbered files.
- Finalize the upload to indicate that all required files uploaded successfully.
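The two steps above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the host, job ID, and API token are placeholders you would replace with your own values.

```python
# Sketch of the multipart upload workflow: PUT each part in sequence,
# then POST finalizeMultipart. Host, job ID, and token are placeholders.
import urllib.request

BASE = "https://app.datarobot.com/api/v2"  # assumed host; use your own

def part_url(job_id: str, part: int) -> str:
    """Build the URL for one sequentially numbered upload part."""
    return f"{BASE}/batchPredictions/{job_id}/csvUpload/part/{part}/"

def finalize_url(job_id: str) -> str:
    """Build the URL that finalizes the multipart upload."""
    return f"{BASE}/batchPredictions/{job_id}/csvUpload/finalizeMultipart/"

def upload_parts(job_id: str, token: str, paths: list) -> None:
    """Upload each file as a numbered part, then finalize the upload."""
    auth = {"Authorization": f"Bearer {token}"}
    for i, path in enumerate(paths):
        with open(path, "rb") as f:
            req = urllib.request.Request(
                part_url(job_id, i),
                data=f.read(),
                headers={**auth, "Content-Type": "text/csv"},
                method="PUT",
            )
            urllib.request.urlopen(req)
    # Finalize only after every part has uploaded successfully.
    req = urllib.request.Request(finalize_url(job_id), headers=auth, method="POST")
    urllib.request.urlopen(req)
```

The URL helpers mirror the endpoints listed above; `upload_parts` increments the part number by 1 for each file, in order.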
Async intake setting
To use the multipart upload workflow with asynchronous job submission disabled, configure the `intakeSettings` for the `localFile` adapter as shown in the following sample request:

Note

You can also use the `async` intake setting independently of the `multipart` setting.
{
"intakeSettings": {
"type": "localFile",
"multipart": true,
"async": false
}
}
A defining feature of batch predictions is that the scoring job starts on the initial file upload, and only one batch prediction job at a time can run for any given prediction instance. This functionality may cause issues when uploading large datasets over a slow connection. In these cases, the client's upload speed could create a bottleneck and block the processing of other jobs. To avoid this potential bottleneck, you can set `async` to `false`, as shown in the example above. This configuration postpones submitting the batch prediction job to the queue.
When `"async": false`, the point at which a job enters the batch prediction queue depends on the `multipart` setting:

- If `"multipart": true`, the job is submitted to the queue after the `POST` request for `finalizeMultipart` resolves.
- If `"multipart": false`, the job is submitted to the queue after the initial file intake `PUT` request resolves.
Example multipart upload requests
The batch prediction API requests required to upload a three-part multipart batch prediction job are:
PUT /api/v2/batchPredictions/:id/csvUpload/part/0/
PUT /api/v2/batchPredictions/:id/csvUpload/part/1/
PUT /api/v2/batchPredictions/:id/csvUpload/part/2/
POST /api/v2/batchPredictions/:id/csvUpload/finalizeMultipart/
Each uploaded part is a complete CSV file with a header.
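Because each part must be a complete CSV with its own header, the scoring data has to be split accordingly before upload. The following sketch shows one way to do this in Python, assuming the input fits in memory; the function name and signature are illustrative, not part of the DataRobot API.

```python
# Split CSV text into up to n_parts chunks, repeating the header row
# in each chunk so every part is a complete, self-describing CSV file.
import csv
import io

def split_csv(text: str, n_parts: int) -> list:
    """Return a list of CSV strings, each starting with the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Ceiling division, with a floor of 1 so empty bodies don't break range().
    size = max(1, -(-len(body) // n_parts))
    parts = []
    for i in range(0, len(body), size):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)                 # every part gets the header
        writer.writerows(body[i:i + size])      # this part's slice of rows
        parts.append(buf.getvalue())
    return parts
```

Each resulting string could then be uploaded as one numbered part via the `PUT` endpoint described above.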
Abort a multipart upload
If you start a multipart upload that you don't want to finalize, you can send a `DELETE` request to the existing `batchPredictions` abort route:
DELETE /api/v2/batchPredictions/:id/
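As a sketch, the abort request can be built like this with the standard library; the host and token are placeholders, and the request is constructed but not sent so you can inspect it first.

```python
# Build (but do not send) the DELETE request that aborts a batch
# prediction job, and with it any in-progress multipart upload.
import urllib.request

BASE = "https://app.datarobot.com/api/v2"  # assumed host; use your own

def abort_request(job_id: str, token: str) -> urllib.request.Request:
    """Return a DELETE request for the batchPredictions abort route."""
    return urllib.request.Request(
        f"{BASE}/batchPredictions/{job_id}/",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )
```

Passing the result to `urllib.request.urlopen` would perform the abort.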