For the API Client and HTTP Interface options, use the same format as the dataset used to create the project (i.e., upload a ZIP archive with one or more images). For the CLI Interface option, use the base64 format (described below).
If your training dataset consists of a ZIP archive with one or more image files, the prediction dataset needs to be converted to a different format so that it is fully contained in a single CSV file.
To convert a set of image files into a single CSV file, each image must be converted to base64 text. This format allows DataRobot to embed images as a regular text column in the CSV. Encoding binary image data as base64 is a simple operation, available in the standard library of most programming languages.
Here is an example in Python:
    import base64
    from io import BytesIO

    import pandas as pd
    from PIL import Image


    def image_to_base64(image: Image.Image) -> str:
        # Re-encode the opened image as JPEG into an in-memory buffer
        img_bytes = BytesIO()
        image.save(img_bytes, 'jpeg', quality=90)
        # Base64-encode the JPEG bytes and return them as text
        image_base64 = base64.b64encode(img_bytes.getvalue()).decode('utf-8')
        return image_base64


    # let's build a CSV with a single row that contains an image
    # the same general approach works if you have multiple image rows or columns
    image = Image.open('cat.jpg')
    image_base64 = image_to_base64(image)
    df = pd.DataFrame({'animal_image': [image_base64]})
    df.to_csv('prediction_dataset.csv', index=False)
    print(df)
Note
Encode a binary image file (not decoded pixel contents) to base64. This example uses PIL.Image to open the file, but you can base64-encode an image file directly.
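As the note says, an image file can be base64-encoded directly, without opening it with PIL. The following is a minimal sketch of that approach using only the Python standard library, extended to write one CSV row per image. The helper names `file_to_base64` and `build_prediction_csv` and the `*.jpg` file pattern are hypothetical and chosen for illustration. The column name `animal_image` is taken from the example above and is assumed to be the image feature name expected by your project.

```python
import base64
import csv
from pathlib import Path


def file_to_base64(path):
    """Base64-encode the raw bytes of an image file (no pixel decoding)."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


def build_prediction_csv(image_dir, csv_path, column="animal_image"):
    """Write one CSV row per .jpg file in image_dir and return the row count.

    The column name is an assumption carried over from the example above.
    """
    paths = sorted(Path(image_dir).glob("*.jpg"))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column])  # header row
        for path in paths:
            writer.writerow([file_to_base64(path)])
    return len(paths)
```

For example, `build_prediction_csv("images", "prediction_dataset.csv")` would encode every `.jpg` file in a hypothetical `images` directory. Because the file bytes are encoded as they are, this avoids the extra JPEG re-compression step that occurs when an image is opened and saved again.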