

Build models

As with any DataRobot project, building Visual AI models involves preparing and uploading data:

  1. Preparing the dataset, with or without additional feature types.
  2. Creating projects from the AI Catalog or via local file upload.
  3. Reviewing the data before building.

Once you have built models as you would with any DataRobot project, you can:

  1. Review the data after building.
  2. Evaluate and fine-tune models.
  3. Make predictions.


Train-time image augmentation is a processing step that randomly transforms existing images, augmenting the training data. You can configure augmentation both before and after model building.

Prepare the dataset

When creating projects with Visual AI, you can provide data to DataRobot in a ZIP archive. There are two mechanisms for identifying image locations within the archive:

  1. Using a CSV file that contains paths to images (works for all project types).
  2. Using one folder for each image class and file-system folder names as image labels (works for a single-image feature classification dataset).


Additionally, you can encode image data and provide the encoded strings as a column in the CSV dataset. Use base64 format to encode the images before registering the data in DataRobot. (Any other encoding format, or an encoding error, will result in model errors.) See this tutorial for access to a script for converting images and for additional information on how to make predictions on Visual AI projects with API calls.
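The base64 encoding step can be sketched with the Python standard library. This is a minimal illustration, not the script from the tutorial; the function names and the column names `class` and `image` are assumptions chosen for the example.

```python
import base64
import csv
from pathlib import Path


def encode_image(path):
    """Return the raw bytes of an image file as a base64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def write_encoded_csv(rows, csv_path, target_column="class", image_column="image"):
    """Write one CSV row per (image_path, label) pair.

    The column names are hypothetical; use whatever names your dataset needs.
    """
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([target_column, image_column])
        for image_path, label in rows:
            writer.writerow([label, encode_image(image_path)])
```

Decoding a cell with `base64.b64decode` should return the original file bytes, which is a quick way to confirm that nothing was corrupted before you register the data.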

Before beginning, verify that images meet the size and format guidelines. Once created, you can share and preview the dataset in the AI Catalog.

Size and format guidelines

The following table describes image requirements:

Support       Type
File types    .jpeg, .jpg, .png, .bmp, .ppm, .gif, .mpo, and .tiff/.tif
Bit support   8-bit, 16-bit*
Pixel size    • Images up to 2160x2160 pixels are accepted and are downsized to 224x224 pixels.
              • Images smaller than 224x224 are upsampled using Lanczos resampling.

* How 16-bit images are handled

DataRobot supports 16-bit images by converting the image internally to three 8-bit images (3x8-bit). Because TIFF images are processed by taking the first image, the resulting 16-bit image is essentially a greyscale image, which DataRobot then rescales. For more detail, see the Pillow Image Module documentation.
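Before building an archive, it can help to pre-check files against the requirements listed above. The following is a minimal sketch: the function names are hypothetical, the case-insensitive extension match is an assumption, and the width and height are expected to come from whatever image library you already use.

```python
from pathlib import Path

# File types and the pixel limit are copied from the requirements above.
SUPPORTED_EXTENSIONS = {
    ".jpeg", ".jpg", ".png", ".bmp", ".ppm", ".gif", ".mpo", ".tiff", ".tif",
}
MAX_SIDE = 2160


def has_supported_extension(filename):
    """Check the file extension (compared case-insensitively, an assumption)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def within_size_limit(width, height):
    """Check that neither side exceeds the 2160-pixel limit."""
    return 0 < width <= MAX_SIDE and 0 < height <= MAX_SIDE
```

For example, `has_supported_extension("plant.webp")` is false, and `within_size_limit(4000, 3000)` is false, so both files would need converting or resizing before upload.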

Paths for image uploads

Use a CSV for any type of project (regression or classification), whether the dataset contains only a class and an image or also includes additional features. With this method, you provide images in the same directory as the CSV in one of the following ways:

  • Create a single folder with all images.
  • Separate images into folders.
  • Include the images, outside of folders, alongside the CSV.

To set up the CSV file:

  1. Create a CSV in the same directory as the images with, at a minimum, the following columns:

    • target column
    • relative path to each image

  2. Include any additional features.

If you have multiple images for a row, you can create an individual column in the dataset for each. If your images are categorized (for example, the front, back, left, and right of a healthy tomato plant), best practice is to create one column for each category (one column for front images, one for back images, one for left images, and one for right images). If a row has no image in one of the added columns, DataRobot treats it as a missing value.

Create a ZIP archive of the directory and drag-and-drop it into DataRobot to start a project or add it to the AI Catalog.
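The CSV setup and archiving steps above can be sketched as follows. This is a minimal illustration using only the standard library; the file name `dataset.csv`, the column names `class` and `image`, and both function names are assumptions made for the example.

```python
import csv
import zipfile
from pathlib import Path


def write_manifest(dataset_dir, rows, csv_name="dataset.csv"):
    """Write a CSV of (label, relative_image_path) pairs next to the images."""
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / csv_name, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["class", "image"])  # hypothetical column names
        for label, relative_path in rows:
            writer.writerow([label, Path(relative_path).as_posix()])


def zip_directory(dataset_dir, archive_path):
    """Zip every file under dataset_dir, storing paths relative to it."""
    dataset_dir = Path(dataset_dir)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(dataset_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside dataset_dir.
            if path.is_file() and path.resolve() != archive_path.resolve():
                archive.write(path, path.relative_to(dataset_dir).as_posix())
    return archive_path
```

Storing paths relative to the dataset directory keeps the CSV at the top level of the archive, so the relative image paths in the CSV still resolve after extraction.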

Folder-based image datasets

When adding only images, prepare your data by creating a folder for each class and putting images into the corresponding folders. For example, the classic "is it a hot dog?" classification uses one folder containing images of hot dogs and another folder containing images that are not hot dogs.

Once image collection is complete, ZIP the folders into a single archive and upload the archive directly into DataRobot as a local upload or add it to the AI Catalog.
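A folder-based dataset can be inspected and archived with a short sketch like the one below. The function names are hypothetical, and folder names such as `hot_dog` and `not_hot_dog` are only illustrative; the first function shows how folder names map to labels, and the second zips the class folders.

```python
import shutil
from pathlib import Path


def labels_from_folders(root):
    """Return sorted (label, relative_path) pairs; the label is the parent folder name."""
    root = Path(root)
    pairs = []
    for path in root.glob("*/*"):
        if path.is_file():
            pairs.append((path.parent.name, path.relative_to(root).as_posix()))
    return sorted(pairs)


def zip_class_folders(root, archive_base):
    """Zip the class folders under root.

    archive_base is the output path without the .zip extension and should
    point outside root. Returns the path of the archive that was created.
    """
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(root))
```

Reviewing the output of `labels_from_folders` before zipping is a simple way to catch a misnamed folder or an image filed under the wrong class.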

Create projects from the AI Catalog

It is common to access and share image archives from the AI Catalog, where all tabs and catalog functionality are the same for image and non-image projects. The AI Catalog helps you get a sense of image features and check whether everything appears as expected before you begin model building.

To add an archive to the catalog:

  1. Use the Local File option to upload the archive. When the dataset has finished registering, a banner indicates that publishing is complete.

  2. Select the Profile tab to see a sample for each image class.

  3. Click on a sample image to display unique and missing value statistics for the image class.

  4. Click the Preview Images link to display 30 randomly selected images from the dataset.

  5. Click Create project to kick off EDA1 (for materialized datasets).

Next, review your data before building models.

Review data before building

After EDA1 completes, whether initiated from the AI Catalog or drag-and-drop, DataRobot runs data quality checks, identifies column types, and provides a preview of images for sampling. Confirm on the Data page that DataRobot processed dataset features as class and image.

After previewing images and data quality, as described below, you can build models using the regular workflow, identifying class as the target.

Data quality checks

Visual AI uses the Data Quality Assessment tool, with specific checks in place for images. After EDA1 completes, access the results from the Data page.

If images are missing, a dedicated section reports the percentage missing and links to a log with more detail. "Missing" images include those with bad or unresolved paths (file names that don't exist in the archive) and empty cells in a column that expects an image path. Click Preview log to open a modal showing per-image detail.
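You can run a similar check locally before uploading. The sketch below is not DataRobot's implementation; it assumes the CSV sits at the top level of the archive, that image paths are written with forward slashes relative to it, and the function name is hypothetical. It flags the two cases described above: empty cells and paths that do not exist in the archive.

```python
import csv
import io
import zipfile


def find_missing_images(archive_path, csv_name, image_column):
    """Return (row_number, value, reason) for each row with a missing image."""
    problems = []
    with zipfile.ZipFile(archive_path) as archive:
        members = set(archive.namelist())
        with archive.open(csv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row_number, row in enumerate(csv.DictReader(text), start=1):
                value = (row.get(image_column) or "").strip()
                if not value:
                    problems.append((row_number, value, "empty cell"))
                elif value not in members:
                    problems.append((row_number, value, "path not in archive"))
    return problems
```

Row numbers count data rows, starting at 1 after the header, so they can be matched against the CSV directly.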

Data page checks

From the Data page, do the following to ensure that image files are in order:

  1. Confirm that DataRobot has identified images as Var Type image.
  2. Expand the image row in the data table to open the image preview, a random sample of 30 images from the dataset (the full dataset will be used for training). The preview confirms that the images were processed by DataRobot and also allows you to confirm that it is the image set you intended to use.

  3. Click View Raw Data to open a modal displaying a random sample (up to 1MB) of the raw data that DataRobot will use to build models, including both the images and their corresponding classes.

Review data after building

After you have built a project using the standard workflow, DataRobot provides additional information from the Data page.

Expand the image feature and click Image Preview. This visualization initially displays one sample for each class in your dataset. Click a class to display more samples for that class:

Click the Duplicates link to view whether DataRobot detected any duplicate images in your dataset. Duplicates are reported for:

  • the same filename in more than one row of the dataset
  • two images with different names but, as determined by DataRobot, exactly the same content
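Both kinds of duplicate can be approximated locally before upload. The source does not describe how DataRobot determines that two images have the same content, so the byte-for-byte SHA-256 comparison below is an assumption, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def repeated_paths(image_paths):
    """Return paths that appear in more than one row, in first-repeat order."""
    seen = set()
    repeated = []
    for path in image_paths:
        if path in seen and path not in repeated:
            repeated.append(path)
        seen.add(path)
    return repeated


def identical_content_groups(root, image_paths):
    """Group distinct paths whose files have identical bytes (assumed criterion)."""
    groups = defaultdict(list)
    for relative_path in sorted(set(image_paths)):
        data = (Path(root) / relative_path).read_bytes()
        groups[hashlib.sha256(data).hexdigest()].append(relative_path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

A byte-level hash only catches exact file copies; the same picture saved in a different format or at a different size would not be grouped by this sketch.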


Use the same prediction tools with Visual AI as with any other DataRobot project. That is, select a model and make predictions using either Make Predictions or Deploy. The requirements for the prediction dataset are the same as those for the modeling set.

Refer to the section on image predictions for more details.

Updated August 8, 2022