As with any DataRobot project, building Visual AI models involves preparing and uploading data:
- Preparing the dataset, with or without additional feature types.
- Creating projects from the AI Catalog or via local file upload.
- Reviewing the data before building.
Once you have built models as you would with any DataRobot project, you can review the data after building and make predictions.
Train-time image augmentation is a processing step that randomly transforms existing images, augmenting the training data. You can configure augmentation both before and after model building.
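The idea of randomly transforming existing images can be illustrated with a minimal sketch. The example below treats an image as a list of pixel rows and applies random horizontal and vertical flips. The function name `augment` and the choice of flips are assumptions made for illustration; this section does not specify which transformations DataRobot applies.

```python
import random


def augment(image, rng):
    """Return a randomly transformed copy of `image` (a list of pixel rows).

    Illustrative only: flips stand in for whatever transformations
    an augmentation step is configured to apply.
    """
    result = [row[:] for row in image]  # copy so the original is untouched
    if rng.random() < 0.5:
        result = [row[::-1] for row in result]  # horizontal flip
    if rng.random() < 0.5:
        result = result[::-1]  # vertical flip
    return result


rng = random.Random(0)  # seeded so the sketch is repeatable
image = [[1, 2], [3, 4]]
augmented = [augment(image, rng) for _ in range(4)]
for variant in augmented:
    print(variant)
```

Each variant contains the same pixels as the original image in a possibly different arrangement, so the training data gains variety without new images being collected.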
Prepare the dataset¶
When creating projects with Visual AI, data is provided to DataRobot in a ZIP archive. There are two mechanisms for identifying image locations within the archive:
- Using a CSV file that contains paths to images (works for all project types).
- Using one folder for each image class and file-system folder names as image labels (works for a single-image feature classification dataset).
Before beginning, verify that images meet the size and format guidelines. Once the dataset is created, you can share and preview it in the AI Catalog.
Paths for image uploads¶
Use a CSV for any type of project (regression or classification), whether the dataset is a simple pairing of class and image or includes additional features. With this method, you provide images in the same directory as the CSV in one of the following ways:
- Create a single folder with all images.
- Separate images into folders.
- Include the images, outside of folders, alongside the CSV.
To set up the CSV file:
Create a CSV in the same directory as the images with, at a minimum, the following columns:
- target column
- relative path to each image
Include any additional features.
If you have multiple images for a row, you can create an individual column in the dataset for each. If your images are categorized (for example, the front, back, left, and right of a healthy tomato plant), best practice is to create one column for each category: one column for front images, one for back images, one for left images, and one for right images. If a row has no image in one of the added columns, DataRobot treats it as a missing value.
Create a ZIP archive of the directory and drag-and-drop it into DataRobot to start a project or add it to the AI Catalog.
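The steps above can be sketched with the Python standard library. This is a minimal sketch, not a DataRobot utility: the helper `build_csv_dataset`, the file name `dataset.csv`, and the column names (`health`, `front_image`, `back_image`, `height_cm`) are all hypothetical, and the image files are placeholders.

```python
import csv
import tempfile
import zipfile
from pathlib import Path


def build_csv_dataset(root: Path, rows: list[dict], archive: Path) -> Path:
    """Write dataset.csv next to the images under `root`, then ZIP the directory."""
    csv_path = root / "dataset.csv"  # hypothetical file name
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives under root.
            if path.is_file() and path.resolve() != archive.resolve():
                # Store paths relative to the dataset root so they match the CSV.
                zf.write(path, path.relative_to(root).as_posix())
    return archive


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "tomatoes"
    (root / "images").mkdir(parents=True)
    for name in ("plant1_front.jpg", "plant1_back.jpg", "plant2_front.jpg"):
        (root / "images" / name).write_bytes(b"placeholder image bytes")

    rows = [
        # Target column, one image column per category, plus an extra feature.
        {"health": "healthy", "front_image": "images/plant1_front.jpg",
         "back_image": "images/plant1_back.jpg", "height_cm": 41},
        # An empty image cell is left blank; it is treated as a missing value.
        {"health": "diseased", "front_image": "images/plant2_front.jpg",
         "back_image": "", "height_cm": 35},
    ]
    archive = build_csv_dataset(root, rows, Path(tmp) / "tomatoes.zip")
    with zipfile.ZipFile(archive) as zf:
        archived_names = sorted(zf.namelist())

print(archived_names)
```

Keeping the archive entries relative to the CSV's directory means the relative paths in the CSV resolve the same way inside the ZIP as they do on disk.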
Folder-based image datasets¶
When adding only images, prepare your data by creating a folder for each class and putting images into the corresponding folders. For example, the classic "is it a hot dog?" classification would use one folder containing images of hot dogs and another containing images that are not hot dogs.
Once image collection is complete, ZIP the folders into a single archive and upload the archive directly into DataRobot as a local upload or add it to the AI Catalog.
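A folder-per-class archive can be assembled along the following lines. This is a sketch under stated assumptions: the helper `zip_class_folders` and the folder names `hot_dog` and `not_hot_dog` are hypothetical, and the image files are placeholders.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_class_folders(root: Path, archive: Path) -> dict[str, int]:
    """ZIP every class folder under `root`; return the image count per class."""
    counts: dict[str, int] = {}
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            images = sorted(p for p in class_dir.iterdir() if p.is_file())
            counts[class_dir.name] = len(images)
            for image in images:
                # The folder name acts as the label for every image inside it.
                zf.write(image, f"{class_dir.name}/{image.name}")
    return counts


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    for folder, names in {"hot_dog": ["1.jpg", "2.jpg"], "not_hot_dog": ["3.jpg"]}.items():
        (root / folder).mkdir(parents=True)
        for name in names:
            (root / folder / name).write_bytes(b"placeholder image bytes")

    counts = zip_class_folders(root, Path(tmp) / "hot_dogs.zip")

print(counts)
```

Returning the per-class counts gives a quick check for empty or badly unbalanced classes before uploading.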
Create projects from the AI Catalog¶
It is common to access and share image archives from the AI Catalog, where all tabs and catalog functionality are the same for image and non-image projects. The AI Catalog helps you get a sense of image features and check whether everything appears as expected before you begin model building.
To add an archive to the catalog:
Use the Local File option to upload the archive. When the dataset has finished registering, a banner indicates that publishing is complete.
Select the Profile tab to see a sample for each image class.
Click on a sample image to display unique and missing value statistics for the image class.
Click the Preview Images link to display 30 randomly selected images from the dataset.
Next, review your data before building models.
Review data before building¶
After EDA1 completes, whether initiated from the AI Catalog or drag-and-drop, DataRobot runs data quality checks, identifies column types, and provides a preview of images for sampling. Confirm on the Data page that DataRobot processed the dataset features as expected, for example that image columns are identified as Var Type image.
After previewing images and data quality, as described below, you can build models using the regular workflow, identifying class as the target.
Data quality checks¶
Visual AI uses the Data Quality Assessment tool, with specific checks in place for images. After EDA1 completes, access the results from the Data page.
If images are missing, a dedicated section reports the percentage missing and links to a log with more detail. "Missing" images include those with bad or unresolved paths (file names that don't exist in the archive) and rows with an empty cell in a column that expects an image path. Click Preview log to open a modal showing per-image detail.
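A similar check can be run locally before uploading. The sketch below applies the two conditions described above (an unresolved path or an empty cell) to one image column. It is not DataRobot's implementation: the helper `find_missing_images` is hypothetical, and it assumes the CSV sits at the root of the archive so that its relative paths match the archive entry names.

```python
import csv
import io
import zipfile


def find_missing_images(archive, csv_name, image_column):
    """Return (percent_missing, details) for one image column of the CSV."""
    with zipfile.ZipFile(archive) as zf:
        members = set(zf.namelist())
        with zf.open(csv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows = list(csv.DictReader(text))

    details = []
    for number, row in enumerate(rows, start=1):
        path = (row.get(image_column) or "").strip()
        if not path:
            details.append((number, path, "empty cell"))
        elif path not in members:
            details.append((number, path, "path not found in archive"))
    percent = 100.0 * len(details) / len(rows) if rows else 0.0
    return percent, details


# Build a small in-memory archive with one bad path and one empty cell.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "dataset.csv",
        "label,image\n"
        "hot_dog,images/a.jpg\n"
        "hot_dog,images/typo.jpg\n"
        "not_hot_dog,\n"
        "not_hot_dog,images/b.jpg\n",
    )
    zf.writestr("images/a.jpg", b"a")
    zf.writestr("images/b.jpg", b"b")
buffer.seek(0)

percent_missing, missing_details = find_missing_images(buffer, "dataset.csv", "image")
print(percent_missing)
for detail in missing_details:
    print(detail)
```

Row numbers in the details count data rows only, starting at 1 after the header.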
Data page checks¶
From the Data page, do the following to ensure that image files are in order:
- Confirm that DataRobot has identified images as Var Type image.
- Select the image row in the data table to open the image preview, a random sample of 30 images from the dataset (the full dataset is used for training). The preview confirms that DataRobot processed the images and lets you verify that it is the image set you intended to use.
- Click View Raw Data to open a modal displaying a random sample of up to 1MB of the raw data DataRobot will use to build models, including both the images and their corresponding classes.
Review data after building¶
After you have built a project using the standard workflow, DataRobot provides additional information from the Data page.
Select the image feature and click Image Preview. This visualization initially displays one sample for each class in your dataset. Click a class to display more samples for that class.
Click the Duplicates link to see whether DataRobot detected any duplicate images in your dataset. Duplicates are reported for:
- the same filename in more than one row of the dataset
- two images with different names but, as determined by DataRobot, exactly the same content
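Both kinds of duplicate can be approximated locally, as in the sketch below. How DataRobot determines that two images have exactly the same content is not stated here, so the example substitutes a SHA-256 hash of the raw file bytes as a stand-in. The helper `find_duplicates` and the sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(row_paths, contents):
    """Find repeated file names and byte-identical images.

    row_paths: the image path from each dataset row, in row order.
    contents:  mapping of image path -> raw file bytes.
    """
    # Case 1: the same file name appears in more than one row.
    rows_by_path = defaultdict(list)
    for number, path in enumerate(row_paths, start=1):
        rows_by_path[path].append(number)
    repeated_paths = {p: rows for p, rows in rows_by_path.items() if len(rows) > 1}

    # Case 2: different names, identical bytes (SHA-256 used as a stand-in).
    paths_by_digest = defaultdict(list)
    for path in sorted(contents):
        digest = hashlib.sha256(contents[path]).hexdigest()
        paths_by_digest[digest].append(path)
    identical_content = sorted(
        paths for paths in paths_by_digest.values() if len(paths) > 1
    )
    return repeated_paths, identical_content


row_paths = ["a.jpg", "b.jpg", "a.jpg", "c.jpg"]
contents = {"a.jpg": b"\x01\x02", "b.jpg": b"\x03", "c.jpg": b"\x01\x02"}
repeated_paths, identical_content = find_duplicates(row_paths, contents)
print(repeated_paths)
print(identical_content)
```

A byte-level hash only catches exact copies; images that were re-encoded or resized would not match under this stand-in.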
Use the same prediction tools with Visual AI as with any other DataRobot project. That is, select a model and make predictions using either Make Predictions or Deploy. The requirements for the prediction dataset are the same as those for the modeling set.
Refer to the section on image predictions for more details.