When you open the Add data modal, DataRobot displays the Data Registry by default. The Data Registry is a central catalog that lists all static and snapshot datasets you currently have access to in the AI Catalog, including those uploaded from local files and data connections in Workbench. Adding data this way is a good approach if your dataset is ready for modeling.
When you add a dataset from the registry, you create a link from the Use Case to the source of that dataset, so a single dataset can be linked to many Use Cases. When you remove a dataset from a Use Case, you remove only that link; any experiments created from the dataset are not affected.
See the associated considerations for important additional information.
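The link behavior described above can be illustrated with a small in-memory model. This is a conceptual sketch only, not DataRobot's implementation or API: the `Registry` and `UseCase` classes, their methods, and the dataset ID `ds-1` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical stand-in for the Data Registry: dataset ID -> dataset name."""
    datasets: dict[str, str] = field(default_factory=dict)


@dataclass
class UseCase:
    """Hypothetical Use Case that stores links (dataset IDs), not copies of data."""
    name: str
    linked_dataset_ids: set[str] = field(default_factory=set)
    experiments: list[dict[str, str]] = field(default_factory=list)

    def add_dataset(self, registry: Registry, dataset_id: str) -> None:
        # Adding a dataset records a link to an existing registry entry.
        if dataset_id not in registry.datasets:
            raise KeyError(f"unknown dataset: {dataset_id}")
        self.linked_dataset_ids.add(dataset_id)

    def create_experiment(self, name: str, dataset_id: str) -> None:
        if dataset_id not in self.linked_dataset_ids:
            raise ValueError("dataset is not linked to this Use Case")
        self.experiments.append({"name": name, "dataset_id": dataset_id})

    def remove_dataset(self, dataset_id: str) -> None:
        # Only the link is dropped; the registry entry and experiments remain.
        self.linked_dataset_ids.discard(dataset_id)


registry = Registry({"ds-1": "loans.csv"})
churn = UseCase("Churn")
fraud = UseCase("Fraud")

# One dataset can be linked to many Use Cases.
churn.add_dataset(registry, "ds-1")
fraud.add_dataset(registry, "ds-1")
churn.create_experiment("baseline", "ds-1")

# Removing the dataset from one Use Case leaves everything else intact.
churn.remove_dataset("ds-1")
```

After the last call, the dataset is still in the registry, is still linked to the second Use Case, and the experiment created earlier is unchanged.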
Add a dataset
You can add any static or snapshot datasets that have been previously registered in DataRobot.
Support for dynamic datasets in Workbench is on by default.
When this feature is enabled:
- Datasets added via a data connection will be registered as dynamic datasets in the Data Registry and Use Case.
- Dynamic datasets added via a connection will be available for selection in the Data Registry.
- DataRobot will pull a new live sample when viewing Exploratory Data Insights for dynamic datasets.
Feature flag: Enable Dynamic Datasets in Workbench
To add a dataset:
In the Data Registry, select the box to the left of the dataset you want to add.
(Optional) Click Preview to check whether the dataset is appropriate for the objective of your Use Case.
Click Add to Use Case in the upper-right corner.
Workbench opens to the Datasets tab of your Use Case.
Preview a dataset
Viewing a snapshot preview allows you to confirm that a dataset is appropriate for your Use Case before adding it.
To preview a dataset:
In the Data Registry, select the box to the left of the dataset you want to view and click Preview in the upper-right corner.
Analyze the dataset using the Features and Data preview buttons:
- Features: Lists the feature name, type, number of unique values, and number of missing values for each feature in the dataset.
- Data preview: Displays a random sample, up to 1 MB, of the raw data table.
Determine if the dataset suits your Use Case, and then either:
- Add the dataset to your Use Case by clicking Add to Use Case.
- Go back to the Data Registry by clicking Data Registry in the breadcrumbs below the dataset name.
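If you want to run a similar check on a local file before registering it, the per-feature summary and size-capped random sample described in the preview steps can be approximated with the Python standard library. This is a rough sketch under stated assumptions, not how DataRobot computes its preview: the function names, the numeric/text type guess, treating empty strings as missing, and reading "1 MB" as 1,000,000 bytes of serialized row text are all choices made for illustration.

```python
import csv
import io
import random


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def summarize_features(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Per-feature name, guessed type, unique-value count, and missing count."""
    summary = []
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        present = [v for v in values if v != ""]  # assumption: "" means missing
        is_numeric = bool(present) and all(_is_number(v) for v in present)
        summary.append({
            "feature": name,
            "type": "numeric" if is_numeric else "text",
            "unique": len(set(present)),
            "missing": len(values) - len(present),
        })
    return summary


def sample_rows(rows: list[dict[str, str]], max_bytes: int, seed: int = 0) -> list[dict[str, str]]:
    """Random sample of rows whose serialized size stays within max_bytes."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    sample, used = [], 0
    for row in shuffled:
        size = len(",".join(row.values()).encode("utf-8")) + 1  # +1 for newline
        if used + size > max_bytes:
            break
        sample.append(row)
        used += size
    return sample


# Tiny inline dataset standing in for a local CSV file.
raw = "age,city\n34,Boston\n,Austin\n34,\n51,Boston\n"
rows = list(csv.DictReader(io.StringIO(raw)))

features = summarize_features(rows)
preview = sample_rows(rows, max_bytes=1_000_000)
```

For the inline data, both `age` and `city` have two unique values and one missing value, and the whole table fits in the sample because it is far smaller than the size cap.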
From here, you can:
- Add more data.
- View exploratory data insights for the dataset.
- Use the dataset to set up an experiment and start modeling.
To learn more about the topics discussed on this page, see: