Explore data¶
The Data tab lists all datasets and recipes currently linked to the selected Use Case. From this tab, you can manage your assets and launch various data actions:
No. | Element | Description |
---|---|---|
1 | Add Data | Click to open the Add data modal, allowing you to add datasets to the current Use Case. |
2 | Search | Search for a specific dataset. |
3 | Asset type icons | Each asset is preceded by one of the following icons:
|
4 | Actions menu | Click the Actions menu to interact with a data asset. For datasets you can:
|
5 | Sort | Sort the dataset columns. |
Preview
The following updates to EDA insights in Workbench are preview features, on by default:
- On the data explore page, the tabs for Features, Feature lists, Data preview, and Info now appear as icons in the left panel.
- The data explore page includes the Info view, which displays additional information about the dataset, including a feature summary and, if applicable, Feature Discovery details.
- An updated footer for the Features and Data preview views.
- Wrangled datasets display recipe operations in the right panel.
- In the Features view, click specific features to view insights. The available insights depend on the feature type.
- When viewing insights for a summarized categorical feature, you can show a histogram for a selected key.
Feature flag: Enable EDA insights in Workbench
Preview
Support for dynamic datasets in Workbench is on by default.
When this feature is enabled:
- Datasets added via a data connection will be registered as dynamic datasets in the Data Registry and Use Case.
- Dynamic datasets added via a connection will be available for selection in the Data Registry.
- DataRobot will pull a new live sample when viewing Exploratory Data Insights for dynamic datasets.
Feature flag: Enable Dynamic Datasets in Workbench
Data explore page¶
While a dataset is being registered in Workbench, DataRobot also performs exploratory data analysis (EDA1)—analyzing and profiling every feature to detect feature types, automatically transform date-type features, and assess feature quality. Once registration is complete, you can explore the information uncovered while computing EDA1.
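To make the idea of feature profiling concrete, here is a minimal sketch of what a per-feature EDA pass could look like. It is not DataRobot's implementation: the function names (`infer_feature_type`, `profile`), the type labels, the single accepted date format, and the sample rows are all assumptions made for illustration.

```python
from datetime import datetime


def infer_feature_type(values):
    """Guess a coarse feature type from string values (illustrative heuristic only)."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "empty"

    def all_parse(parser):
        # True only if every non-missing value parses without error.
        for v in present:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(float):
        return "numeric"
    # Assumption: dates use the ISO-style YYYY-MM-DD format.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "categorical"


def profile(rows):
    """Return {feature: {"type", "missing", "unique"}} for a list of dict rows."""
    summary = {}
    for name in rows[0]:
        column = [row.get(name) for row in rows]
        present = [v for v in column if v not in ("", None)]
        summary[name] = {
            "type": infer_feature_type(column),
            "missing": len(column) - len(present),
            "unique": len(set(present)),
        }
    return summary


# Hypothetical sample data, with one missing value in monthly_spend.
rows = [
    {"signup_date": "2024-01-05", "plan": "basic", "monthly_spend": "19.99"},
    {"signup_date": "2024-02-11", "plan": "pro", "monthly_spend": ""},
    {"signup_date": "2024-02-11", "plan": "basic", "monthly_spend": "49.00"},
]
report = profile(rows)
```

The resulting `report` holds one entry per feature, which is roughly the kind of summary the data explore page surfaces after EDA1 completes.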
To open the data explore page, click the Actions menu next to the dataset you want to view and select Explore. Alternatively, click the dataset name to view its insights.
Use the icons on the left to navigate between the following pages which provide summary information and EDA insights for the dataset:
- Info: Displays summary information for the dataset version you're currently viewing.
- Data preview: Displays a preview of the dataset you're currently viewing. If the dataset is dynamic, you can view an interactive sample, in which DataRobot displays a random sampling of the raw data. You can specify the sampling method and number of rows in the right panel under Interactive sample. This option is not available for snapshot datasets. Click a feature's histogram to open an expanded version with additional insights and information.
- Features: Displays each feature within the selected feature list. Click a feature to view additional information, including summary metrics and frequent values. The available insights depend on the variable type of the feature.
- Feature lists: Displays all DataRobot and custom feature lists for the dataset. From here, you can create new feature lists and manage existing ones.
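The interactive sample available for dynamic datasets takes a sampling method and a row count. The sketch below shows how those two settings might interact. The function name `interactive_sample`, the method names `"random"` and `"first"`, and the `seed` parameter are hypothetical and do not reflect DataRobot's actual options.

```python
import random


def interactive_sample(rows, n_rows, method="random", seed=None):
    """Return up to n_rows rows using the chosen (hypothetical) sampling method."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    n = min(n_rows, len(rows))  # never request more rows than exist
    if method == "first":
        return rows[:n]
    if method == "random":
        # Sampling without replacement; a seed makes the sample reproducible.
        return random.Random(seed).sample(rows, n)
    raise ValueError(f"unknown sampling method: {method!r}")


raw_rows = list(range(100))  # stand-in for raw source rows
random_sample = interactive_sample(raw_rows, 10, method="random", seed=0)
first_sample = interactive_sample(raw_rows, 5, method="first")
```

A random sample gives a more representative view of the source than the first rows when the data is ordered, for example by date.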
Dataset versioning¶
The data explore page supports dataset versioning, allowing you to access a history of data snapshots as well as create new snapshots from the same page. Note that you can access dataset versioning from any view on the data explore page.
To access dataset versions, click the dropdown next to Data actions or open Dataset Versions in the right panel.
No. | Element | Description |
---|---|---|
1 | Snapshot policy | Displays the selected dataset version. If the snapshot version is selected, DataRobot displays the date and time of the snapshot creation. Click the dropdown to access the following:
|
2 | Dataset Versions | Displays a version history of the dataset. Click a dataset to view a different version. |
3 | + Create snapshot / Upload new version | Adds a new version of the dataset. After registration is complete, the new version appears in the version history and is added to the Use Case and Data Registry.
|
Data actions for snapshot policies
The data explore page supports the following snapshot policies:
- Dynamic: DataRobot is connected to the data source and uses live data to perform the selected data action.
- Snapshot: A fixed snapshot that is stored in DataRobot and used to perform the selected data action. This policy is recommended for repeatable experimentation if live data often changes.
- Static: A local file used to perform the selected data action.
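The difference between the Dynamic and Snapshot policies can be illustrated with a toy model. This is only a sketch of the concept: the classes `LiveSource`, `DynamicDataset`, and `SnapshotDataset` are hypothetical and are not part of any DataRobot API.

```python
import copy


class LiveSource:
    """Stand-in for an external data source whose contents can change."""

    def __init__(self, rows):
        self.rows = list(rows)


class DynamicDataset:
    """Reads from the live source every time a data action runs."""

    def __init__(self, source):
        self._source = source

    def read(self):
        return list(self._source.rows)


class SnapshotDataset:
    """Copies the source once at creation and serves that fixed copy afterwards."""

    def __init__(self, source):
        self._rows = copy.deepcopy(source.rows)

    def read(self):
        return list(self._rows)


source = LiveSource([{"id": 1}, {"id": 2}])
dynamic = DynamicDataset(source)
snapshot = SnapshotDataset(source)

# The live source changes after both datasets were created.
source.rows.append({"id": 3})

dynamic_count = len(dynamic.read())    # reflects the new row
snapshot_count = len(snapshot.read())  # unchanged since creation
```

Because the snapshot never changes, two experiments run against it see identical data, which is why a snapshot suits repeatable experimentation when live data changes often.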
Dataset actions¶
You can perform the following actions on the data explore page. These actions are available from every view:
No. | Element | Description |
---|---|---|
1 | Dataset name | To rename the dataset, click on its name. To save your changes, click outside of the text field. |
2 | Data actions | Open the Data actions dropdown to perform one of the following actions with the dataset you're currently viewing:
|
3 | Data Versions actions | Under Dataset Versions, click the Actions menu to perform one of the following actions on a specific snapshot dataset:
|
Feature lists¶
Preview
Support for feature lists in Workbench is a preview feature, on by default.
Feature flags: Enable Data and Feature Lists tabs in Workbench, Enable Workbench Feature List Creation
After adding a dataset to your Use Case, DataRobot generates feature lists as part of EDA. Feature lists control the subset of features that DataRobot uses to build models and make predictions. Each model has a feature list associated with it.
You might want to use feature lists to:
- Remove features that cannot be used in the model for any reason, for example, a feature that is causing target leakage.
- Make predictions faster by removing unimportant features (i.e., ones that don't improve the model's performance).
You can use one of the automatically created lists—Informative and Raw—or create a custom feature list.
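Conceptually, a feature list is a named subset of a dataset's columns. The following sketch shows a custom list that excludes a feature causing target leakage. The helper `apply_feature_list`, the column names, and the sample rows are assumptions made for illustration, not DataRobot functionality.

```python
def apply_feature_list(rows, feature_list):
    """Keep only the columns named in feature_list, in that order."""
    missing = [f for f in feature_list if f not in rows[0]]
    if missing:
        raise KeyError(f"features not in dataset: {missing}")
    return [{f: row[f] for f in feature_list} for row in rows]


# Hypothetical dataset: refund_issued is only known after a customer churns,
# so including it would leak the target (churned) into the model.
rows = [
    {"age": 34, "income": 52000, "refund_issued": 1, "churned": 1},
    {"age": 51, "income": 87000, "refund_issued": 0, "churned": 0},
]

all_features = list(rows[0])
no_leakage = [f for f in all_features if f != "refund_issued"]
modeling_rows = apply_feature_list(rows, no_leakage)
```

Here `modeling_rows` contains every column except `refund_issued`, mirroring how a custom feature list restricts what a model can see.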
View feature lists¶
The Feature lists view, accessed from the icon in the left panel of the data explore page, is a dedicated page where you can view, manage, and create custom feature lists.
To control the columns displayed here, click Settings and select the box next to the columns you want to view. Then, click Apply.
Before setting up an experiment, use exploratory data insights to compare feature lists and choose the appropriate one for modeling.
Create a feature list¶
To create a custom feature list:
- While exploring a dataset, click the dropdown at the top of the page and select + Create feature list. This opens the Feature lists view.
- Select the box next to each feature you want to include in your custom list. Then, enter a name and, optionally, a description for the new feature list.
- When you're done, click Create feature list in the top-right corner. You can now access the new feature list in the Feature lists view.
Next steps¶
From here, you can:
- Add more data.
- Perform data wrangling for datasets added via a data connection.
- Use the dataset to set up an experiment and start modeling.