Work with AI Catalog assets
Data assets within the AI Catalog can be one of the following:
- Materialized snapshots of tables/views, meaning DataRobot has pulled from the data asset and is currently keeping a copy of it in the catalog.
- Dynamic connections, meaning that the whole dataset is ingested from your data source when you create a modeling project from it, thus allowing you to work with the most up-to-date data.
If the data is snapshotted, those snapshots can be automatically refreshed periodically, and are also automatically versioned to preserve dataset lineage and enhance the overall governance capabilities of DataRobot.
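The difference between the two asset types can be sketched in plain Python. This is a toy model, not DataRobot code: the `CatalogAsset` class, its method names, and the choice to serve the latest snapshot to a new project are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rows = List[dict]


@dataclass
class CatalogAsset:
    """Hypothetical model of a catalog entry (not a DataRobot class)."""

    name: str
    read_source: Callable[[], Rows]  # pulls the current rows from the data source
    snapshots: List[Rows] = field(default_factory=list)  # one entry per version

    @property
    def is_dynamic(self) -> bool:
        # A dynamic asset keeps no copy of the data in the catalog.
        return not self.snapshots

    def refresh(self) -> int:
        """Take a new snapshot and return its 1-based version number."""
        self.snapshots.append(list(self.read_source()))
        return len(self.snapshots)

    def data_for_new_project(self) -> Rows:
        # Dynamic: read from the source at project-creation time.
        # Snapshot: use the stored copy (assumed here to be the latest version).
        return self.read_source() if self.is_dynamic else self.snapshots[-1]


source = [{"id": 1}]
dynamic = CatalogAsset("orders (dynamic)", lambda: source)
snapshotted = CatalogAsset("orders (snapshot)", lambda: source)
snapshotted.refresh()  # version 1 holds one row

source.append({"id": 2})  # the source changes afterwards

print(len(dynamic.data_for_new_project()))      # the dynamic asset sees the new row
print(len(snapshotted.data_for_new_project()))  # the snapshot does not, until refreshed
```

In this model each call to `refresh()` appends a version rather than overwriting the previous one, which is the property that preserves lineage.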
Additionally, when Composable ML is enabled, you can save blueprints to the AI Catalog. From the catalog, a blueprint can be edited, used to train models in compatible projects, or shared.
The following sections describe tools for working with the catalog assets:
- Understand asset states
- Update asset details (name, tags, and descriptions)
- Manage feature lists
- View configured relationships
- View version history and add comments
- Share and delete assets
- Perform bulk actions on assets
To add assets, see Import and create projects in the AI Catalog.
Find existing assets
The AI Catalog provides several tools to help you quickly locate the data assets you want to work with. You can:
- Search and filter datasets using the search query box. Use the dropdown to modify the order of returned results (relevance by default).
- Filter by selecting one or more asset types (sources).
- Filter by tag or asset owner.
DataRobot adds badges to catalog entries to indicate the state of the dataset:

| Badge | Description |
|---|---|
| Dynamic | A dataset that has no snapshot. |
| Spark | A dataset built from a Spark query. |
| Snapshot | A dataset that has a snapshot. |
| Static | A static file or URL-based dataset with a snapshot. |

Datasets uploaded using data stages also display the STATIC badge; however, the FROM field displays
Versioning static assets
Static assets can only be versioned by uploads of the same type; datasets created by local files are versioned from local file uploads, and datasets created from a data stage are versioned from data stage uploads.
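The same-type rule can be written as a one-line check. This is a minimal sketch; the `UploadKind` enum and the `can_add_version` function are hypothetical names, not part of any DataRobot API.

```python
from enum import Enum


class UploadKind(Enum):
    """How a static dataset (or a new version of it) is uploaded."""

    LOCAL_FILE = "local file"
    DATA_STAGE = "data stage"


def can_add_version(asset_created_from: UploadKind, new_upload: UploadKind) -> bool:
    # A static asset only accepts new versions from the same kind of upload
    # that created it.
    return asset_created_from is new_upload


print(can_add_version(UploadKind.LOCAL_FILE, UploadKind.LOCAL_FILE))
print(can_add_version(UploadKind.LOCAL_FILE, UploadKind.DATA_STAGE))
```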
When you click to select an asset, DataRobot displays the asset's details. Click the pencil icons to change the asset name, add a description, or add tags to aid in filtering. To change the name or description, click in the box to enter or delete text, then click anywhere outside the box to save the change. To add tags, click in the tag box and begin typing.
DataRobot offers any predefined tags that match the characters you entered. Select one by clicking it, or continue typing to add a new tag made up of alphanumeric characters (special characters and symbols are invalid). To add the tag, click either outside the entry box or in the dropdown.
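If you prepare tags outside the application, you can check them against the alphanumeric rule beforehand. The sketch below makes several assumptions: "alphanumeric" is read as ASCII letters and digits with no spaces, and tag suggestions are modeled as a case-insensitive prefix match. Both function names are hypothetical.

```python
import re
from typing import List

# Assumption: only ASCII letters and digits count as alphanumeric.
_TAG_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_tag(tag: str) -> bool:
    """Return True if the whole tag consists of letters and digits."""
    return _TAG_PATTERN.fullmatch(tag) is not None


def suggest_tags(typed: str, existing: List[str]) -> List[str]:
    """Return existing tags that start with the typed characters."""
    prefix = typed.lower()
    return [tag for tag in existing if tag.lower().startswith(prefix)]


print(is_valid_tag("finance2024"))
print(is_valid_tag("q1-sales"))
print(suggest_tags("fin", ["finance", "Final", "sales"]))
```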
View asset data
When you add a dataset, DataRobot ingests the source data and runs EDA1 to register the asset and make it available from the catalog.
The Profile tab allows you to preview dataset column names and row data. It can be useful for finding or verifying column names when writing Spark SQL statements for blended datasets. Note that the preview is a random sample of up to 1 MB of the data and may be ordered differently from the original data. To see the complete, original data, use the Download Dataset option.
To preview a dataset, select it in the main catalog and click the pencil icon to access dataset information (if available).
Click the Profile tab to preview the dataset's contents:
Use the Columns dropdown to select the number of columns to display on the page and the scroll bars to scroll through those columns. Additionally, you can use the Rows dropdown to cycle through available data, 20 rows at a time.
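The paging behavior amounts to slicing the preview by a fixed row window and a chosen column count. The following is a minimal sketch of that logic, not DataRobot code; `preview_page` and its parameters are hypothetical, and pages are numbered from zero.

```python
from typing import Dict, List, Sequence

ROWS_PER_PAGE = 20  # the Rows dropdown steps through the data 20 rows at a time


def preview_page(
    rows: Sequence[Dict[str, object]],
    column_names: Sequence[str],
    page: int,
    columns_shown: int,
) -> List[Dict[str, object]]:
    """Return one page of the preview, limited to the first `columns_shown` columns."""
    start = page * ROWS_PER_PAGE
    visible = column_names[:columns_shown]
    return [
        {name: row[name] for name in visible}
        for row in rows[start:start + ROWS_PER_PAGE]
    ]


sample = [{"a": i, "b": i * 2, "c": i * 3} for i in range(45)]
first = preview_page(sample, ["a", "b", "c"], page=0, columns_shown=2)
last = preview_page(sample, ["a", "b", "c"], page=2, columns_shown=2)
print(len(first), len(last), list(first[0]))
```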
The Profile tab also displays details for all features in the dataset. To view details for a particular feature, scroll to it in the display and click it. The Feature Details listed in the right panel update to reflect statistics for the feature. (These are the same statistics as those displayed on the Data page for EDA1.)
Work with feature lists
You can create new feature lists and feature transformations for the features of any dataset in the catalog. To work with these tools, select the dataset in the main catalog and click Feature Lists in the left panel.
Feature lists that you create in the catalog are copied to any project you create from the dataset. You can then set the list to use for the project from the Feature List dropdown at the top of the Project Data list. See the section on working with Feature Lists for complete details on creating, modifying, and understanding these lists.
The Feature List link also provides access to a tool for creating variable type feature transformations. While DataRobot bases variable type assignment on the values seen during EDA, there are times when you may need to change the type. Refer to feature transformations documentation for complete details.
To create a feature list:
1. Use the checkboxes to the left of feature names to select a set of features.
2. Click the Create new feature list from selection link, which becomes active after you select the first feature.
3. In the resulting dialog, provide a name for the new list and click Submit. The new list becomes available through the dropdown.
You can delete or rename any feature list you created. You cannot make any changes to the DataRobot default feature lists.
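The rules above (a list is created from a non-empty selection, and only lists you created can be renamed or deleted) can be modeled as follows. This is an illustrative in-memory sketch, not DataRobot code: the class names, the `"All features"` default list, and the requirement that list names be unique are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FeatureList:
    name: str
    features: Tuple[str, ...]
    is_default: bool = False  # True for lists the platform creates itself


class FeatureListStore:
    """Hypothetical model of the feature lists attached to one dataset."""

    def __init__(self, defaults: Iterable[FeatureList]) -> None:
        self._lists: Dict[str, FeatureList] = {fl.name: fl for fl in defaults}

    def names(self) -> List[str]:
        return list(self._lists)

    def create_from_selection(self, name: str, selected: Iterable[str]) -> FeatureList:
        features = tuple(selected)
        if not features:
            raise ValueError("select at least one feature first")
        if name in self._lists:
            raise ValueError(f"a feature list named {name!r} already exists")
        self._lists[name] = FeatureList(name, features)
        return self._lists[name]

    def _require_user_list(self, name: str) -> FeatureList:
        current = self._lists[name]
        if current.is_default:
            raise PermissionError(f"{name!r} is a default list and cannot be changed")
        return current

    def rename(self, old: str, new: str) -> None:
        current = self._require_user_list(old)
        if new in self._lists:
            raise ValueError(f"a feature list named {new!r} already exists")
        del self._lists[old]
        self._lists[new] = FeatureList(new, current.features)

    def delete(self, name: str) -> None:
        self._require_user_list(name)
        del self._lists[name]


store = FeatureListStore(
    [FeatureList("All features", ("age", "income", "zip"), is_default=True)]
)
store.create_from_selection("Demographics", ["age", "zip"])
store.rename("Demographics", "Demographics v2")
print(store.names())
```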
View configured relationships
DataRobot’s Feature Discovery capability guides you through creating relationships, which define both the included datasets and how they are related to one another. The end product is a multitude of additional features that are a result of these linkings. The Feature Discovery engine analyzes the included datasets to determine a feature engineering “recipe” and from that recipe generates secondary features for training and predictions. Once these relationships are established, you can view them from the catalog.
To view relationships, select the dataset in the main catalog and click the Relationships link to view, modify, or delete existing relationships:
See complete details of working with relationships before modifying relationship details.
View version history
Use the Version History link to list all versions of a selected dataset. The Status column indicates the snapshot status: green if successful, red if failed, and gray if the original version did not have a snapshot.
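The color coding is a three-way mapping, sketched here with a hypothetical `status_color` helper in which `None` stands for a version that has no snapshot.

```python
from typing import Optional


def status_color(snapshot_succeeded: Optional[bool]) -> str:
    """Map a version's snapshot outcome to the color shown in the Status column.

    None means the version has no snapshot at all.
    """
    if snapshot_succeeded is None:
        return "gray"
    return "green" if snapshot_succeeded else "red"


for outcome in (True, False, None):
    print(outcome, status_color(outcome))
```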
Click a version to select it. Once selected, you can create a project from the version and download or delete the contents.
With the Comments link, you can add comments to, and even host a discussion around, any item in the catalog that you have access to. Comment functionality is available in the AI Catalog, as a model tab from the Leaderboard, and in use case tracking. With comments you can:
- Tag other users in a comment; DataRobot will then send them an email notification.
- Edit or delete any comment you have added (you cannot edit or delete other users' comments).
In the AI Catalog, there are a number of ways you can interact with data assets, including downloading, sharing, and deleting datasets.
To download a dataset, select it from the catalog list. From the dropdown menu in the upper right, select Download Dataset and, in the resulting dialog, browse to a download location and click Save.
Only snapshotted datasets can be downloaded. Additionally, there is a 10 GB file size limit; attempting to download a dataset larger than 10 GB will fail.
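The two download conditions can be checked before attempting a download. This is a minimal sketch: `download_blocker` is a hypothetical helper, and reading the limit as decimal gigabytes (10 × 1000³ bytes) is an assumption, since the text does not say whether the limit is decimal or binary.

```python
from typing import Optional

# Assumption: the limit is interpreted as decimal gigabytes.
MAX_DOWNLOAD_BYTES = 10 * 1000**3


def download_blocker(has_snapshot: bool, size_bytes: int) -> Optional[str]:
    """Return the reason a download would fail, or None if it is allowed."""
    if not has_snapshot:
        return "only snapshotted datasets can be downloaded"
    if size_bytes > MAX_DOWNLOAD_BYTES:
        return "dataset exceeds the 10 GB download limit"
    return None


print(download_blocker(has_snapshot=True, size_bytes=2 * 1000**3))
print(download_blocker(has_snapshot=False, size_bytes=2 * 1000**3))
print(download_blocker(has_snapshot=True, size_bytes=11 * 1000**3))
```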
Assets in the AI Catalog can be shared with users, groups, and organizations.

| Field | Description |
|---|---|
| Allow sharing | The user you're sharing with can share the asset with other users. |
| Can use data | The user you're sharing with can use the data. How they use the data depends on their role. |
| User list | Enter the user(s) with whom you are sharing the asset. |
| Access level | Defaults to Users. If your instance has Groups and Organizations configured, you can select from these categories. |
| Role | Select a role for the users, groups, or organizations that are sharing the asset. |
| Share | Select Share to perform the operation. |
| Shared with | Shows the users, groups, and organizations that the asset is shared with, along with their permission settings. |
Sharing with multiple users
When sharing a catalog asset with multiple users, DataRobot suggests creating a user group first, and then sharing with that group instead of individual users.
The catalog uses the same sharing window as other places in the application, with some fields specific to the data assets.
To delete a dataset, select the dataset from the catalog list. From the dropdown menu in the upper right, select Delete Dataset. When prompted for confirmation, click Delete.
Bulk actions on datasets
You can share, tag, or delete multiple datasets at once using the bulk action functionality in the AI Catalog. Start by selecting the box next to the asset(s) you want to manage; select at least one asset to enable the bulk actions at the top. A counter also displays how many assets are actively selected.
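The selection behavior (bulk actions become available once at least one asset is selected, and a counter tracks the selection) can be modeled in a few lines. This is an illustrative sketch, not DataRobot code; the `BulkSelection` class and the asset IDs are hypothetical.

```python
from typing import Callable, Dict, List, Set


class BulkSelection:
    """Hypothetical model of the checkboxes and bulk-action bar."""

    def __init__(self) -> None:
        self._selected: Set[str] = set()

    def toggle(self, asset_id: str) -> None:
        """Check or uncheck the box next to an asset."""
        if asset_id in self._selected:
            self._selected.remove(asset_id)
        else:
            self._selected.add(asset_id)

    @property
    def count(self) -> int:
        # Mirrors the counter that shows how many assets are selected.
        return len(self._selected)

    @property
    def actions_enabled(self) -> bool:
        # Bulk actions need at least one selected asset.
        return self.count >= 1

    def apply(self, action: Callable[[str], object]) -> int:
        """Run one action (share, tag, or delete) on every selected asset."""
        if not self.actions_enabled:
            raise RuntimeError("select at least one asset first")
        for asset_id in sorted(self._selected):
            action(asset_id)
        return self.count


tags: Dict[str, List[str]] = {}
selection = BulkSelection()
selection.toggle("dataset-a")
selection.toggle("dataset-b")
selection.toggle("dataset-c")
selection.toggle("dataset-b")  # unchecking removes it from the selection

selection.apply(lambda asset_id: tags.setdefault(asset_id, []).append("finance"))
print(selection.count, tags)
```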