Manage data with the AI Catalog

DataRobot’s AI Catalog comprises three key functions:

  • Ingest: Data is imported into DataRobot and sanitized for use throughout the platform.
  • Storage: Reusable data assets are stored, accessed, and shared.
  • Data Preparation: Data is cleaned, blended, transformed, and enriched to maximize the effectiveness of your application.

You can access the AI Catalog from anywhere within DataRobot by clicking the AI Catalog tab at the top of the browser.

Takeaways

This tutorial shows you how to:

  • Add data to the AI Catalog.
  • View information about a dataset.
  • Blend a dataset with another dataset using Spark SQL.
  • Create a project.

Add data

To add data to the AI Catalog:

  1. Click AI Catalog at the top of the DataRobot window.

  2. Click Add to catalog and select an import method.

    The following table describes the methods:

    Method                    Description
    New Data Connection       Configure a JDBC connection to import from an external database or data lake.
    Existing Data Connection  Select a configured data source to import data. Select the account and the data you want to add.
    Local File                Browse to upload a local dataset or drag and drop a dataset.
    URL                       Import by specifying a URL.
    Spark SQL                 Use Spark SQL queries to select and prepare the data you want to store.

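DataRobot registers the data after performing an initial exploratory data analysis (EDA1). Once the data is registered, you can view information about the dataset, blend it with other datasets, and create a project, as described in the following sections.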

View information about a dataset

Click a dataset in the catalog to view information about it.

Element         Description
Asset tabs      Select a tab to work with the asset (dataset):
                  • Info: View and edit basic information about the dataset. Update the name and description, and add tags to use for searches.
                  • Profile: Preview dataset column names and row data.
                  • Feature Lists: Create new feature lists and transformations from the dataset.
                  • Relationships: View relationships configured during Feature Discovery.
                  • Version History: List and view status for all versions of the dataset. Select a version to create a project or download.
                  • Comments: Add a comment to a dataset. Tag users in your comment and DataRobot sends them an email notification.
Dataset Info    Update the name and description, and add tags to use for searches. The number of rows and features display on the right, along with other details.
State badges    Badges indicate the state of the asset: whether it is being registered, whether it is static or dynamic, and whether it was generated from a Spark SQL query or snapshotted.
Create project  Create a machine learning project from the dataset.
Share           Share assets with other users, groups, and organizations.
Actions menu    Download, delete, or create a snapshot of the dataset.
Renew Snapshot  Add a scheduled snapshot.

Blend a dataset using Spark SQL

You can blend two or more datasets and use Spark SQL to select and transform features.

  1. In the catalog, click Add to catalog and select Spark SQL.

  2. Click Add data.

  3. Select the tables you want to blend and click Add selected data.

  4. For each dataset, click the actions menu and click Select Features.

  5. Choose the features and click Add selected features to SQL. You can click the right arrows to add features one at a time.

  6. Once you have added features from the datasets, add SQL commands to the editing window to build a query. For Spark SQL documentation, click Spark Docs on the upper right. Click Run to try out the query.

  7. Click Save when you have the results you want. DataRobot registers the new dataset.
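
To make the blending step more concrete, here is a minimal sketch of the kind of query you might build in the editing window. It is an illustration under stated assumptions, not DataRobot's own example: the table names (customers, orders) and their columns are hypothetical, and Python's built-in sqlite3 module stands in for the Spark SQL engine so that the query can be run locally. The SELECT statement uses only standard join and aggregation syntax that Spark SQL also accepts.

```python
import sqlite3

# Hypothetical stand-ins for two catalog datasets. The table and column
# names are invented for illustration; substitute your own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);
""")

# Blend the two tables: join on the shared key, keep selected features,
# and derive two aggregate features per customer.
BLEND_QUERY = """
    SELECT c.customer_id,
           c.region,
           COUNT(o.order_id) AS order_count,
           SUM(o.amount)     AS total_amount
    FROM customers AS c
    JOIN orders AS o
      ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
"""

rows = conn.execute(BLEND_QUERY).fetchall()
for row in rows:
    print(row)
```

In the Spark SQL editor, only the SELECT statement would apply. The datasets you added in the earlier steps take the place of the tables created here.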

Create a project

Click a registered dataset in the catalog and click Create project. DataRobot uploads the data, conducts exploratory data analysis, and creates the machine learning project. You can then start building models.

Updated March 10, 2023