AI Catalog

The AI Catalog is a centralized collaboration hub for working with data and related assets. It lets you find, share, tag, and reuse data, which helps shorten time to production. The catalog provides easy access to the data needed to answer a business problem while ensuring security, compliance, and consistency.

The AI Catalog provides three key functions:

  • Ingest: Data is imported into DataRobot and sanitized for use throughout the platform.
  • Storage: Reusable data assets are stored, accessed, and shared, so you can share data without sharing projects, which reduces the risk and cost of data duplication.
  • Data Preparation: Clean, blend, transform, and enrich your data by leveraging SQL scripts for pinpointed results.

The catalog also supports data security and governance through selective addition to the catalog, role-based sharing, and an audit trail. These controls reduce friction and speed up model adoption.

Topic: Describes...

Import datasets
  • Import data and create projects: Import data into the AI Catalog and, from there, create a DataRobot project.

Interact with catalog assets
  • Work with catalog assets: View and modify asset details, create snapshots, and create projects from a data entry.
  • Manage catalog assets: Share, delete, and download data assets.
  • Schedule data snapshots: Set up schedules for data snapshots in the AI Catalog to keep a dataset in sync with its source data.

Prepare data
  • Prepare data with Spark SQL: Enrich, transform, shape, and blend together datasets using Spark SQL queries within the AI Catalog.
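
As a rough illustration of the kind of blending query the data preparation topic refers to, the sketch below joins and aggregates two small tables. It is not DataRobot code: it runs the query in Python's built-in sqlite3 module as a stand-in for Spark SQL, and the table names (customers, orders), columns, and values are hypothetical. The join and aggregation shown use standard SQL that Spark SQL also accepts, but syntax details can differ between engines.

```python
import sqlite3

# In-memory database standing in for two catalog datasets (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0);
""")

# Blend the two tables: one row per customer with order count and total spend.
BLEND_QUERY = """
    SELECT c.customer_id,
           c.region,
           COUNT(o.order_id) AS order_count,
           SUM(o.amount)     AS total_amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
"""

rows = conn.execute(BLEND_QUERY).fetchall()
for row in rows:
    print(row)

conn.close()
```

The result is a single enriched table keyed by customer, which is the general shape of output a blend step produces before modeling.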

Self-Managed AI Platform: Elasticsearch

For Self-Managed AI Platform users, DataRobot recommends enabling Elasticsearch for significantly improved search matches, relevancy, and rankings. Contact your DataRobot representative for help configuring and deploying Elasticsearch.


Updated January 26, 2024