The AI Catalog makes it easy to find, share, tag, and reuse data, which helps speed time to production and increases collaboration. It provides easy access to the data needed to answer a business problem while ensuring security, compliance, and consistency. This section covers the following topics:
| Topic | Description |
|---|---|
| Load data/create projects | Starting a project from the AI Catalog. |
| Work with Assets | Working with existing assets. |
| Schedule snapshots | Scheduling and renewing snapshots. |
| Prepare data with Spark SQL | Blending datasets with Spark SQL. |
For on-premises installations, DataRobot recommends enabling Elasticsearch for the AI Catalog, which significantly improves search matches, relevancy, and ranking. Contact your DataRobot representative for help configuring and deploying Elasticsearch.
The AI Catalog is a centralized collaboration hub for working with data and related assets. The DataRobot landing page lets you start a project either with the legacy method or from the AI Catalog.
With the AI Catalog, you can:
- perform simple data preparation, using SQL scripts to get targeted results
- create datasets without the full commitment of creating projects
- find, access, delete, and reuse the assets you need
- share data without sharing projects, reducing the risks and costs of data duplication
- improve time-to-prediction by using prepared and featurized datasets directly; these datasets are also available to DataRobot's Batch Prediction functionality
- support data security and governance through selective addition to the catalog, role-based sharing, and an audit trail, which reduces friction and speeds up model adoption