Data can be ingested into DataRobot from your local system, a URL, Hadoop (if deployed in a Hadoop environment), and through data connections to common databases and data lakes. A critical part of the data ingestion process is Exploratory Data Analysis (EDA). EDA happens twice within DataRobot: once when data is ingested, and again after a target has been selected and modeling has begun.
You can import data directly into the DataRobot platform or you can import into the AI Catalog, a centralized collaboration hub for working with data and related assets. The catalog allows you to seamlessly find, understand, share, tag, and reuse data. The following sections provide guidelines and steps for importing data.
| Topic | Description |
|---|---|
| Dataset requirements | Dataset requirements, data type definitions, file formats and encodings, and special column treatments. |
| Import directly into DataRobot | Import a dataset file, import from a URL, or import from AWS S3, among other methods. |
| Import and create projects in the AI Catalog | Import data into the AI Catalog and, from there, create a DataRobot project. In the catalog, you can transform the data using SQL, and create and schedule snapshots of your data. |
| Catalog assets | View, modify, and share assets and metadata. |
| Schedule data snapshots | Set up schedules for data snapshots in the AI Catalog. |
| Large datasets | Methods of working with large datasets (greater than 10 GB). |
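
Before choosing an import method, it can help to check where a dataset lives and whether it crosses the large-dataset threshold. The following is a minimal sketch using only the Python standard library; it does not call DataRobot. The helper names (`classify_source`, `is_large`, `describe`) are hypothetical, and reading "10GB" as 10 × 1024³ bytes is an assumption, since the source does not say whether the limit is decimal or binary.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumption: the 10 GB threshold is interpreted as 10 * 1024**3 bytes.
LARGE_DATASET_BYTES = 10 * 1024**3


def classify_source(source: str) -> str:
    """Label a dataset location as 'url', 's3', or 'local' (hypothetical helper)."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if scheme == "s3":
        return "s3"
    # Anything else (plain paths, Windows drive letters) is treated as local.
    return "local"


def is_large(size_bytes: int) -> bool:
    """Return True when the size exceeds the assumed large-dataset threshold."""
    return size_bytes > LARGE_DATASET_BYTES


def describe(source: str) -> str:
    """Combine the location label with a size check for existing local files."""
    kind = classify_source(source)
    if kind == "local" and Path(source).is_file():
        if is_large(Path(source).stat().st_size):
            return "local (large dataset)"
    return kind
```

For example, `describe("s3://bucket/key.csv")` returns `"s3"`, while a local file over the threshold is flagged so you can consult the large-dataset guidance before importing.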