# Load data

> Load data - How to add external data using JDBC or a SQL query, configure fast registration, and
> upload calendars for time series projects.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.534909+00:00` (UTC).

## Primary page

- [Load data](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog.html): Full documentation for this topic (HTML).

## Sections on this page

- [From external connections](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog.html#from-external-connections): In-page section heading.
- [From a SQL Query](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog.html#from-a-sql-query): In-page section heading.
- [Configure fast registration](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog.html#configure-fast-registration): In-page section heading.
- [Upload calendars](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog.html#upload-calendars): In-page section heading.
- [Personal data detection](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog.html#personal-data-detection): In-page section heading.

## Related documentation

- [Classic UI documentation](https://docs.datarobot.com/en/docs/classic-ui/index.html): Linked from this page.
- [Data](https://docs.datarobot.com/en/docs/classic-ui/data/index.html): Linked from this page.
- [AI Catalog](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/index.html): Linked from this page.
- [EDA1](https://docs.datarobot.com/en/docs/reference/data-ref/eda-explained.html#eda1): Linked from this page.
- [materialized](https://docs.datarobot.com/en/docs/reference/glossary/index.html#materialized): Linked from this page.
- [Configure a JDBC connection](https://docs.datarobot.com/en/docs/classic-ui/data/connect-data/data-conn.html): Linked from this page.
- [Select a configured data source](https://docs.datarobot.com/en/docs/classic-ui/data/import-data/import-to-dr.html#import-from-a-data-source): Linked from this page.
- [View information](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog-asset.html#view-asset-information): Linked from this page.
- [Blend the dataset](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/spark.html#create-blended-datasets): Linked from this page.
- [manage](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/manage-asset.html): Linked from this page.
- [or use stored credentials](https://docs.datarobot.com/en/docs/platform/acct-settings/stored-creds.html): Linked from this page.
- [Calendars for time series](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/ts-reference/ts-adv-opt.html#calendar-files): Linked from this page.
- [feature list](https://docs.datarobot.com/en/docs/classic-ui/modeling/build-models/build-basic/feature-lists.html#automatically-created-feature-lists): Linked from this page.

## Documentation content

# Import data to the AI Catalog

Import methods are the same for both legacy import and catalog entry—that is, via local file, HDFS, URL, or JDBC data source. From the catalog, however, you can also add data by blending datasets with Spark. When you upload through the catalog, DataRobot completes [EDA1](https://docs.datarobot.com/en/docs/reference/data-ref/eda-explained.html#eda1) (for [materialized](https://docs.datarobot.com/en/docs/reference/glossary/index.html#materialized) assets) and saves the results for later reuse. For unmaterialized assets, DataRobot uploads and samples the data but does not save the results for later reuse. Additionally, you can [upload calendars](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog.html#upload-calendars) for use in time series projects and enable [personal data detection](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog.html#personal-data-detection).

> [!NOTE] Dataset formatting
> To avoid introducing unexpected line breaks or incorrectly separated fields during data import, wrap any non-numeric data that contains special characters—such as newlines, carriage returns, double quotes, commas, or other field separators—in double quotes (`"`). Properly quoting non-numeric data is particularly important when the preview feature **Enable Minimal CSV Quoting** is enabled.
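
To see why quoting matters, here is a minimal sketch using Python's standard `csv` module (illustrative only; it does not show DataRobot's parser): fields containing commas, quotes, or newlines survive a round trip only when quoted.

```python
import csv
import io

# Rows whose text fields contain commas, quotes, or newlines must be
# quoted so a parser does not split or break them (illustrative only).
rows = [
    ["id", "comment", "amount"],
    ["1", 'Said "hello", then left', "10.5"],  # embedded quote and comma
    ["2", "line one\nline two", "7.0"],        # embedded newline
]

buf = io.StringIO()
# QUOTE_MINIMAL quotes only the fields that contain special characters.
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)

# Reading the quoted output back recovers the original fields intact.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
assert parsed == rows
```

Without the quoting, the embedded newline would split the second data row in two and the embedded comma would add a spurious column.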

To add data to the AI Catalog:

1. Click **AI Catalog** at the top of the DataRobot window.
2. Click **Add to catalog** and select an import method. The following table describes the methods:

   | Method | Description |
   |--------|-------------|
   | New Data Connection | [Configure a JDBC connection](https://docs.datarobot.com/en/docs/classic-ui/data/connect-data/data-conn.html) to import from an external database or data lake. |
   | Existing Data Connection | [Select a configured data source](https://docs.datarobot.com/en/docs/classic-ui/data/import-data/import-to-dr.html#import-from-a-data-source) to import data. Select the account and the data you want to add. |
   | Local File | Browse to upload a local dataset, or drag and drop a dataset for import. |
   | URL | Specify a URL. |
   | Spark SQL | Use [Spark SQL queries to select and prepare the data](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/spark.html) you want to store. |

DataRobot registers the data after performing initial exploratory data analysis ([EDA1](https://docs.datarobot.com/en/docs/reference/data-ref/eda-explained.html#eda1)). Once registered, you can do the following:

- [View information](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog-asset.html#view-asset-information) about a dataset, including its history.
- [Blend the dataset](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/spark.html#create-blended-datasets) with another dataset.
- Create an AutoML project.
- Use the additional tools to view, modify, and [manage](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/manage-asset.html) assets.

## From external connections

Using JDBC, you can read data from external databases and add the data as assets to the AI Catalog for model building and predictions. See [Data connections](https://docs.datarobot.com/en/docs/classic-ui/data/connect-data/data-conn.html) for more information.

1. If you haven't already, create the connections and add data sources.
2. Select the **AI Catalog** tab, click **Add to catalog**, and select **Existing Data Connection**.
3. Click the connection that holds the data you would like to add.
4. Select an account. Enter credentials [or use stored credentials](https://docs.datarobot.com/en/docs/platform/acct-settings/stored-creds.html) for the connection to authenticate.
5. Once validated, select a source for data:

   | Element | Description |
   |---------|-------------|
   | 1. Schemas | Select **Schemas** to list all schemas associated with the database connection. Select a schema from the displayed list; DataRobot then displays all tables that are part of that schema. Click **Select** for each table you want to add as a data source. |
   | 2. Tables | Select **Tables** to list all tables across all schemas. Click **Select** for each table you want to add as a data source. |
   | 3. SQL Query | Select data for your project with a SQL query. |
   | 4. Search | After you select how to filter the data sources (by schema, table, or SQL query), enter a text string to search. |
   | 5. Data source list | Click **Select** for data sources you want to add. Selected tables (datasets) display on the right. Click the **x** to remove a single dataset or **Clear all** to remove all entries. |
   | 6. Policies | Select a policy. **Create snapshot**: DataRobot takes a snapshot of the data. **Create dynamic**: DataRobot refreshes the data for future modeling and prediction activities. |

6. Once the content is selected, click **Proceed with registration**. DataRobot registers the new tables (datasets), and you can then create projects from them or perform other operations, such as sharing and querying with SQL.

## From a SQL Query

You can use a SQL query to select specific elements of the named database and use them as your data source. DataRobot provides a web-based code editor with SQL syntax highlighting to help in query construction. Note that DataRobot’s SQL query option only supports SELECT-based queries. Also, SQL validation is only run on initial project creation. If you edit the query from the summary pane, DataRobot does not re-run the validation.
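
As an illustration of the SELECT-only restriction, a hypothetical client-side pre-check might look like the sketch below. This is not DataRobot's actual validator; the comment-stripping and the acceptance of `WITH` (common table expressions that resolve to a SELECT) are simplifying assumptions.

```python
import re

def is_select_query(sql: str) -> bool:
    """Hypothetical pre-check mirroring the documented SELECT-only restriction."""
    # Strip leading whitespace and leading SQL comments, then inspect the
    # first keyword of what remains.
    stripped = re.sub(r"^\s*(--[^\n]*\n|/\*.*?\*/\s*)*", "", sql, flags=re.S)
    if not stripped.strip():
        return False
    first_word = stripped.lstrip().split(None, 1)[0].upper()
    # Assumption: WITH ... SELECT (a CTE) also counts as SELECT-based.
    return first_word in ("SELECT", "WITH")

assert is_select_query("SELECT id, amount FROM sales")
assert is_select_query("-- comment\nWITH t AS (SELECT 1) SELECT * FROM t")
assert not is_select_query("DROP TABLE sales")
```

A check like this only guards the statement type; well-formedness is still the job of the **Validate SQL Query** option described below.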

To use the query editor:

1. Once you have added data from an external connection, click the **SQL query** tab. By default, the **Settings** tab is selected.
2. Enter your query in the SQL query box.
3. To validate that your entry is well-formed, make sure that the **Validate SQL Query** box below the entry box is checked.

   > [!NOTE] Note
   > In some scenarios, it can be useful to disable syntax validation because validation can take a long time to complete for complex queries. If you disable validation, no results display; you can skip running the query and proceed to registration.

4. Select whether to create a **snapshot**.
5. Click **Run** to create a results preview.
6. Select the **Results** tab after computing completes.
7. Use the window-shade scroll to display more rows in the preview; if necessary, use the horizontal scroll bar to scroll through all columns of a row.

When you are satisfied with your results, click **Proceed with registration**. DataRobot validates the query and begins data ingestion. When complete, the dataset is published to the catalog. From here you [can interact with the dataset](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog-asset.html) as with any other asset type.

For more examples of working with the SQL editor, see [Prepare data in AI Catalog with Spark SQL](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/spark.html).

## Configure fast registration

Fast registration allows you to quickly register large datasets in the AI Catalog by specifying the first N rows to be used for registration instead of the full dataset. This gives you faster access to data to use for testing and Feature Discovery.
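
The sampling idea behind fast registration—register on only the first N rows rather than the full dataset—can be sketched locally. This is a minimal illustration, not DataRobot's implementation:

```python
import csv
import io
import itertools

def head_rows(csv_text: str, n: int):
    """Read only the header plus the first n data rows of a CSV dataset,
    mimicking the 'first N rows' sample that fast registration uses."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # islice stops after n rows, so the rest of the file is never parsed.
    return header, list(itertools.islice(reader, n))

data = "a,b\n" + "\n".join(f"{i},{i * i}" for i in range(10_000))
header, sample = head_rows(data, 1000)
assert header == ["a", "b"]
assert len(sample) == 1000  # EDA1 would run on these rows only
assert sample[0] == ["0", "0"]
```

With a snapshot policy, consumers see only this sample; with a dynamic policy, the sample is used for EDA1 but consumers see the full data, as described in the steps below.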

To configure fast registration:

1. In the **AI Catalog**, click **Add to catalog** and select your data source. Fast registration is only available when adding a dataset from a new data connection, an existing data connection, or a URL.
2. In the resulting window, enter the data source information (in this example, a URL).
3. Select the appropriate policy for your use case—either **Create snapshot** or **Create dynamic**. For both policies, the AI Catalog calculates EDA1 using only the specified number of rows, taken from the start of the dataset (for example, the first 1,000 rows). The policies differ at consumption time: a consumer of a snapshot dataset (for example, a project created from it) sees only the specified number of rows, while a consumer of a dynamic dataset sees the full set of rows rather than the partial set.
4. Select the fast registration data upload option. For snapshot, select **Upload data partially**; for dynamic, select **Use partial data for EDA**.
5. Specify the number of rows to use for data ingest during registration and click **Save**.

## Upload calendars

[Calendars for time series](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/ts-reference/ts-adv-opt.html#calendar-files) projects can be uploaded either:

- Directly to the catalog with the **Add to catalog** button, using any of the upload methods. Calendars uploaded as a local file are automatically added to the AI Catalog, where they can then be shared and downloaded.
- From within the project, using the **Advanced options > Time Series** tab.

To upload calendar files from Advanced options on the Time Series tab:

1. When adding from **Advanced options**, use the **choose file** dropdown and choose **AI Catalog**.
2. A modal appears listing the available calendars, determined based on the content of the dataset. Use the dropdown to sort the listing by type. DataRobot determines whether a calendar is single series or multiseries based on the number of columns: two columns, only one of which is a date, indicates single series; three columns with just one date column indicates multiseries.
3. Click any calendar file to see the associated details and select the calendar for use with the project. The calendar file becomes part of the standard **AI Catalog** inventory and can be reused like any dataset. Calendars generated from **Advanced options** are saved to the catalog, where you can then download them, apply further customization, and re-upload them.
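
The column-count rule described above can be sketched as follows. The date-detection helper is a simplified assumption for illustration, not DataRobot's actual date inference:

```python
from datetime import datetime

def looks_like_date(value: str) -> bool:
    """Assumed helper: treat ISO- or US-style strings as dates (illustration only)."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False

def calendar_type(first_row: list[str]) -> str:
    """Sketch of the documented rule: two columns with exactly one date
    column -> single series; three columns with one date -> multiseries."""
    date_cols = sum(looks_like_date(v) for v in first_row)
    if len(first_row) == 2 and date_cols == 1:
        return "single series"
    if len(first_row) == 3 and date_cols == 1:
        return "multiseries"
    return "unknown"

assert calendar_type(["2024-12-25", "Christmas"]) == "single series"
assert calendar_type(["2024-12-25", "Christmas", "store_1"]) == "multiseries"
```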

## Personal data detection

In some regulated and specific use cases, the use of personal data as a feature in a model is forbidden. DataRobot automates the detection of specific types of personal data to provide a layer of protection against the inadvertent inclusion of this information in a dataset and prevent its usage at modeling and prediction time.

After a dataset is ingested through the AI Catalog, you have the option to check each feature for the presence of personal data. This process checks every cell in the dataset against patterns that DataRobot has developed for identifying this type of information. If personal data is found, a warning message is displayed on the AI Catalog's **Info** and **Profile** pages, informing you of the type of personal data detected for each feature and providing sample values to help you make an informed decision on how to move forward. Additionally, DataRobot creates a new [feature list](https://docs.datarobot.com/en/docs/classic-ui/modeling/build-models/build-basic/feature-lists.html#automatically-created-feature-lists)—the equivalent of Informative Features, but with all features containing any personal data removed. The new list is named **Informative Features - Personal Data Removed**.

> [!WARNING] Warning
> There is no guarantee that this tool has identified all instances of personal data. It is intended to supplement your own personal data detection controls.

DataRobot currently supports detection of the following fields:

- Email address
- IPv4 address
- US telephone number
- Social security number
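
For illustration, simplified regular-expression approximations of these four field types might look like the sketch below. These are not DataRobot's actual detection patterns, which are more robust; as the warning above notes, pattern matching of this kind cannot catch every instance.

```python
import re

# Illustrative approximations only -- not DataRobot's detection patterns.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "us_phone": re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect_personal_data(value: str) -> list[str]:
    """Return the kinds of personal data a cell value appears to contain."""
    return [kind for kind, pat in PATTERNS.items() if pat.search(value)]

assert detect_personal_data("reach me at jane.doe@example.com") == ["email"]
assert detect_personal_data("server 192.168.0.1 is down") == ["ipv4"]
assert "ssn" in detect_personal_data("SSN: 123-45-6789")
```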

To run personal data detection on a dataset in the AI Catalog, go to the **Info** page and click **Run Personal Data Detection** on the banner that indicates successful dataset publishing.

If DataRobot detects personal data in the dataset, a warning message displays. Click **Details** to view more information about the personal data detected; click **Dismiss** to remove the warning and prevent it from being shown again.

Warnings are also highlighted by column on the **Profile** tab.
