Add data to Registry

When you ingest data through the Data Registry, DataRobot completes EDA1 (for materialized, or static, assets) as part of the registration process and saves the results for later reuse.

In the Data Registry, click the Add data dropdown and select one of the following methods:

Method | Description
Data connection | Add data from an existing data connection or configure and add data from a new one.
Local file | Browse and upload a file from your local file system.
URL | Add a snapshot of the full dataset specified in the URL.

You can also upload calendar files for time series experiments using any of the above methods.

Dataset formatting

If a dataset includes non-numeric data containing special characters (such as newlines, carriage returns, double quotes, commas, or other field separators), wrap those values in double quotes ("). Otherwise, the import can introduce unexpected line breaks or incorrectly separated fields. Properly quoting non-numeric data is particularly important when the preview feature "Enable Minimal CSV Quoting" is enabled.
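One way to produce a file that follows this quoting rule is to let a CSV library apply the quotes before upload. The sketch below is a minimal illustration using Python's standard csv module with csv.QUOTE_NONNUMERIC; the sample rows and the helper name to_quoted_csv are hypothetical and are not part of DataRobot.

```python
import csv
import io

# Hypothetical sample rows: the text fields contain a comma, double quotes, and a newline.
rows = [
    ["id", "comment", "amount"],
    [1, "Late delivery, box damaged", 19.99],
    [2, 'Customer said "fine"', 5.0],
    [3, "Line one\nLine two", 12.5],
]


def to_quoted_csv(rows):
    """Serialize rows to CSV text, wrapping every non-numeric field in double quotes."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_quoted_csv(rows)

# Parse the text back to confirm that embedded commas, quotes, and newlines
# stayed inside their fields instead of splitting rows or columns.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
```

Here the writer doubles any embedded double quote and leaves numeric values unquoted, so the round trip returns four rows of three fields each.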

Data connection

In Registry, you can connect to and add data from a data connection by clicking Add data > Data connection.

Creating a data connection lets you explore external source data—from both connectors and JDBC drivers—and then add it to Registry.

When you create or reconfigure a connection in one area of DataRobot, those updates are also applied across Workbench, Registry, and the Data Connections page (i.e., any area where you work with data connectivity).

Local file

Uploading a local file is a good approach if your dataset is ready for modeling.

Before you proceed, review DataRobot's dataset requirements for accepted file formats and size guidelines. See the associated considerations for important additional information.

After selecting Local file from the Add data dropdown, locate and select your dataset in the file explorer. Then, click Open.

Supported file types

NextGen supports the following file types for upload: .csv, .tsv, .dsv, .xls, .xlsx, .sas7bdat, .geojson, .gz, .bz2, .tar, .tgz, .zip.
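If you prepare uploads with a script, a quick pre-check against this list can catch an unsupported file before you open the file explorer. The following is a minimal sketch; the function name has_supported_extension is hypothetical, the case-insensitive comparison is an assumption, and the check looks only at the final extension, not at file contents or size limits.

```python
from pathlib import Path

# Extensions copied from the supported file types listed above.
SUPPORTED_EXTENSIONS = {
    ".csv", ".tsv", ".dsv", ".xls", ".xlsx", ".sas7bdat",
    ".geojson", ".gz", ".bz2", ".tar", ".tgz", ".zip",
}


def has_supported_extension(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list.

    Matching is case-insensitive, which is an assumption of this sketch.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, has_supported_extension("sales.csv") returns True, while has_supported_extension("notes.pdf") returns False.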

URL

You can import data from a local, HTTP, HTTPS, Google Cloud Storage, Azure Blob Storage, or S3 URL (S3 URLs must use HTTP). To use a local file, specify the URL as follows: file:///local/file/location.
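The file:/// form can be generated from an absolute path rather than typed by hand. The sketch below uses only Python's standard library; the helper names local_file_url and url_scheme are hypothetical, and percent-encoding special characters such as spaces is an assumption about what the URL field accepts.

```python
from urllib.parse import quote, urlsplit


def local_file_url(absolute_path: str) -> str:
    """Build a file:// URL from an absolute POSIX-style path.

    Characters such as spaces are percent-encoded (an assumption of this sketch).
    """
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute POSIX-style path")
    return "file://" + quote(absolute_path)


def url_scheme(url: str) -> str:
    """Return the lowercase scheme of a URL, for example 'file' or 'https'."""
    return urlsplit(url).scheme.lower()
```

For example, local_file_url("/local/file/location") returns "file:///local/file/location", matching the pattern shown above.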

After selecting URL from the Add data dropdown, enter the URL in the field and click Save.

Note

When importing data using a URL, DataRobot registers a snapshot of the full dataset.

Calendars for time series

Calendars for time series experiments can be uploaded directly to the Data Registry using any of the upload methods. Calendars uploaded as a local file are automatically added to the Data Registry, where they can then be shared and downloaded.

Next steps

From here you can: