Add data to the Registry

When you ingest data through the Data Registry, DataRobot completes EDA1 (for materialized, or static, assets) as part of the registration process and saves the results for later reuse.

To add data to the Data Registry, click the Add data dropdown and select one of the following methods:

| Method | Description |
|---|---|
| Data connection | Add data from an existing data connection or configure and add data from a new one. |
| Local file | Browse and upload a file from your local file system. |
| URL | Adds a snapshot of the full dataset specified in the URL. |

You can also upload calendar files for time series experiments using any of the above methods.

Dataset formatting

If a dataset includes non-numeric data containing special characters (such as newlines, carriage returns, double quotes, commas, or other field separators), wrap those values in double quotes ("). Otherwise, the import can introduce unexpected line breaks or incorrectly separated fields. Proper quoting of non-numeric data is particularly important when the preview feature "Enable Minimal CSV Quoting" is enabled.
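As an illustration of the quoting guidance, the following is a minimal sketch that uses Python's standard `csv` module to wrap every non-numeric value in double quotes before upload. The sample rows and the `write_quoted_csv` helper are hypothetical and not part of DataRobot; the sketch only shows one way to produce a file that follows the rule.

```python
import csv
import io

# Hypothetical sample data: text fields contain commas, double quotes, and a newline.
rows = [
    ["id", "comment", "amount"],
    [1, 'She said "hello", then left', 12.5],
    [2, "line one\nline two", 7],
]

def write_quoted_csv(rows):
    """Serialize rows to CSV text, quoting every non-numeric field."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(rows)
    return buffer.getvalue()

text = write_quoted_csv(rows)

# Reading the text back shows that the special characters stay inside their fields.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

With `csv.QUOTE_NONNUMERIC`, embedded double quotes are escaped by doubling them, and numeric values are written without quotes.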

Data connection

When you add data from a data connection, you have two options: add data from an existing connection, or configure a new connection and add data from it.

Connection capabilities across NextGen

From the Data Registry page, you can add data from all supported data connections in DataRobot. Within Workbench, however, you can only add and work with specific data connections.

To add data from an existing connection:

  1. In the Data Registry, click Add data > Data connection.
  2. In the left panel, select a data connection that holds the data you want to add and click Add from connection. If the connection is authorized with OAuth, you may be prompted to sign in.

  3. Search for data by Schemas, Tables, or SQL query.

  4. Select the datasets you want to add, which appear in the right panel, and click Proceed to confirmation.

  5. Under Settings, select one of the following policies:

    DataRobot takes a snapshot of the data.

    If selected, choose one of the following data upload options:

    | Setting | Description |
    |---|---|
    | Upload full data | Instructs DataRobot to take a snapshot of the full dataset. |
    | Upload data partially | Instructs DataRobot to take a snapshot using the first N rows for registration. You must specify the number of rows to ingest. |

    DataRobot refreshes the data for future modeling and prediction activities.

    If selected, choose one of the following data upload options:

    | Setting | Description |
    |---|---|
    | Use full data | Instructs DataRobot to display the full dataset from the data source. |
    | Use partial data for EDA | Instructs DataRobot to use the first N rows to run EDA1. You must specify the number of rows to ingest. |

  6. Click Register.
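To get a feel for what a partial ingest of the first N rows would contain, you could preview those rows locally before registering. The following is a rough sketch using only the Python standard library; the `first_n_rows` helper and the sample text are hypothetical, and the sketch does not describe how DataRobot itself performs partial ingestion.

```python
import csv
import io
from itertools import islice

def first_n_rows(csv_text, n):
    """Return the header and the first n data rows of CSV text."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)
    # islice stops after n rows without reading the rest of the input.
    return header, list(islice(reader, n))

# Hypothetical sample standing in for a larger table.
sample = "a,b\n1,2\n3,4\n5,6\n"
header, rows = first_n_rows(sample, 2)
```

Using `csv.reader` rather than splitting on newlines keeps quoted multi-line fields together, so a row is counted once even if it spans several physical lines.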

To add data from a new connection:

  1. In the Data Registry, click Add data > Data connection.
  2. In the left panel, click + Add new connection.

  3. Select the data connection that holds the data you want to add.

  4. To configure a new data connection, see the documentation on data connections in NextGen.

Local file

This method is a good approach if your dataset is already prepared for modeling.

Before you proceed, review DataRobot's dataset requirements for accepted file formats and size guidelines. See the associated considerations for important additional information.

After selecting Local file from the Add data dropdown, locate and select your dataset in the file explorer. Then, click Open.

Supported file types

NextGen supports the following file types for upload: .csv, .tsv, .dsv, .xls, .xlsx, .sas7bdat, .geojson, .gz, .bz2, .tar, .tgz, .zip.
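If you want to check file names before uploading, a small local check against the list above might look like the following sketch. The `has_supported_extension` helper is hypothetical; it only inspects the final extension of the file name and does not validate file contents, size limits, or what an archive contains.

```python
from pathlib import Path

# Extensions copied from the supported file types listed above.
SUPPORTED_EXTENSIONS = {
    ".csv", ".tsv", ".dsv", ".xls", ".xlsx", ".sas7bdat",
    ".geojson", ".gz", ".bz2", ".tar", ".tgz", ".zip",
}

def has_supported_extension(filename):
    """Return True if the final extension of filename is in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

checks = {
    name: has_supported_extension(name)
    for name in ["sales.csv", "REPORT.XLSX", "data.csv.gz", "notes.txt"]
}
```

For a name such as `data.csv.gz`, only the last extension (`.gz`) is compared against the list.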

URL

You can import data from a local, HTTP, HTTPS, Google Cloud Storage, Azure Blob Storage, or S3 URL (S3 URLs must use HTTP). To use a local file, specify the URL as follows: file:///local/file/location.

After selecting URL from the Add data dropdown, enter the URL in the field and click Save.

Note

When importing data using a URL, DataRobot registers a snapshot of the full dataset.
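The local-file URL form shown above can be built from an absolute path. The following is a minimal sketch using the Python standard library; the `local_path_to_url` helper is hypothetical, and it assumes a POSIX-style absolute path on the machine where the file resides.

```python
import posixpath
from urllib.parse import quote, urlparse

def local_path_to_url(path):
    """Build a file:// URL from an absolute POSIX path."""
    if not posixpath.isabs(path):
        raise ValueError("expected an absolute POSIX path")
    # quote() percent-encodes spaces and other reserved characters but keeps "/".
    return "file://" + quote(path)

url = local_path_to_url("/local/file/location")

# Parsing the URL back recovers the scheme and the original path.
parts = urlparse(url)
```

Percent-encoding matters for paths containing spaces or other reserved characters, which would otherwise produce an invalid URL.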

Calendars for time series

Calendars for time series experiments can be uploaded directly to the Data Registry using any of the upload methods. Calendars uploaded as a local file are automatically added to the Data Registry, where they can then be shared and downloaded.

Next steps

From here you can:


Updated March 5, 2025