
DataRobot data connections

Note

If your database is protected by a network policy that only allows connections from specific IP addresses, contact Support for a list of addresses that an administrator must add to your network policy (whitelist).

To enable integration with a variety of enterprise databases, DataRobot provides a “self-service” JDBC platform for database connectivity setup. Once configured, you can read data from production databases for model building and predictions. This allows you to quickly train and retrain models on that data, and avoids the unnecessary step of exporting data from your database to a CSV file for ingest into DataRobot. It allows access to more diverse data, which results in more accurate models.

The DataRobot JDBC database connectivity solution is a standardized, platform-independent solution that does not require complicated installation and configuration. Users with the required technical skills and permissions can establish database connections; other users can then leverage those connections to solve business problems.

Note

By default, only users with "Can manage JDBC database drivers" permission can add, update, or remove JDBC drivers. Only users with "Can manage connectors" permission can add connections. See Authentication, roles, and permissions for details on permissions.


Supported databases

DataRobot with JDBC 4.1 has tested support for the following databases.

| Database | Version | Driver |
| --- | --- | --- |
| Amazon Redshift | 1.2.34 | Amazon Redshift: Configuring a JDBC driver connection |
| AWS Athena | 2.0.5 | Using Athena with the JDBC Driver |
| Google BigQuery | spark-2.4.5 | ODBC and JDBC drivers for BigQuery |
| InterSystems | 3.0.0 | InterSystems JDBC Connections |
| KDB+ * | 2019.11.11 | Kx Systems jdbc.jar (log in to GitHub before clicking this link) |
| Microsoft SQL Server | 6.4, 9.2.1 | Microsoft JDBC Driver for SQL Server |
| MySQL | 8.0.23 | MySQL Connectors |
| Oracle 6 | oracle-xe_11.2.0-1.0 | Central Repository Oracle JDBC |
| Oracle 8 | 12.2.0.1 | Central Repository Oracle JDBC |
| PostgreSQL | 42.0.0 JDBC 4.2 | PostgreSQL JDBC Driver |
| Presto | 0.193 | Presto downloads |
| SAP HANA | 2.4.70 | Connect Using the SAP HANA JDBC Driver |
| Snowflake | 3.12.10 | Snowflake JDBC driver repository |
| TD-Hive | 0.5.10 | JDBC Driver for Hive |
| Treasure Data | Select the query engine you contracted for. | Treasure Data Integrations |

* Only supported with JDBC, not with KDB native query language q

See Manage JDBC drivers for steps to upload JDBC drivers.

Deprecated databases

Support is deprecated for these drivers:

| Database | Version |
| --- | --- |
| Apache Hive for JDBC | All |
| Apache Hive | All |
| Elasticsearch | All |
| Microsoft SQL Server | 6.0 |

Note

Older driver versions may still exist, but DataRobot recommends that you use the latest supported versions of the drivers.

Database connectivity workflow

By default, users can create, modify, and share data connections, depending on their role (see below for definitions of the terminology used in this section). You can also create data sources.

DataRobot's database connectivity workflow, described below, has two fundamental components. First, the administrator uploads JDBC drivers and configures database connections for those drivers. Then, users can import data into DataRobot for project creation and predictions, as follows:

  1. From the Data Connections page, create data connection configuration(s).

  2. From the Start screen or the AI Catalog, create data sources—from the data connections—to use for modeling and predictions.

    Once configured, your data sources are available for both ingest from the Start screen and for predictions from the Make Predictions tab.

  3. Optionally, and depending on role, share data connections with others.

There are additional opportunities to launch the data source creation dialogs, but these instructions describe the process used in all cases.

Create a new connection

To create a new data connection:

  1. From the account menu on the top right, select Data Connections.

  2. Click Add new data connection to open the data store selection dialog box. You can also create a new data connection using the AI Catalog by selecting Add to catalog > New Data Connection.

  3. Select the tile for the data store you wish to use.

    On-premise installations

    For on-premise installations, you might not see any data stores listed. In that case, click Add a new driver and add a driver from the list of supported databases.

  4. Complete the fields for the data store. They vary slightly based on the data store selected.

    | Field | Description |
    | --- | --- |
    | Data connection name | Provide a unique name for the connection. |
    | Version | Select the version of the data store to use from the dropdown list. |
    | Configuration: Parameters | Modify parameters for the connection. |
    | Configuration: URL | Enter the URL of the database/data store to connect to, in the form jdbc:mysql://<host>:<port>/<database>. You can include parameters in the URL if your connection requires them. |
  5. Click Add data connection to save the configuration.

    The new connection appears in the left-panel list of Data Connections.

Note

Any connection that you create is only available to you unless you share it with others.
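The Configuration: URL field expects a standard JDBC URL. As a minimal sketch (the helper function and the example host, port, and database names are illustrative, not part of DataRobot), the general shape can be assembled like this:

```python
def build_jdbc_url(scheme: str, host: str, port: int, database: str) -> str:
    """Assemble a JDBC URL of the form jdbc:<scheme>://<host>:<port>/<database>."""
    return f"jdbc:{scheme}://{host}:{port}/{database}"

# Example: a MySQL connection string for the Configuration: URL field.
url = build_jdbc_url("mysql", "db.example.com", 3306, "sales")
print(url)  # jdbc:mysql://db.example.com:3306/sales
```

The scheme, default port, and path segment conventions differ by database vendor; consult your driver's documentation for the exact URL syntax it accepts.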

Data connection with parameters

The parameters available for modification in the data connection configuration screen depend on the selected driver and on how the administrator configured that driver when adding it.

Many additional fields are available in a searchable, expandable list. If a desired field is not listed, click Add parameter to include it.

Click the trash can icon to remove a listed parameter from the connection configuration.

Note

Additional parameters may be required to establish a connection to your database. These parameters are not always pre-defined in DataRobot; in that case, they must be added manually.

For more information on the required parameters, see the documentation for your database.
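When a driver accepts extra parameters in the URL itself, they are typically appended as a query string. A hedged sketch (the helper and example parameter names are ours; MySQL's `useSSL` and `serverTimezone` are shown purely for illustration):

```python
from urllib.parse import urlencode

def add_jdbc_params(url: str, params: dict) -> str:
    """Append driver parameters to a JDBC URL as a ?key=value&... query string.

    Note: the separator and parameter names vary by driver (for example,
    Microsoft SQL Server uses ;key=value pairs instead) -- check your
    database's driver documentation for the exact syntax.
    """
    return f"{url}?{urlencode(params)}" if params else url

url = add_jdbc_params(
    "jdbc:mysql://db.example.com:3306/sales",
    {"useSSL": "true", "serverTimezone": "UTC"},
)
print(url)  # jdbc:mysql://db.example.com:3306/sales?useSSL=true&serverTimezone=UTC
```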

Data connection with OAuth

Snowflake and Google BigQuery users can set up a data connection using OAuth single sign-on. Once configured, you can read data from production databases to use for model building and predictions.

When connecting with OAuth parameters, note the following:

  • You must create a new data connection.
  • You cannot associate multiple Google BigQuery credentials with a single data connection; to use a different set of credentials, you must remove the existing token (does not apply to Snowflake users).
  • DataRobot does not filter Google BigQuery schemas or tables by the ProjectId specified in the data connection, so all schemas and tables associated with your BigQuery account appear in DataRobot.

To set up a data connection using OAuth, follow the instructions for creating a data connection and testing the connection.

Then, follow the steps below for the appropriate database.

For Google BigQuery:

  1. After clicking Test Connection, a window appears. Click Sign in using Google.

  2. Select the account you would like to use.

  3. To provide consent to the database client, click Allow.

Note

You can connect to only one BigQuery account.

For Snowflake:

  1. After clicking Test Connection, a window appears. Enter your Snowflake client ID and client secret, which you can retrieve from your database account, and click Save and sign in.

  2. Enter your Snowflake username and password. Click Sign in.

  3. To provide consent to the database client, click Allow.

If the connection is successful, a confirmation message appears in DataRobot.

Test the connection

Once your data connection is created, test the connection by clicking the Test connection button in the upper right.

In the resulting dialog box, enter credentials (or use stored credentials) for the database identified in the JDBC URL field or in the parameter-based configuration of the data connection creation screen. Click Sign in; when the test passes, click Close to return to the Data Connections page and create your data sources.
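When troubleshooting a failed test, it can help to confirm outside DataRobot that the database answers a trivial query. The lightweight "is the connection alive?" query differs slightly by engine; the mapping below is a common convention, not a DataRobot-specific list:

```python
# Minimal connectivity-check queries by engine -- a common convention,
# not a DataRobot-defined list.
VALIDATION_QUERY = {
    "mysql": "SELECT 1",
    "postgresql": "SELECT 1",
    "sqlserver": "SELECT 1",
    "snowflake": "SELECT 1",
    "oracle": "SELECT 1 FROM DUAL",  # Oracle requires a FROM clause
}

def validation_query(engine: str) -> str:
    """Return a trivial validation query for the engine; SELECT 1 by default."""
    return VALIDATION_QUERY.get(engine.lower(), "SELECT 1")
```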

Modify a connection

You can modify the name, JDBC URL, and, if the driver was configured with them, the parameters of an existing data connection.

  1. Select the data connection in the left-panel connections list.

  2. In the updated main window, click in the box of the element you want to edit and enter new text.

  3. Click Save changes.

Delete a connection

You can delete any data connection that is not being used by an existing data source. If it is being used, you must first delete the dependencies. To delete a data connection:

  1. From the Data Connections tab, select the data connection in the left-panel connections list.

  2. Click the Delete button in the upper right.

  3. DataRobot prompts for confirmation. Click Delete to remove the data connection. If there are data sources dependent on the data connection, DataRobot returns a notification.

  4. Once all dependent data sources are removed via the API, try again to delete the data connection.
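The dependency rule in the steps above can be sketched as a simple check: a connection is deletable only when no data source still references it. This is an illustrative model (the field names are hypothetical, not DataRobot's API):

```python
def can_delete_connection(connection_id: str, data_sources: list) -> bool:
    """A data connection is deletable only if no data source references it.

    `data_sources` is a list of dicts with a hypothetical
    "data_connection_id" field -- an illustrative model, not the real API.
    """
    return all(ds["data_connection_id"] != connection_id for ds in data_sources)

sources = [{"id": "ds1", "data_connection_id": "conn-A"}]
print(can_delete_connection("conn-A", sources))  # False: delete the data source first
print(can_delete_connection("conn-B", sources))  # True
```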

Add data sources

Your data sources specify, via SQL query or selected table and schema data, which data to extract from the data connection. It is the extracted data that you will use for modeling and predictions. You can point to entire database tables or use a SQL query to select specific data from the database. Any data source that you create is available only to you.
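A data source, as described above, is defined either by a SQL query or by a selected table and schema, but not both. A minimal sketch of that either/or rule (the function and field names are illustrative, not DataRobot's API):

```python
def make_data_source(name, *, query=None, table=None, schema=None):
    """Build an illustrative data-source definition: exactly one of a SQL
    query or a schema-qualified table. Field names here are hypothetical."""
    if (query is None) == (table is None):
        raise ValueError("specify exactly one of query= or table=")
    spec = {"name": name}
    if query is not None:
        spec["query"] = query          # extract via SQL query
    else:
        spec["table"] = table          # extract an entire table
        spec["schema"] = schema
    return spec

# Query-based and table-based definitions:
q = make_data_source("top_sales", query="SELECT * FROM sales LIMIT 100")
t = make_data_source("all_sales", table="sales", schema="public")
```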

Note

Once data sources are created, they cannot be modified and can only be deleted via the API.

To add a data source, do one of the following:

Share data connections

Because the user creating a data connection and the end user may not be the same person, or there may be multiple end users for the data connection, DataRobot provides the ability to set user-level permissions for each entity. You can accomplish scenarios like the following:

  • A user wants to set permissions on a selected data entity to control who has consumer-level, editor-level, or owner-level access. Or, the user wants to remove a particular user's access.
  • A user that has had a data connection shared with them wants the shared entity to appear under their list of available entities.

When you invite a user, user group, or organization to share a data connection, DataRobot assigns the default role of Editor to each selected target (not all entities allow sharing beyond a specific user). You can change the role from the dropdown menu.

To share data connections:

  1. From the account menu on the top right, select Data Connections, select a data connection, and click Share.

  2. Enter the email address, group name, or organization you are adding and select a role. Check the box to grant sharing permission.

  3. Click Share to add the user, user group, or organization.

  4. Add any number of collaborators and when finished, click Close to dismiss the sharing dialog box.

Depending on your own permissions, you can remove any user or change access as described in the table of roles and permissions.

Note

There must be at least one Owner for each entity; you cannot remove yourself or remove your sharing ability if you are the only collaborating Owner.
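The rule in the note above — every entity keeps at least one Owner — can be sketched as a guard on collaborator removal. This is an illustrative model of the invariant, not DataRobot code:

```python
def remove_collaborator(shares: dict, user: str) -> dict:
    """Remove a collaborator, refusing to drop the last Owner.

    `shares` maps user -> role ("Consumer" | "Editor" | "Owner"); the
    dict shape is hypothetical, used only to illustrate the rule.
    """
    owners = [u for u, role in shares.items() if role == "Owner"]
    if shares.get(user) == "Owner" and owners == [user]:
        raise ValueError("cannot remove the only Owner of this entity")
    return {u: r for u, r in shares.items() if u != user}

shares = {"alice@example.com": "Owner", "bob@example.com": "Editor"}
print(remove_collaborator(shares, "bob@example.com"))  # {'alice@example.com': 'Owner'}
```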


Updated November 8, 2021