DataRobot data connections¶
If your database is protected by a network policy that only allows connections from specific IP addresses, contact Support for a list of addresses that an administrator must add to your network policy (whitelist).
To enable integration with a variety of enterprise databases, DataRobot provides a “self-service” JDBC platform for database connectivity setup. Once configured, you can read data from production databases for model building and predictions. This allows you to quickly train and retrain models on that data, and avoids the unnecessary step of exporting data from your database to a CSV file for ingest into DataRobot. It allows access to more diverse data, which results in more accurate models.
The DataRobot JDBC database connectivity solution is a standardized, platform-independent solution that does not require complicated installation and configuration. Those users with the technical abilities and permissions can establish database connections; other users can leverage those connections to solve business problems.
By default, only users with "Can manage JDBC database drivers" permission can add, update, or remove JDBC drivers. Only users with "Can manage connectors" permission can add connections. See Authentication, roles, and permissions for details on permissions.
This section includes the following:
- A list of the supported databases with version and driver details
- A list of deprecated databases
- An overview of the database connectivity workflow
- Steps for creating new connections
- Steps for adding data sources
- Steps for sharing data connections
DataRobot with JDBC 4.1 has tested support for the following databases.
|Amazon Redshift||1.2.34||Amazon Redshift: Configuring a JDBC driver connection|
|AWS Athena||2.0.5||Using Athena with the JDBC Driver|
|Google BigQuery||spark-2.4.5||ODBC and JDBC drivers for BigQuery|
|InterSystems||3.0.0||InterSystems JDBC Connections|
|KDB+ *||2019.11.11||Kx Systems jdbc.jar (Log in to GitHub before clicking this link.)|
|Microsoft SQL Server 6.4||9.2.1||Microsoft JDBC Driver for SQL Server|
|Oracle 6||oracle-xe_11.2.0-1.0||Central Repository Oracle JDBC|
|Oracle 8||220.127.116.11||Central Repository Oracle JDBC|
|PostgresSQL||42.0.0 JDBC 4.2||PostgreSQL JBDC Driver|
|SAP HANA||2.4.70||Connect Using the SAP HANA JDBC Driver|
|Snowflake||3.12.10||Snowflake JDBC driver repository|
|TD-Hive||0.5.10||JDBC Driver for Hive|
|Treasure Data||Select the query engine you contracted for.||Treasure Data Integrations|
* Only supported with JDBC, not with KDB native query language
See Manage JDBC drivers for steps to upload JDBC drivers.
Support is deprecated for these drivers:
|Apache Hive for JDBC||All|
|Microsoft SQL Server||6.0|
Older driver versions may still exist but DataRobot recommends that you use the latest supported versions of the drivers.
Database connectivity workflow¶
DataRobot's database connectivity workflow, described below, has two fundamental components. First, the administrator uploads JDBC drivers and configures database connections for those drivers. Then, users can import data into DataRobot for project creation and predictions, as follows:
From the Data Connections page, create data connection configuration(s).
Once configured, your data sources are available for both ingest from the Start screen and for predictions from the Make Predictions tab.
There are additional opportunities to launch the data source creation dialogs, but these instructions describe the process used in all cases.
Create a new connection¶
To create a new data connection:
From the account menu on the top right, select Data Connections.
Click Add new data connection to open the data store selection dialog box. You can also create a new data connection using the AI Catalog by selecting Add to catalog > New Data Connection.
Select the tile for the data store you wish to use.
On premise installations
For on premise installations, you might not see any data stores listed. In that case, click Add a new driver and add a driver from the list of supported databases.
Complete the fields for the data store. They will vary slightly based on the data store selected.
Field Description Data connection name Provide a unique name for the connection. Version Select the version of the data store to use from the dropdown list. Configuration: Parameters Modify parameters for connections. Configuration: URL Enter the URL to the database/data store to connect to, in the form jdbc:mysql:// : / . You can include parameters in the URL if your connection requires them.
Click Add data connection to save the configuration.
The new connection appears in the left-panel list of Data Connections.
Any connection that you create is only available to you unless you share it with others.
Data connection with parameters¶
The parameters provided for modification in the data connection configuration screen are dependent on the selected driver. Available parameters are dependent on the configuration done by the administrator who added the driver.
Many other fields can be found in a searchable expanded field. If a desired field is not listed, you can click Add parameter to include it.
Click the trash can icon () to remove a listed parameter from the connection configuration.
Additional parameters may be required to establish a connection to your database. These parameters are not always pre-defined in DataRobot, in which case, they must be manually added.
For more information on the required parameters, see the documentation for your database.
Data connection with OAuth¶
Snowflake and Google BigQuery users can set up a data connection using OAuth single sign-on. Once configured, you can read data from production databases to use for model building and predictions.
When connecting with OAuth parameters, note the following:
- You must create a new data connection.
- You cannot associate multiple Google BigQuery credentials with a single data connection; to use a different set of credentials, you must remove the existing token (does not apply to Snowflake users).
- DataRobot does not filter Google BigQuery schemas or tables by the ProjectId specified in the data connection, so all schemas and tables associated with your BigQuery account appear in DataRobot.
Then, use the tabs below to finish setting up OAuth for the appropriate database:
After clicking Test Connection, a window appears. Click Sign in using Google.
Select the account you would like to use.
To provide consent to the database client, click Allow.
You must connect to only one BigQuery account.
After clicking Test Connection, a window appears and you must enter your Snowflake client ID and client secret, which can be retrieved from your database account. Click Save and sign in.
Enter your Snowflake username and password. Click Sign in.
To provide consent to the database client, click Allow.
If the connection is successful, the following message appears in DataRobot:
Test the connection¶
Once your data connection is created, test the connection by clicking the Test connection button in the upper right.
In the resulting dialog box, enter or use stored credentials for the database identified in the JDBC URL field or the parameter-based configuration of the data connection creation screen. Click Sign in and when the test passes successfully, click Close to return to the Data Connections page and create your data sources.
Modify a connection¶
You can modify the name, JDBC URL, and, if the driver was configured with them, the parameters of an existing data source.
Select the data connection in the left-panel connections list.
In the updated main window, click in the box of the element you want to edit and enter new text.
Click Save changes.
Delete a connection¶
You can delete any data connection that is not being used by an existing data source. If it is being used, you must first delete the dependencies. To delete a data connection:
From the Data Connections tab, select the data connection in the left-panel connections list.
Click the Delete button in the upper right ().
DataRobot prompts for confirmation. Click Delete to remove the data connection. If there are data sources dependent on the data connection, DataRobot returns a notification.
Once all dependent data sources are removed via the API, try again to delete the data connection.
Add data sources¶
Your data sources specify, via SQL query or selected table and schema data, which data to extract from the data connection. It is the extracted data that you will use for modeling and predictions. You can point to entire database tables or use a SQL query to select specific data from the database. Any data source that you create is available only to you.
Once data sources are created, they cannot be modified and can only be deleted via the API.
To add a data source, do one of the following:
From the Start screen, click Data Source and select the connection that holds the data you would like to add. See how to import from an existing data source.
Share data connections¶
Because the user creating a data connection and the enduser may not be the same, or there may be multiple endusers for the data connection, DataRobot provides the ability to set user-level permissions for each entity. You can accomplish scenarios like the following:
- A user wants to set permissions on a selected data entity to control who has consumer-level, editor-level, or owner-level access. Or, the user wants to remove a particular user's access.
- A user that has had a data connection shared with them wants the shared entity to appear under their list of available entities.
When you invite a user, user group, or organization to share a data connection, DataRobot assigns the default role of Editor to each selected target (not all entities allow sharing beyond a specific user). You can change the role from the dropdown menu.
To share data connections:
From the account menu on the top right, select Data Connections, select a data connection, and click Share:
Enter the email address, group name, or organization you are adding and select a role. Check the box to grant sharing permission.
Click Share to add the user, user group, or organization.
Add any number of collaborators and when finished, click Close to dismiss the sharing dialog box.
Depending on your own permissions, you can remove any user or change access as described in the table of roles and permissions.
There must be at least one Owner for each entity; you cannot remove yourself or remove your sharing ability if you are the only collaborating Owner.