Data connections
Note
If your database is protected by a network policy that only allows connections from specific IP addresses, have an administrator add all of DataRobot's source IP addresses (see Allowed source IP addresses) to your network policy. If the problem persists, contact your DataRobot representative.
The DataRobot connectivity platform lets users integrate with their data stores using either the connectors that DataRobot provides or an uploaded JDBC driver supplied by the data store.
The "self-service" database connectivity solution is standardized and platform-independent, and it does not require complicated installation or configuration. Once it is configured, you can read data from production databases for model building and predictions. Connecting directly to your data source lets you quickly train and retrain models on that data and avoids the unnecessary step of exporting data from your database to a CSV file for ingest into DataRobot. It also gives you access to more diverse data, which can lead to more accurate models.
Users with the necessary technical knowledge and permissions can configure and establish data connections. Other users in the organization can then use those connections to solve business problems.
Note
By default, only users with "Can manage JDBC database drivers" permission can add, update, or remove JDBC drivers. See Roles and permissions for details on permissions.
This page describes the following workflows:
- An overview of the database connectivity workflow.
- Steps for creating new connections.
- Steps for adding data sources.
- Steps for sharing data connections.
Database connectivity workflow
By default, users can create, modify (depending on their role), and share data connections. You can also create data sources.
DataRobot's database connectivity workflow, described below, has two fundamental components. First, the administrator uploads JDBC drivers and configures database connections for those drivers. Then, users can import data into DataRobot for project creation and predictions, as follows:
- From the Data Connections page, create one or more data connection configurations.
- From the Start screen or the AI Catalog, create data sources from those data connections to use for modeling and predictions.
  Once configured, your data sources are available both for ingest from the Start screen and for predictions from the Make Predictions tab.
- (Optional) Depending on your role, share data connections with others.
You can also open the data source creation dialogs from other places in DataRobot; the process described here is the same in all cases.
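The workflow forms a chain: an administrator-supplied driver, a data connection that uses that driver, and data sources that read data through a connection. The following is a simplified, hypothetical Python model of that chain; the class and field names are illustrative only and are not part of any DataRobot API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JdbcDriver:
    # Uploaded by an administrator (hypothetical model).
    name: str


@dataclass
class DataConnection:
    # A connection configuration that uses exactly one driver.
    name: str
    driver: JdbcDriver


@dataclass
class DataSource:
    # A data source reads either a table or a SQL query through a connection.
    name: str
    connection: DataConnection
    table: Optional[str] = None
    query: Optional[str] = None


def describe(source: DataSource) -> str:
    """Trace a data source back through its connection to the driver."""
    target = source.query if source.query is not None else source.table
    conn = source.connection
    return f"{source.name}: {target} via {conn.name} ({conn.driver.name})"
```

For example, a data source named `orders` that reads `public.orders` through a `sales-db` connection built on a PostgreSQL driver is described as `orders: public.orders via sales-db (PostgreSQL)`.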
Allowed source IP addresses
Any connection initiated from DataRobot originates from one of the following IP addresses:
Host: https://app.datarobot.com | Host: https://app.eu.datarobot.com | Host: https://app.jp.datarobot.com |
---|---|---|
100.26.66.209 | 18.200.151.211 | 52.199.145.51 |
54.204.171.181 | 18.200.151.56 | 52.198.240.166 |
54.145.89.18 | 18.200.151.43 | 52.197.6.249 |
54.147.212.247 | 54.78.199.18 | |
18.235.157.68 | 54.78.189.139 | |
3.211.11.187 | 54.78.199.173 | |
52.1.228.155 | 18.200.127.104 | |
3.224.51.250 | 34.247.41.18 | |
44.208.234.185 | 99.80.243.135 | |
3.214.131.132 | 63.34.68.62 | |
3.89.169.252 | 34.246.241.45 | |
3.220.7.239 | 52.48.20.136 | |
52.44.188.255 | | |
3.217.246.191 | | |
Note
These IP addresses are reserved for DataRobot use only.
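If you maintain your network policy in code or configuration, a sketch like the following turns the addresses in the table into single-address (/32) CIDR entries, a notation many network policies accept. The dictionary and function names are hypothetical, and the exact format your database expects may differ, so check its network policy documentation.

```python
import ipaddress
from typing import Dict, List

# Source IPs copied from the table above, keyed by DataRobot host.
DATAROBOT_SOURCE_IPS: Dict[str, List[str]] = {
    "https://app.datarobot.com": [
        "100.26.66.209", "54.204.171.181", "54.145.89.18", "54.147.212.247",
        "18.235.157.68", "3.211.11.187", "52.1.228.155", "3.224.51.250",
        "44.208.234.185", "3.214.131.132", "3.89.169.252", "3.220.7.239",
        "52.44.188.255", "3.217.246.191",
    ],
    "https://app.eu.datarobot.com": [
        "18.200.151.211", "18.200.151.56", "18.200.151.43", "54.78.199.18",
        "54.78.189.139", "54.78.199.173", "18.200.127.104", "34.247.41.18",
        "99.80.243.135", "63.34.68.62", "34.246.241.45", "52.48.20.136",
    ],
    "https://app.jp.datarobot.com": [
        "52.199.145.51", "52.198.240.166", "52.197.6.249",
    ],
}


def allowlist_cidrs(host: str) -> List[str]:
    """Return each source IP for `host` as a single-address /32 CIDR string."""
    return [str(ipaddress.ip_network(f"{ip}/32")) for ip in DATAROBOT_SOURCE_IPS[host]]


def is_datarobot_ip(address: str, host: str) -> bool:
    """Check whether `address` is one of the listed source IPs for `host`."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowlist_cidrs(host))
```

Only add the addresses for the DataRobot host your organization actually uses.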
Create a new connection
To create a new data connection:
- From the account menu on the top right, select Data connections.
  You can also create a new data connection from the AI Catalog by selecting Add to catalog > New Data Connection.
  All existing connections are listed on the left. If you select a configured connection, its configuration options are displayed in the center.
- To add a new data connection, click Add new connection.
- Select the tile for the data store you want to use.
  Self-Managed AI Platform installations
  For Self-Managed AI Platform installations, you might not see any data stores listed. In that case, click Add a new driver and add a driver from the list of supported connections.
- Name the data connection (1), select an authentication method (2), and fill in the required fields (see the documentation for your specific data store).
  The visible configuration options are the required parameters for the selected data store, so they vary from one data store to another. You can add more parameters under Show advanced options (3).
  Saved credentials
  If you previously added credentials for your data store on the Credentials Management page, you can click Select saved credentials and choose them from the list instead of entering them manually.
- Click Add from connection. The Schema tab opens.
- On the Schema tab, select a schema from the list of schemas available in your database. The Tables tab opens.
  To use a SQL query to select specific data from the database instead, click the SQL query tab.
- Select the table(s) you want to register in the AI Catalog and click Proceed to confirmation. Each table is registered as a separate catalog asset.
- Under Settings, select the appropriate policy (1) and data upload amount (2), then review your selections and confirm them by clicking Register in the AI Catalog.
Note
Any connection that you create is only available to you unless you share it with others.
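Selecting several tables registers each one as its own catalog asset. The sketch below illustrates that one-table, one-asset mapping with a hypothetical `CatalogAsset` record and `register_tables` helper; real catalog entries carry more metadata, and ignoring duplicate selections is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CatalogAsset:
    # Hypothetical, minimal record; real catalog entries hold more metadata.
    connection: str
    schema: str
    table: str


def register_tables(connection: str, schema: str, tables: List[str]) -> List[CatalogAsset]:
    """Return one catalog asset per selected table (duplicates ignored)."""
    if not tables:
        raise ValueError("select at least one table")
    # dict.fromkeys drops repeated table names while keeping selection order.
    return [CatalogAsset(connection, schema, t) for t in dict.fromkeys(tables)]
```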
Data connection with parameters
The parameters you can modify on the data connection configuration screen depend on the selected driver and on how the administrator who added that driver configured it.
Additional fields are available in a searchable, expandable list. If a field you need is not listed, open Show advanced options and click Add parameter to include it.
Click the trash can icon to remove a listed parameter from the connection configuration.
Note
Additional parameters may be required to establish a connection to your database. These parameters are not always predefined in DataRobot; when they are not, you must add them manually.
For more information on the required parameters, see the documentation for your database.
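Conceptually, the final configuration combines the driver's required parameters with any you add under Show advanced options. The sketch below shows that merge with a check for missing required values. The function and the parameter names in the example (`address`, `database`, `ssl`) are hypothetical; the real names depend on your driver and database.

```python
from typing import Dict, List, Optional


def build_parameters(
    required: List[str],
    values: Dict[str, str],
    advanced: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Combine driver-required parameters with manually added ones.

    Raises ValueError if any required parameter is missing or empty.
    """
    missing = [name for name in required if not values.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    params = {name: values[name] for name in required}
    # Manually added parameters are merged last, so a repeated name overrides.
    params.update(advanced or {})
    return params
```

For example, `build_parameters(["address", "database"], {"address": "db.example.com:5432", "database": "sales"}, {"ssl": "true"})` yields all three entries, while omitting `database` raises an error naming it.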
Test the connection
After creating the data connection, click Test connection to test it.
In the dialog box that opens, enter credentials, or select stored credentials, for the database identified in the JDBC URL field or in the parameter-based configuration on the data connection creation screen. Click Sign in. When the test passes, click Close to return to the Data Connections page and create your data sources.
Snowflake and Google BigQuery users can set up a data connection using OAuth single sign-on. Once configured, you can read data from production databases to use for model building and predictions.
For information on setting up a data connection with OAuth, the required parameters, and troubleshooting steps, see the documentation for your database: Snowflake or BigQuery.
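If a test fails, it can help to rule out basic network problems first. The sketch below only checks whether a TCP connection to a host and port succeeds; it does not authenticate or use a JDBC driver, so it is not a substitute for Test connection. Keep in mind that DataRobot connects from the source IP addresses listed earlier, so a successful check from your own machine does not prove that DataRobot can reach the database. The function name is hypothetical.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```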
Modify a connection
You can modify the name, the JDBC URL, and, if the driver was configured with them, the parameters of an existing data connection.
- Select the data connection in the left-panel connections list.
- In the main window, click the field you want to edit and enter the new value.
- Click Save changes.
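The rule in this section (only the name, JDBC URL, and driver-configured parameters are editable) can be sketched as a whitelist check. This is a hypothetical Python model, not DataRobot's implementation; in particular, treating an empty `parameters` dictionary as "the driver was not configured with parameters" is an assumption made for the example.

```python
from dataclasses import dataclass, field, replace
from typing import Dict

# Fields this section says can be edited on an existing data connection.
EDITABLE_FIELDS = {"name", "jdbc_url", "parameters"}


@dataclass
class ConnectionConfig:
    name: str
    driver: str
    jdbc_url: str = ""
    parameters: Dict[str, str] = field(default_factory=dict)


def modify_connection(config: ConnectionConfig, **changes) -> ConnectionConfig:
    """Return an updated copy, rejecting fields that are not editable."""
    not_editable = set(changes) - EDITABLE_FIELDS
    if not_editable:
        raise ValueError("cannot modify: " + ", ".join(sorted(not_editable)))
    # Assumption: no stored parameters means the driver was not configured with any.
    if "parameters" in changes and not config.parameters:
        raise ValueError("this connection has no driver-defined parameters")
    return replace(config, **changes)
```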
Delete a connection
You can delete any data connection that is not used by an existing data source. If it is in use, you must first delete the dependent data sources. To delete a data connection:
- On the Data Connections page, select the data connection in the left-panel connections list.
- Click the Delete button in the upper right.
- DataRobot prompts for confirmation. Click Delete to remove the data connection. If any data sources depend on the data connection, DataRobot displays a notification.
- After removing all dependent data sources via the API, try deleting the data connection again.
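The dependency rule behind these steps can be sketched as follows: a connection is deleted only when no data source refers to it; otherwise the dependents are reported. The exception, function, and identifiers are hypothetical and only model the behavior described above.

```python
from typing import Dict


class DependencyError(Exception):
    """Raised when a data connection still has dependent data sources."""


def delete_connection(
    connection_id: str,
    connections: Dict[str, str],
    data_sources: Dict[str, str],
) -> None:
    """Delete a connection only if no data source refers to it.

    `connections` maps connection IDs to names; `data_sources` maps
    data source IDs to the ID of the connection each one uses.
    """
    dependents = sorted(s for s, c in data_sources.items() if c == connection_id)
    if dependents:
        raise DependencyError(
            f"{connection_id} is used by data sources: {', '.join(dependents)}"
        )
    del connections[connection_id]
```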
Add data sources
A data source specifies which data to extract from a data connection, either as a selected schema and table or as a SQL query. You use the extracted data for modeling and predictions. You can point to an entire database table or use a SQL query to select specific data from the database. Any data source that you create is available only to you.
Note
Once data sources are created, they cannot be modified and can only be deleted via the API.
To add a data source, do one of the following:
- From the Start screen, click Data Source and select the connection that holds the data you would like to add. See how to import from an existing data source.
- From the AI Catalog, select Add to catalog > Existing Data Connection. See how to add data from external connections.
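A data source points either to a table (with its schema) or to a SQL query, and it cannot be modified once created. The hypothetical model below captures both rules: a frozen dataclass rejects attribute changes after creation, and a post-init check requires exactly one of `table` or `query`. The class and field names are illustrative, not a DataRobot API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: a created data source cannot be modified
class DataSourceSpec:
    connection: str
    schema: Optional[str] = None
    table: Optional[str] = None
    query: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one of `table` or `query` must be given.
        if (self.table is None) == (self.query is None):
            raise ValueError("specify exactly one of table or query")
```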
Share data connections
Because the user who creates a data connection may not be the one who uses it, and a data connection may have several users, DataRobot lets you set user-level permissions for each entity. This supports scenarios such as the following:
- A user wants to set permissions on a selected data entity to control who has consumer-level, editor-level, or owner-level access, or to remove a particular user's access.
- A user who has had a data connection shared with them wants the shared entity to appear in their list of available entities.
When you invite a user, user group, or organization to share a data connection, DataRobot assigns the default role of Editor to each selected target. You can change the role from the dropdown menu. Note that not all entities allow sharing beyond individual users.
To share data connections:
- From the account menu on the top right, select Data Connections, select a data connection, and click Share.
- Enter the email address, group name, or organization you are adding and select a role. Select the checkbox to grant sharing permission.
- Click Share to add the user, user group, or organization.
- Add any number of collaborators, and when finished, click Close to dismiss the sharing dialog box.
Depending on your own permissions, you can remove any user or change access as described in the table of roles and permissions.
Note
There must be at least one Owner for each entity; you cannot remove yourself or remove your sharing ability if you are the only collaborating Owner.
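The sharing rules above (Editor as the default role, and at least one Owner at all times) can be modeled with a small collaborator list. This is a hypothetical Python sketch of the described behavior, not DataRobot's implementation; it does not model the separate sharing-permission checkbox.

```python
from typing import Dict, Optional

ROLES = ("Consumer", "Editor", "Owner")


class SharingList:
    """Hypothetical collaborator list for one data connection."""

    def __init__(self, creator: str) -> None:
        self.roles: Dict[str, str] = {creator: "Owner"}

    def share(self, target: str, role: str = "Editor") -> None:
        # Editor is the default role for a new user, group, or organization.
        self._set(target, role)

    def change_role(self, target: str, role: str) -> None:
        if target not in self.roles:
            raise KeyError(target)
        self._set(target, role)

    def remove(self, target: str) -> None:
        if target not in self.roles:
            raise KeyError(target)
        self._require_owner_after(target, None)
        del self.roles[target]

    def _set(self, target: str, role: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self._require_owner_after(target, role)
        self.roles[target] = role

    def _require_owner_after(self, target: str, new_role: Optional[str]) -> None:
        # Every entity must keep at least one Owner after the change.
        owners = {who for who, role in self.roles.items() if role == "Owner"}
        if new_role == "Owner":
            owners.add(target)
        else:
            owners.discard(target)
        if not owners:
            raise ValueError("each entity must keep at least one Owner")
```

With this model, the sole Owner cannot remove themselves until another collaborator has been promoted to Owner.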