Data Prep Connector setup for Data Prep

What are Data Prep Connectors?

Every Data Prep story starts and ends with Connectors. Being able to do data preparation is only valuable if you can get the data you need to prep and then can send that data where you need it after it’s been prepped. Data Prep Connectors are the tools for getting data into and out of Data Prep.

Benefits of Data Prep Connectors

Straightforward data access for business users

Accessing data on disparate systems isn’t very complicated for coders—most databases, file stores, and web services have well-developed, code-friendly interfaces that adhere to industry standards.

Data integration is hard for non-coding users

DataRobot has tackled this problem and has opened up as many data sources as possible to non-coding users of DataRobot Data Prep. Our goal is that a business analyst (non-coding user) can access any data in the organization they are authorized to use.

Browsing vs. Querying

One core aspect of enabling non-coding users is the browsing interface. Where other data prep or ETL solutions rely on SQL queries, every data source in Data Prep can be browsed and data can be imported with clicks.

Control and Governance

The business environment is significantly more fluid than IT infrastructure typically accommodates. Even so, certain people should have access only to certain information and should be able to send that information only to certain places. The Connector framework allows large and complex organizations to ensure users can access only the information granted to them, and it can be configured simply for smaller organizations where speed and self-service are a priority.

Setup of Data Prep Connectors

Three Layers of Configuration

When setting up a Connector, there are three hierarchical levels of configuration, from highest to lowest: “Connector,” “Data Source,” and “Session.” If a field is filled out at a higher level, it won’t need to be filled out again downstream. Some fields may be alterable at a later stage, but that varies greatly across the Connectors.
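
One way to picture the layering is as a lookup in which each field is taken from the highest level that supplies it. The Python sketch below is illustrative only: `resolve_fields`, the field names, and the rule that a higher-level value is never overridden are assumptions made for this example, not part of Data Prep or its API.

```python
from typing import Dict, List, Tuple

# Levels ordered from highest to lowest, as described above.
LEVELS = ("connector", "data_source", "session")


def resolve_fields(
    required: List[str],
    layers: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (resolved values, required fields that are still missing).

    Assumption for this sketch: a value set at a higher level is kept,
    and lower levels only fill in fields that are still empty.
    """
    resolved: Dict[str, str] = {}
    for level in LEVELS:
        for name, value in layers.get(level, {}).items():
            # setdefault keeps an existing (higher-level) value untouched.
            resolved.setdefault(name, value)
    missing = [name for name in required if name not in resolved]
    return resolved, missing


# Hypothetical field names and values, for illustration only.
layers = {
    "connector": {"host": "sftp.example.com", "port": "22"},
    "data_source": {"root_directory": "/teams/finance"},
    "session": {},
}
resolved, missing = resolve_fields(
    ["host", "port", "root_directory", "username"], layers
)
print(resolved)
print(missing)  # ['username']
```

In this sketch, anything left in `missing` after all three levels are consulted is a field the user would still have to supply.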

Connector configuration

This level is typically created and managed by an Admin or IT and it exists to:

  • Make a given Connector available to specific groups of users.
  • Allow an administrator to enter information that users won’t know and/or that will be the same across all users/data sources that rely on the Connector Config.
  • Allow an Admin to keep sensitive information, e.g. an SSH Key, secure from users who shouldn’t have access.

Data Source configuration

This level is typically created and managed by either individual users or admins, depending on how access to the source system data is being managed, and it exists to:

  • Contain all persistent configuration not already captured at the Connector Config level.
  • Typically, this includes everything except for user credentials supplied at runtime for a shared Data Source Config.

Session configuration

This level is managed almost exclusively by individual users, or ignored if not required, and it exists to:

  • Capture information at runtime of import/export.
  • Typically, this is limited to user credentials.

Sharing controls

  • Connector & Data Source Configs can be shared with groups within your tenant.
  • These sharing controls also allow you to specify if members of the specified groups can Read, Update, or Delete the configuration and whether the users may perform imports and/or exports with the configuration.
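
The sharing controls above can be thought of as a set of per-group permission flags on each configuration. The following is a minimal sketch of that idea in Python; the `ShareGrant` class, its field names, and `is_allowed` are hypothetical and do not reflect how Data Prep stores or evaluates permissions.

```python
from dataclasses import dataclass
from typing import Iterable, Set

# The five permissions named in the text above.
ACTIONS = {"read", "update", "delete", "import", "export"}


@dataclass(frozen=True)
class ShareGrant:
    """Permissions one group holds on a shared configuration (hypothetical model)."""
    group: str
    can_read: bool = False
    can_update: bool = False
    can_delete: bool = False
    can_import: bool = False
    can_export: bool = False


def is_allowed(grants: Iterable[ShareGrant], user_groups: Set[str], action: str) -> bool:
    """True if any of the user's groups holds the requested permission."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    flag = f"can_{action}"
    return any(getattr(g, flag) for g in grants if g.group in user_groups)


# A configuration shared read-only with one group, with imports allowed.
grants = [ShareGrant("finance", can_read=True, can_import=True)]
print(is_allowed(grants, {"finance"}, "import"))   # True
print(is_allowed(grants, {"finance"}, "update"))   # False
print(is_allowed(grants, {"marketing"}, "read"))   # False
```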

Example Setups

The following are a few examples of business situations and how the Connector Framework can be set up to accommodate the needs of each team.

Example 1

Business Situation

An IT-managed SFTP Server authenticated by “SSH Key with Passphrase,” where the key and passphrase are held by IT and several teams need access to different directories.

Setup

  • Connector Config
      • IT will create one Connector Config and fill out SFTP Host & Port, SSH Key & Passphrase.
      • Sharing: None
  • Data Source Config
      • Create a new Data Source for each team and specify the appropriate Root Directory.
      • Sharing: Share each fully-configured Data Source as Read-only with the corresponding team and allow imports & exports if appropriate.
  • Session Config
      • N/A
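
The setup above can be written out as plain data to make the division of responsibilities explicit. This Python sketch uses placeholder values throughout: the dictionary keys, team names, host, and directories are all hypothetical and are not a Data Prep configuration format.

```python
# One Connector Config, created by IT and shared with nobody.
connector_config = {
    "name": "it-sftp",
    "fields": {
        "host": "sftp.example.com",
        "port": 22,
        "ssh_key": "<held by IT>",
        "passphrase": "<held by IT>",
    },
    "shared_with": [],  # Sharing: None
}

# Hypothetical teams and the Root Directory each one should see.
team_roots = {"finance": "/finance", "marketing": "/marketing"}

# One Data Source per team, each shared read-only with that team only.
data_sources = [
    {
        "name": f"sftp-{team}",
        "connector": connector_config["name"],
        "fields": {"root_directory": root},
        "shared_with": [
            {"group": team, "access": "read-only", "imports": True, "exports": True}
        ],
    }
    for team, root in team_roots.items()
]

for source in data_sources:
    groups = [share["group"] for share in source["shared_with"]]
    print(source["name"], source["fields"]["root_directory"], groups)
```

Because every Data Source refers to the same Connector Config by name, a change to the key or passphrase would be made in that single entry.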

Benefits of this approach

  • If the credentials change, they only need to be managed in one place.
  • IT can manage credentials and keep them private from users.
  • Each team has the access they need without having to manage access control on the data source itself.

Example 2

Business Situation

An Admin-managed Salesforce Org, where each user should access only the information they have permissions for in Salesforce and each user will need to run automation jobs within Data Prep.

Setup

  • Connector Config
      • The Salesforce Admin will create one Connector Config and fill out all relevant information except for User & Password.
      • Sharing: Share this Config with each relevant group as Read-only.
  • Data Source Config
      • Each user should create their own Data Source config and fill out just their credentials so their setup persists and can be used in automation jobs.
      • Sharing: None
  • Session Config
      • N/A
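
The reason credentials go in each user's own Data Source here is that nothing should be left to enter at runtime, so automation jobs can run unattended. The short Python check below illustrates that reasoning; the field names (`login_url`, `user`, `password`) and the `needs_session_input` helper are assumptions for this sketch, and the real Salesforce Connector fields may differ.

```python
# Hypothetical set of fields the connection needs in total.
REQUIRED = {"login_url", "user", "password"}

# Filled in once by the Salesforce Admin and shared read-only.
connector_fields = {"login_url": "https://login.example.com"}


def needs_session_input(data_source_fields):
    """Return the fields a user would still have to supply at runtime."""
    persisted = set(connector_fields) | set(data_source_fields)
    return REQUIRED - persisted


# A user's private Data Source that stores their own credentials.
own_source = {"user": "analyst@example.com", "password": "<stored secret>"}
# A Data Source without stored credentials, shown for contrast.
bare_source = {}

print(needs_session_input(own_source))           # empty: nothing to prompt for
print(sorted(needs_session_input(bare_source)))  # ['password', 'user']
```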

Benefits of this approach

  • Admin-level setup is completed by the admin, and each user only needs to enter their username and password, information they should have readily available.
  • Each user’s authorization is managed in Salesforce.

Updated April 12, 2022