Data Prep Connector setup for Data Prep¶
What are Data Prep Connectors?¶
Every Data Prep story starts and ends with Connectors. Data preparation is only valuable if you can get the data you need to prep and then send the prepped data where you need it. Data Prep Connectors are the tools for getting data into and out of Data Prep.
Benefits of Data Prep Connectors¶
Straightforward data access for business users¶
Accessing data on disparate systems isn’t very complicated for coders—most databases, file stores, and web services have well-developed, code-friendly interfaces that adhere to industry standards.
Data integration is hard for non-coding users¶
DataRobot has tackled this problem and has opened up as many data sources as possible to non-coding users of DataRobot Data Prep. Our goal is that a business analyst (non-coding user) can access any data in the organization they are authorized to use.
Browsing vs. Querying¶
One core aspect of enabling non-coding users is the browsing interface. Where other data prep or ETL solutions rely on SQL queries, every data source in Data Prep can be browsed and data can be imported with clicks.
Control and Governance¶
The business environment is significantly more fluid than IT infrastructure typically accommodates. Even so, certain people should only have access to certain information and should only be able to send that information to certain places. The Connector framework lets large, complex organizations ensure that users can access only the information granted to them, and it can be configured simply for smaller organizations where speed and self-service are a priority.
Setup of Data Prep Connectors¶
Three Layers of Configuration¶
When setting up a Connector, there are three hierarchical levels of configuration, from highest to lowest: “Connector,” “Data Source,” and “Session.” If a field is filled out at a higher level, it won’t need to be filled out again downstream. Some fields may be alterable at a later stage, but that varies greatly across the Connectors.
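The layering described above can be sketched in a few lines of Python. This is only an illustration, not how Data Prep implements it: the field names (`host`, `port`, `root_directory`, `username`, `password`) and the functions `remaining_fields` and `resolve` are hypothetical, and the sketch assumes a value set at a higher level always takes precedence, which in practice varies by Connector.

```python
# Hypothetical field names; each real Connector defines its own fields.
REQUIRED_FIELDS = ["host", "port", "root_directory", "username", "password"]


def remaining_fields(required, *higher_layers):
    """Return the required fields that no higher layer has filled in yet."""
    filled = set()
    for layer in higher_layers:
        filled.update(key for key, value in layer.items() if value is not None)
    return [name for name in required if name not in filled]


def resolve(connector, data_source, session):
    """Merge the three layers, keeping the value from the highest layer."""
    merged = {}
    for layer in (connector, data_source, session):  # highest to lowest
        for key, value in layer.items():
            if value is not None and key not in merged:
                merged[key] = value
    return merged


# Example: the Connector level fixes host and port, so the Data Source
# level only has to supply what is still missing.
connector = {"host": "sftp.example.com", "port": 22}
data_source = {"root_directory": "/teams/finance"}
print(remaining_fields(REQUIRED_FIELDS, connector))
print(remaining_fields(REQUIRED_FIELDS, connector, data_source))
```

In this sketch, whatever is still unfilled after the Connector and Data Source levels is what a user would be asked for at the Session level.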
Connector configuration¶
This level is typically created and managed by an Admin or IT. It exists to:
- Make a given Connector available to specific groups of users.
- Allow an administrator to enter information that users won’t know and/or that will be the same across all users/data sources that rely on the Connector Config.
- Keep sensitive information (e.g. an SSH Key) secure from users who shouldn’t have access.
Data Source configuration¶
This level is typically created and managed by either individual users or admins, depending on how access to the source system data is managed. It exists to:
- Contain all persistent configuration not already captured at the Connector Config level. Typically, this includes everything except user credentials supplied at runtime for a shared Data Source Config.
Session configuration¶
This level is almost exclusively managed by individual users, or ignored if not required. It exists to:
- Capture information at the time an import or export runs. Typically, this is limited to user credentials.
Sharing controls¶
- Connector & Data Source Configs can be shared with groups within your tenant.
- These sharing controls also let you specify whether members of those groups can Read, Update, or Delete the configuration, and whether they may perform imports and/or exports with it.
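One way to picture these sharing controls is as a set of per-group permission flags attached to a configuration. The sketch below is a hypothetical model for illustration only: the `Share` and `SharedConfig` classes, their attribute names, and the `allows` method are assumptions and are not part of any Data Prep API.

```python
from dataclasses import dataclass, field
from typing import List, Set

# The five capabilities named in the sharing controls.
ACTIONS = {"read", "update", "delete", "import", "export"}


@dataclass(frozen=True)
class Share:
    """Hypothetical record of what one group may do with a config."""
    group: str
    can_read: bool = True
    can_update: bool = False
    can_delete: bool = False
    can_import: bool = False
    can_export: bool = False


@dataclass
class SharedConfig:
    """Hypothetical Connector or Data Source Config with its shares."""
    name: str
    shares: List[Share] = field(default_factory=list)

    def allows(self, user_groups: Set[str], action: str) -> bool:
        """True if any of the user's groups has been granted the action."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        flag = f"can_{action}"
        return any(
            getattr(share, flag)
            for share in self.shares
            if share.group in user_groups
        )


# A read-only share that still permits imports.
config = SharedConfig("finance-sftp", [Share("finance", can_import=True)])
print(config.allows({"finance"}, "import"))
print(config.allows({"finance"}, "delete"))
```

A config with no shares at all corresponds to "Sharing: None": no group other than the owner can see or use it.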
Example Setups¶
The following are a few examples of business situations and how the Connector Framework can be set up to accommodate the needs of each team.
Example 1¶
Business Situation¶
An IT-managed SFTP server is authenticated by “SSH Key with Passphrase.” IT holds the key and passphrase, and several teams need access to different directories.
Setup¶
- Connector Config
    - IT will create one Connector Config and fill out SFTP Host & Port, SSH Key & Passphrase.
    - Sharing: None
- Data Source Config
    - Create a new Data Source for each team and specify the appropriate Root Directory.
    - Sharing: Share each fully-configured Data Source as Read-only with the corresponding team and allow imports & exports if appropriate.
- Session Config
    - N/A
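The setup above can be written out as plain data to make the division of responsibility visible. This is a sketch under stated assumptions: the dictionary keys, the placeholder host, the team names and directories, and the `build_data_sources` helper are all hypothetical and do not reflect Data Prep's actual configuration format.

```python
# Held by IT and shared with nobody; all values are placeholders.
connector_config = {
    "host": "sftp.example.com",
    "port": 22,
    "ssh_key": "<held by IT>",
    "passphrase": "<held by IT>",
    "shared_with": [],  # Sharing: None
}

# Hypothetical teams and the root directory each one should see.
team_directories = {"finance": "/finance", "marketing": "/marketing"}


def build_data_sources(team_dirs):
    """Create one Data Source per team, shared read-only with that team."""
    return [
        {
            "name": f"sftp-{team}",
            "root_directory": root,
            "shared_with": [
                {
                    "group": team,
                    "access": "read-only",
                    "imports": True,
                    "exports": True,
                }
            ],
        }
        for team, root in team_dirs.items()
    ]


data_sources = build_data_sources(team_directories)
for source in data_sources:
    print(source["name"], source["root_directory"])
```

Note that no Data Source entry carries the SSH key or passphrase: in this sketch the credentials exist only on the Connector Config, which is why a credential change is made in one place.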
Benefits of this approach¶
- If the credentials change, they only need to be managed in one place.
- IT can manage credentials and keep them private from users.
- Each team has the access they need without having to manage access control on the data source itself.
Example 2¶
Business Situation¶
An Admin-managed Salesforce Org where each user should access only the information they have permissions for in Salesforce, and each user needs to run automation jobs within Data Prep.
Setup¶
- Connector Config
    - The Salesforce Admin will create one Connector Config and fill out all relevant information except for User & Password.
    - Sharing: Share this Config with each relevant group as Read-only.
- Data Source Config
    - Each user should create their own Data Source config and fill out just their credentials so their setup persists and can be used in automation jobs.
    - Sharing: None
- Session Config
    - N/A
Benefits of this approach¶
- Admin-level setup is completed by the admin; each user only needs to enter their username and password, which they should have readily available.
- Each user’s authorization is managed in Salesforce.