Google Cloud Storage Connector for Data Prep¶
User Persona: Data Prep User, Data Prep Admin, or Data Source Admin
This document covers all configuration fields available during connector setup. Some fields may have already been filled out by your Administrator at an earlier step of configuration and may not be visible to you. For more information on Data Prep's connector framework, see Data Prep Connector setup. Also, your Admin may have named this connector something else in the list of Data Sources.
Configure Data Prep¶
This connector allows you to connect to Google Cloud Storage (GCS) for browsing and importing objects. The following fields are used to create a connection to the data source.
Name: Name of the data source as it will appear to users in the UI.
Description: Description of the data source as it will appear to users in the UI.
You can connect Data Prep to multiple GCS accounts. A descriptive name helps users identify the appropriate data source.
Google Cloud Storage Configuration¶
Bucket Name: The name of the Google Cloud Storage bucket to connect to. A bucket is a collection of objects in Google Cloud Storage.
Object Prefix: A folder or sub-folder within the bucket. Select the prefix you want to use in the bucket. The default value, "/", shows all objects.
JSON Web Token: The JSON Web Token (JWT) used to authenticate the account with Google Cloud Storage. Provide the content of the JWT file to establish a secure connection with Google Cloud Storage. For more details on the JWT, see the Google documentation for Using OAuth 2.0 for Server to Server Applications.
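Because the JSON Web Token field takes the content of a credential file, it can help to check the text before pasting it in. The following is a minimal sketch, assuming the connector expects a standard Google service account key file; the function name `check_service_account_json` and the choice of required fields are illustrative and not part of Data Prep.

```python
import json

# Fields that a Google service account key file normally contains.
# Assumption: the connector needs at least these; the list is illustrative.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(content: str) -> list[str]:
    """Return the problems found in the pasted credential text (empty if none)."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]

    # A service account key declares its type explicitly.
    cred_type = data.get("type")
    if cred_type and cred_type != "service_account":
        problems.append(f"unexpected type: {cred_type!r}")
    return problems
```

An empty list means the text at least has the shape of a service account key; it does not prove that the key is still valid or that the account has access to the bucket.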
Web Proxy Configuration¶
If you connect to Google Cloud Storage through a proxy server, these fields define the proxy details.
Web Proxy: 'None' if no proxy is required or 'Proxied' if the connection to Google Cloud Storage should be made via a proxy server. If a web proxy server is required, the following fields are required to enable a proxied connection.
Proxy host: The host name or IP address of the web proxy server.
Proxy port: The port on the proxy server that the data source connection should use.
Proxy username: The username for the proxy server.
Proxy password: The password for the proxy server.
Leave the username and password blank for an unauthenticated proxy connection.
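To illustrate how the four proxy fields relate to each other, the sketch below combines them into a single proxy URL of the kind many HTTP clients accept. How Data Prep applies these values internally is not described here, so treat `build_proxy_url` and the `http://` scheme as assumptions made for the example.

```python
from urllib.parse import quote


def build_proxy_url(host: str, port: int, username: str = "", password: str = "") -> str:
    """Build a proxy URL; a blank username and password mean no authentication."""
    if not host:
        raise ValueError("proxy host is required")
    if not 1 <= port <= 65535:
        raise ValueError("proxy port must be between 1 and 65535")

    auth = ""
    if username or password:
        # Percent-encode so characters such as '@' or ':' do not break the URL.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"http://{auth}{host}:{port}"
```

For example, `build_proxy_url("proxy.example.com", 3128)` describes an unauthenticated proxy, while passing a username and password produces the authenticated form.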
How to Authenticate with Google¶
The Data Prep Google Cloud Storage Connector leverages Service Account authentication.
To access Google Cloud Storage using Data Prep, you need a JSON credential, which you can obtain in one of two ways:
Create a Google Service Account for the Cloud Storage service:
- Open the list of credentials in the Google Cloud Platform Console: https://console.cloud.google.com/apis/credentials.
- Click Create credentials.
- Select Service account key.
- In the Create service account key window, click the drop-down box below Service account, then click New service account.
- Enter a name for the service account in Name.
- Choose a Cloud Storage Role that grants the service account the desired level of access.
- Use the default Service account ID or generate a different one.
- Select the Key type: JSON.
- A Service account created window is displayed and the private key for the Key type you selected is downloaded automatically. Remember the downloaded credential location.
- Click Close.
Download the JSON credential for an existing Service Account for the Cloud Storage service:
- Log in to the Google Console using the end-user account: https://console.cloud.google.com/apis/credentials.
- Ensure that the correct Project is selected in the dropdown list.
- Scroll down to the "OAUTH 2.0 client IDs" section.
- Click the Name of the existing ID that you plan to use in the Connector.
- On the resulting page, click the "DOWNLOAD JSON" link.
- Remember the downloaded credential location.
For additional reference, please see: https://cloud.google.com/storage/docs/authentication#generating-a-private-key
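A downloaded JSON key can be tried outside Data Prep before it is entered in the connector. The sketch below is one way to do that with the third-party `google-cloud-storage` Python package, which is an assumption here and not part of Data Prep. The function names, the key path, and the mapping of the connector's "/" prefix to an unfiltered listing are likewise illustrative.

```python
def to_api_prefix(connector_prefix: str):
    """Map the connector-style prefix to a Cloud Storage prefix filter.

    Assumption: "/" means "all objects", which corresponds to no filter (None).
    """
    stripped = connector_prefix.strip("/")
    return stripped + "/" if stripped else None


def list_object_names(key_path: str, bucket_name: str, connector_prefix: str = "/", limit: int = 10):
    """Return up to `limit` object names visible to the service account."""
    # Imported here so the helper above works without the package installed.
    # Install with: pip install google-cloud-storage
    from google.cloud import storage

    client = storage.Client.from_service_account_json(key_path)
    blobs = client.list_blobs(
        bucket_name,
        prefix=to_api_prefix(connector_prefix),
        max_results=limit,
    )
    return [blob.name for blob in blobs]
```

If a call such as `list_object_names("/path/to/key.json", "my-bucket")` (both values hypothetical) returns names instead of raising a permission error, the service account can read the bucket with the role it was given.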
Data Import Information¶
Browse directories and files within the configured Bucket/Prefix.
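Cloud Storage stores objects under flat names, and folders are a convention based on "/" separators in those names. The sketch below shows one way a directory-style view under a prefix could be derived from a list of object names. It illustrates the idea only and is not Data Prep's implementation; `browse` and the sample names are hypothetical.

```python
def browse(object_names, prefix="/"):
    """Split object names under `prefix` into immediate sub-folders and files."""
    base = prefix.strip("/")
    base = base + "/" if base else ""  # "" matches every object, like "/"

    folders, files = set(), []
    for name in object_names:
        if not name.startswith(base):
            continue
        rest = name[len(base):]
        if not rest:
            continue  # placeholder object for the folder itself
        head, sep, _ = rest.partition("/")
        if sep:
            folders.add(head + "/")  # something deeper lives under this folder
        else:
            files.append(head)
    return sorted(folders), sorted(files)


names = ["raw/2024/jan.csv", "raw/2024/feb.csv", "raw/readme.txt", "top.json"]
print(browse(names))         # (['raw/'], ['top.json'])
print(browse(names, "raw"))  # (['2024/'], ['readme.txt'])
```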