Amazon Athena Connector for Data Prep

User Persona: Data Prep User, Data Prep Admin, Data Source Admin, or IT/DevOps

Note

This document covers all configuration fields available during connector setup. Your administrator may have already filled out some fields at an earlier configuration step, so those fields may not be visible to you. Your administrator may also have given this connector a different name in the list of Data Sources. For more information on Data Prep's connector framework, see Data Prep Connector setup.

Configure Data Prep

This connector allows you to connect to AWS Athena as an import source. The fields you are required to set up on the data source depend on how the connector was configured by your administrator.

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Tip

You can connect Data Prep to multiple AWS Athena instances. A descriptive name helps users identify the appropriate data source.

Amazon Athena Configuration

  • Athena Region: The AWS region that hosts your Athena instance (for example, us-east-1).
  • Access Key: AWS account access key.
  • Secret Key: AWS account secret key.

Query Results Storage Configuration

  • S3 Bucket Name: The name of the S3 bucket in which Athena will store query results.
  • S3 Object Prefix: Prefix under which Athena will store query results within the specified S3 bucket. See How do I use folders in an S3 Bucket for more information on prefixes.
  • Encryption Type: The AWS server-side encryption type applied to the stored query results.
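
Together, the bucket name and object prefix identify the location where Athena writes query results. The following is a minimal Python sketch, assuming the two fields are combined into a standard s3:// URI. The function name and the example bucket and prefix are hypothetical and are not part of Data Prep.

```python
def query_results_location(bucket: str, prefix: str = "") -> str:
    """Combine an S3 bucket name and object prefix into an s3:// URI."""
    bucket = bucket.strip().strip("/")
    if not bucket:
        raise ValueError("S3 Bucket Name must not be empty")
    # Drop surrounding slashes so the URI has exactly one '/' between parts.
    prefix = prefix.strip().strip("/")
    if not prefix:
        return f"s3://{bucket}/"
    return f"s3://{bucket}/{prefix}/"


# Hypothetical values, for illustration only.
print(query_results_location("example-bucket", "athena/results"))
```

With these example values, the results location is s3://example-bucket/athena/results/. An empty prefix places results at the top level of the bucket.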

About query results

Athena stores each query result in the configured S3 bucket. This is expected behavior and is how Athena is designed to work. When you use Athena to import to Data Prep, the Athena Connector cleans up your query results by default when the connection closes, so that you keep only one instance of the query result, not two. To keep the query results from your import available in S3, run the query directly in Athena and import the resulting file to Data Prep from S3.

Web Proxy Configuration

If you connect to AWS Athena through a proxy server, these fields define the proxy details.

  • Web Proxy: Select 'None' if no proxy is required, or 'Proxied' if the connection to AWS Athena should be made via a proxy server. When 'Proxied' is selected, the following fields define the proxied connection.
  • Proxy host: The host name or IP address of the web proxy server.
  • Proxy port: The port on the proxy server that the data source connects to.
  • Proxy username: The username for the proxy server.
  • Proxy password: The password for the proxy server.

Leave username and password blank for an unauthenticated proxy connection.
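
The relationship between the proxy fields can be sketched as a small consistency check. This is an illustration in Python under stated assumptions, not the connector's actual validation: the class and function names are hypothetical, and the rule that username and password must be set together is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WebProxySettings:
    """Hypothetical container mirroring the Web Proxy fields above."""
    web_proxy: str = "None"       # 'None' or 'Proxied'
    host: str = ""
    port: Optional[int] = None
    username: str = ""            # blank for an unauthenticated proxy
    password: str = ""            # blank for an unauthenticated proxy


def validate_proxy(settings: WebProxySettings) -> List[str]:
    """Return a list of problems; an empty list means the fields are consistent."""
    if settings.web_proxy == "None":
        return []  # no proxy, so the remaining fields are ignored
    if settings.web_proxy != "Proxied":
        return [f"unknown Web Proxy value: {settings.web_proxy!r}"]

    problems = []
    if not settings.host:
        problems.append("Proxy host is required")
    if settings.port is None or not 1 <= settings.port <= 65535:
        problems.append("Proxy port must be between 1 and 65535")
    # Assumption: credentials are supplied together or not at all.
    if bool(settings.username) != bool(settings.password):
        problems.append("Set both Proxy username and Proxy password, or leave both blank")
    return problems


# Unauthenticated proxy: host and port set, credentials left blank.
print(validate_proxy(WebProxySettings("Proxied", "proxy.example.com", 8080)))
```

The example host and port are placeholders. An unauthenticated proxy with a host and a valid port produces no problems, which matches the guidance to leave the credentials blank.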

Data Import Information

Via Browsing

Browsing is supported for this Connector and uses Athena queries to generate the browsable hierarchy. Because Athena charges for each query, see Best Practices below for details on its cost structure.

Via SQL Query

See the SQL reference for details.

Best Practices

When using Athena, you are charged for each query that you run. The amount that you are charged is based on the amount of data scanned by the query. For more information, see Amazon Athena Pricing.
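
As a rough illustration of how the charge scales with data scanned, the Python sketch below multiplies the volume scanned by a per-terabyte price. The function name is hypothetical, the price used in the example is a placeholder rather than an actual AWS rate, and the decimal terabyte is an assumption. The sketch also ignores any rounding or per-query minimum that AWS may apply, so treat Amazon Athena Pricing as the authority.

```python
BYTES_PER_TB = 10**12  # assumption: decimal terabyte


def estimate_query_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Estimate the charge for one query from the amount of data it scans."""
    if bytes_scanned < 0 or price_per_tb < 0:
        raise ValueError("bytes_scanned and price_per_tb must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Placeholder price of 4.0 per TB; a query scanning 250 GB costs a quarter of that.
print(estimate_query_cost(250 * 10**9, 4.0))
```

Because browsing also runs Athena queries, the same estimate applies to each query issued while navigating the hierarchy.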


Updated October 28, 2021