Skip to content

Click in-app to access the full platform documentation for your version of DataRobot.

Snowflake Data Warehouse Connector for Data Prep

User Persona: Data Prep User, Data Prep Admin, Data Source Admin, or IT/DevOps

Note

This document covers all configuration fields available during connector setup. Some fields may have already been filled out by your Administrator at an earlier step of configuration and may not be visible to you. For more information on Data Prep's connector framework, see Data Prep Connector setup. Also, your Admin may have named this connector something else in the list of Data Sources.

Configure Data Prep

The Snowflake connector allows you to use a JDBC-based connection for Library imports and exports. The following fields are used to create a connection.

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Tip

You can connect Data Prep to multiple Snowflake accounts. Using a descriptive name can be a big help to users in identifying the appropriate data source. If you are a Data Prep SaaS customer, inform Data Prep DevOps how you would like this set.

DataBase URI

Database, Schema, and Table Visibility

You can control the database, schemas, and tables that are shown to users when they browse a data source during import. For each, you can choose to:

  • "Show only" which returns only the databases, schemas or tables that you specify here.
  • "Hide" which hides the databases, schemas, and tables that you specify here.
  • "Show all" which is the default setting to display everything in the data source.

When you select the "Show only" or "Hide" options, a field is provided for specifying a comma-separated list of the databases, schemas or tables on which you want the option enforced.

Note

These settings are not enforced when users query against the data source using SQL; query results still return a complete list of matches. For example, if you choose to "hide" a specific database, users can still execute queries that pull data from tables within that database. However, that database will not be displayed to users when they browse the data source.

Import Configuration

  • Query Prefetch Size for queries: The batch size used (in rows) when prefetching data on import.
  • Max Column Size: The maximum size in Unicode characters allowed in a cell when importing or exporting data.
  • Pre-import SQL: A SQL statement to execute before beginning import, after the table's schema is determined. This SQL will also be executed before data is previewed in the Data Prep UI.
  • Post-import SQL: A SQL statement to execute after import completes. This SQL will also be executed before data is previewed in the Data Prep UI.

Export Configuration

  • Export method: Select the method for exporting data from Snowflake. Both of these methods are specific to how Snowflake imports the data. For more details on these options, please refer to the Snowflake documentation linked behind the listed option. There are two options:

  • Internal stage: Write data to a file on a Snowflake internal stage before loading the data into a table. This method is recommended for larger datasets since it will be faster than Direct SQL. Stage Types:

    • Temporary: The stage created will be dropped at the end of the session in which it was created. The stage is managed by Snowflake, so no further configuration is required.
    • Permanent: Specifies the name of a Stage that has already been created in Snowflake.
      • Stage Name: specifies the name for an existing named Internal Stage in Snowflake. See the Snowflake Identifier Syntax.
  • Direct SQLT: Use SQL insert statements to export data. For larger datasets, this approach is slower than using an internal stage.

    • Export Batch Size: The batch size used when exporting data if the Direct SQL export method has been selected.
  • Max VARCHAR Size: The maximum size allowed for a VARCHAR column allowed on export. Values greater than this size will be replaced with null when data is loaded into the Snowflake table.

  • Automatically create table: Create a new table automatically on export. If enabled, Data Prep will drop the table whose name matches the name of the exported dataset, if one already exists, and recreate the table using the exported dataset. If not enabled, Data Prep will not create a new table, but will instead load the exported data into the table whose name matches the name of the exported dataset.
  • Pre-export SQL: A SQL statement to execute before beginning export, after the table is created if auto-create is enabled.
  • Post-export SQL: A SQL statement to execute after export completes.

Note

Data Prep exports using the “TIMESTAMP_LTZ(9)“ type. If a table was created using a different timestamp, exporting Data Prep data to a column with a mismatching timestamp type will result in an error. The error reads: '"There was an error while performing the export. Reason: SQL compilation error: Expression type does not match column data type, expecting TIMESTAMP_#### but got TIMESTAMP_LTZ(9) for column _Column_Name."

To correct this, do one of the following:

  • Allow Data Prep to create the table and then perform the export, or
  • Create your table with TIMESTAMP_LTZ(9) and then perform the export.

Credentials

The user authentication can be done through a Shared Account or an Individual Account. If you choose to authenticate with an Individual Account, the user will be prompted to enter a username and password to access this Data Source. If you choose to authenticate with a Shared Account, the following fields are required.

User: The username for a shared account used to authenticate with the database.

Password: The password for a shared account used to authenticate with the database.

Role: The role to set for the session using 'USE ROLE'. The specified role should be an existing role that has already been assigned to the user. Roles can also be specified using the JDBC URI, but the value specified in the Role field will supersede a role specified in the URI. Information on Snowflake Roles can be found here: https://docs.snowflake.net/manuals/sql-reference/sql/use-role.html

Data Import Information

Via Browsing

The browsing experience will vary and is based on the database, schemas, and tables visibility settings selected above along with the user credentials supplied.

Via SQL Query

As noted in the database, schemas, and tables visibility section, the limits to what a user can import via query are only limited to their authorization as determined by the credentials supplied for the connection.

Queries can be performed using legal SQL Select statement as defined here: https://docs.snowflake.net/manuals/sql-reference/sql/select.html

Example:

SELECT * FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."CUSTOMER"


Updated October 28, 2021
Back to top