

SFTP Connector for Data Prep

User Persona: Data Prep User, Data Prep Admin, Data Source Admin, or IT/DevOps

Note

This document covers all configuration fields available during connector setup. Some fields may have already been filled out by your Administrator at an earlier step of configuration and may not be visible to you. For more information on Data Prep's connector framework, see Data Prep Connector setup. Also, your Admin may have named this connector something else in the list of Data Sources.

Configure Data Prep

This Connector allows you to connect to an SSH File Transfer Protocol (SFTP) Server for Library imports and exports. The following fields are used to define the connection parameters.

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Tip

You can connect Data Prep to multiple SFTP servers. Using a descriptive name can be a big help to users in identifying the appropriate data source. If you are a Data Prep SaaS customer, inform Data Prep DevOps how you would like this set.

SFTP Host

If the SFTP Host section appears in the Add Source/Edit Source form, provide the information used to locate and connect to the SFTP host.

  • SFTP Hostname: You can use either the fully qualified hostname, including the domain name, or the IP address of the SFTP server.
  • SFTP Port: The socket port for the SFTP server. The protocol specifies port 22 as default.
  • Automatic Host Key Verification: Automatically accept the host key from the SFTP server.

    • Selected: Enables the SFTP Connector to automatically trust connections to your SFTP server. This is equivalent to setting StrictHostKeyChecking=no in SSH.
    • Deselected (default setting): Disables automatic trust of connections to the server given in SFTP Hostname. This is the default because it is the more secure configuration.

  • Keep Alive: Enable/Disable session activity to prevent a timeout.

    • Selected (default setting): Enables periodic background communication between SFTP Connector and SFTP server to keep the connection from being closed by the server during browse, import, and export.
    • Deselected: The duration of the connection is managed by the SFTP server configuration, and the server may terminate idle connections. In this configuration, avoid long periods of inactivity while browsing to import or export data.
  • Data Compression: Enable data compression during transfer.

    • Selected (default setting): Enables ZLIB compression of data during transfer between the SFTP server and Data Prep, which speeds up transfers for most datasets. If ZLIB compression cannot be negotiated between Data Prep and the server, the connection automatically falls back to uncompressed transfer.
    • Deselected: Disables ZLIB compression of data during transfer between the SFTP server and Data Prep.

  • Socket Timeout Seconds: The number of seconds to wait for an SFTP command to execute (for example, list directory, create directory, or logout). The default value is 30 seconds. To allow a longer wait, increase this value.

    • You will most likely need to change this option when the SFTP server directories contain very large lists of data files.
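
The SFTP Host fields and their documented defaults can be summarized in a small sketch. The class name `SftpHostSettings`, its attribute names, and the hostname `sftp.example.com` are hypothetical and chosen for illustration only; this is not a Data Prep API, and the validation rules are assumptions about what a sensible configuration looks like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SftpHostSettings:
    """Hypothetical container mirroring the SFTP Host fields (not a Data Prep API)."""

    hostname: str                              # fully qualified hostname or IP address
    port: int = 22                             # default SFTP port
    auto_host_key_verification: bool = False   # deselected by default (more secure)
    keep_alive: bool = True                    # selected by default
    data_compression: bool = True              # ZLIB compression, selected by default
    socket_timeout_seconds: int = 30           # default wait for SFTP commands

    def __post_init__(self) -> None:
        # Assumed sanity checks; the connector's own validation may differ.
        if not self.hostname.strip():
            raise ValueError("hostname must not be empty")
        if not 1 <= self.port <= 65535:
            raise ValueError("port must be between 1 and 65535")
        if self.socket_timeout_seconds <= 0:
            raise ValueError("socket timeout must be a positive number of seconds")


# Documented defaults, with a longer timeout for very large directory listings.
settings = SftpHostSettings(hostname="sftp.example.com", socket_timeout_seconds=120)
print(settings)
```

Only the timeout is overridden here; every other field keeps the default described in the list above.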

Configuration

  • Root Directory: Defines the top-level directory to be presented in Data Prep's browse interfaces for import and export. Users can see files and directories within this directory in the browsing interface.

Authentication

The SFTP connector can authenticate using password authentication or SSH keys (with or without a Passphrase). Here are the options:

  • User Credentials: This is a username and password combination.

    • USERNAME: The username for authenticating with the SFTP server
    • PASSWORD: The password associated with the provided username
  • SSH Key Without Passphrase: This option only requires that you paste in the SSH Key

    • USERNAME: The username for authenticating with the SFTP server
    • SSH PRIVATE KEY: The contents of the SSH private key associated with the username
  • SSH Key With Passphrase: Paste in the SSH Key and enter the Passphrase

    • USERNAME: The username for authenticating with the SFTP server
    • SSH PRIVATE KEY: The contents of the SSH private key associated with the username
    • PASSPHRASE: The encryption passphrase for your Private Key
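
As a rough illustration of how the three options differ in the fields they need, the sketch below maps a set of supplied values to one of the option names above. The class `SftpCredentials` and its method are hypothetical, and the rule that a password and a private key are not supplied together is an assumption based on the form offering one option at a time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SftpCredentials:
    """Hypothetical holder for the Authentication fields (not a Data Prep API)."""

    username: str
    password: Optional[str] = None
    ssh_private_key: Optional[str] = None
    passphrase: Optional[str] = None

    def auth_option(self) -> str:
        """Return the name of the documented option these fields correspond to."""
        if not self.username:
            raise ValueError("USERNAME is required for every option")
        if self.password and self.ssh_private_key:
            # Assumption: only one authentication option is chosen at a time.
            raise ValueError("provide either PASSWORD or SSH PRIVATE KEY, not both")
        if self.password:
            return "User Credentials"
        if self.ssh_private_key:
            if self.passphrase:
                return "SSH Key With Passphrase"
            return "SSH Key Without Passphrase"
        raise ValueError("provide a PASSWORD or an SSH PRIVATE KEY")


# Placeholder values only; never hard-code real credentials.
creds = SftpCredentials(username="dataprep", ssh_private_key="<key contents>")
print(creds.auth_option())
```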

Data Import & Export Information

Via Browsing

  • The Connector will present a browsable directory hierarchy starting at the location defined in the ROOT DIRECTORY field.
  • The Connector also supports Wildcard & Glob importing, which enables users to import multiple SFTP data files into Data Prep as a single Dataset.
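
To give a feel for how one glob pattern can select several files for a single import, here is a minimal sketch using Python's standard `fnmatch` rules. This page does not specify the exact pattern syntax the Connector accepts, so treat Python's rules as a stand-in; the helper `match_files` and the file names are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def match_files(pattern: str, filenames: Iterable[str]) -> List[str]:
    """Return the file names that match a glob pattern, in sorted order."""
    return sorted(name for name in filenames if fnmatchcase(name, pattern))


# Hypothetical directory listing on an SFTP server.
listing = [
    "sales_2021_01.csv",
    "sales_2021_02.csv",
    "sales_2020_12.csv",
    "readme.txt",
]

matched = match_files("sales_2021_*.csv", listing)
print(matched)  # ['sales_2021_01.csv', 'sales_2021_02.csv']
```

In this sketch the two 2021 files would be combined into one Dataset, while the 2020 file and the text file are left out.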

Via SQL Query

  • As SFTP is a file store, SQL Queries are not supported for this data source.

Technical Specs

  • We test this Connector against a standard, non-configured Linux implementation of OpenSSH.

FAQ/Troubleshooting/Common Issues

Note that SFTP is as much a protocol as it is a type of storage. If you have an “SFTP Server”, what you really have is a storage location that interfaces with the web using the SSH File Transfer Protocol. This is an important distinction because anything (web services, SFTP service providers, etc.) can expose data to the web using this protocol. These services might use different implementations of SFTP, or they may do things behind the scenes that a traditional SFTP server would not. As a result, SFTP servers may have custom behavior that presents challenges in connecting or in importing data.

Here’s one example of where this type of variance from standard SFTP caused some challenges: A customer was using the SFTP Connector to pull data from one of their vendors. The vendor was using a service that exposed data via SFTP but deleted each data file after it was read. When Data Prep provides a preview of data upon import, it does so by querying the data source for a small chunk of the data. That preview read caused the vendor's service to delete the file before it could be fully imported.
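
The failure mode in this example can be reproduced with a toy model. The class `ReadOnceStore` below is a hypothetical in-memory stand-in for a service that deletes a file after any read; it is not how Data Prep or any particular vendor is implemented, only a sketch of why a preview read followed by a full read fails.

```python
from typing import Dict, Optional


class ReadOnceStore:
    """Toy stand-in for a service that deletes a file as soon as it is read."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = dict(files)

    def read(self, name: str, max_bytes: Optional[int] = None) -> bytes:
        # pop() removes the file, so any later read of the same name raises KeyError.
        data = self._files.pop(name)
        return data if max_bytes is None else data[:max_bytes]


store = ReadOnceStore({"orders.csv": b"id,total\n1,10\n2,20\n"})

# Step 1: the preview reads only a small chunk of the file.
preview = store.read("orders.csv", max_bytes=8)

# Step 2: the full import tries to read the same file again.
try:
    store.read("orders.csv")
    import_succeeded = True
except KeyError:
    import_succeeded = False

print(preview, import_succeeded)
```

The preview succeeds, but it consumes the only read the service allows, so the full import finds nothing left to read.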


Updated October 28, 2021