ADLS Gen2

Preview

Support for ADLS Gen2 in Workbench is on by default.

Feature flag(s): Enable ADLS Gen2 Connector

Supported authentication

  • OAuth
  • Azure service principal

OAuth

Register the DataRobot application in Azure

For the Microsoft identity platform to provide OAuth 2.0 authentication and authorization services for an application and its users, the application must be registered in the Azure portal with the associated parameters configured.

Once this step is done, you will have the following information required for setup in DataRobot:

  • Client ID
  • Client secret
  • Scope
  • Properly configured end-user permissions for role-based access control

To register a DataRobot application in the Azure portal and configure its parameters, follow the instructions in the Microsoft Entra documentation, applying the settings below:

  1. Under Supported account types, select Accounts in any organizational directory and personal Microsoft accounts or Accounts in any organizational directory.
  2. After the initial registration is complete, copy the Application ID (Client ID) on the Overview page.
  3. Configure a redirect URI. In Configure platform settings, select Web and enter a redirect URI in the following format: https://<host>/account/adls/adls_oauth_authz_return (e.g., https://app.datarobot.com/account/adls/adls_oauth_authz_return), where <host> is the address of your DataRobot installation.
  4. Configure a client secret. In Certificates & secrets, select the Client secrets tab and click New client secret. Copy the client secret value (you won't be able to copy this later).

    Note

    Each client secret has an expiration date. To avoid OAuth outages, periodically create a new client secret. Once a new client secret is created, you must update all associated credentials.

  5. Configure the permissions (scope):

    1. Go to your DataRobot application in the Azure portal app registrations.
    2. In the left panel under Manage, select API Permissions > Add a permission.
    3. Select Azure Data Lake, click Delegated permissions, and select the box next to Have full access to the Data Lake service. Then, click Add permissions. Azure Data Lake is now listed under permissions.
    4. Under Azure Data Lake, click User_impersonation. Copy the first URL in the resulting panel—this is the scope.
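
The redirect URI from step 3, the client ID from step 2, and the scope from step 5 are the inputs to a standard Microsoft identity platform authorization-code request. The following is a minimal sketch of how they fit together, not DataRobot's implementation: the function names, the `state` value, and the placeholder arguments are hypothetical, and the `common` tenant segment is an assumption.

```python
from urllib.parse import urlencode

# Path component taken from the redirect URI format given in step 3.
REDIRECT_PATH = "/account/adls/adls_oauth_authz_return"


def build_redirect_uri(host: str) -> str:
    """Return the redirect URI to register in Azure for a DataRobot host."""
    host = host.strip().rstrip("/")
    if not host or "/" in host:
        raise ValueError("host must be a bare hostname, e.g. app.datarobot.com")
    return f"https://{host}{REDIRECT_PATH}"


def build_authorize_url(client_id: str, redirect_uri: str, scope: str,
                        state: str, tenant: str = "common") -> str:
    """Assemble an authorization-code request URL (nothing is sent)."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # opaque anti-CSRF value chosen by the caller
    })
    return f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize?{query}"


if __name__ == "__main__":
    uri = build_redirect_uri("app.datarobot.com")
    print(uri)
    # Placeholder values; substitute the ones copied from the Azure portal.
    print(build_authorize_url("<client-id>", uri, "<scope>", "<random-state>"))
```

The `common` segment corresponds to the first account-type option in step 1; `organizations` would correspond to the second. The redirect URI registered in Azure must match the one sent in the request exactly, which is why the sketch rejects anything other than a bare hostname.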

If the user already has access to the data in the storage account, you can skip the next section, Configure access to the storage account.

Configure access to the storage account

To allow the DataRobot app to access files or objects under a storage account on behalf of the user, the user must first be granted access to the storage account files and objects. Azure role-based access control (RBAC) is recommended. See the Microsoft Azure documentation for more information.

To set up RBAC, follow the instructions in the Microsoft Azure documentation using the following parameters:

Mark the application as publisher verified

Mark the DataRobot application as publisher verified using the instructions in the Microsoft Entra documentation.

Azure service principal

Register the DataRobot application in Azure

To support the Azure service principal account, you must create and register a DataRobot application in the Azure portal, and configure its permissions.

Once this step is done, you will have the following information required for setup in DataRobot:

  • Client ID
  • Client secret
  • Tenant ID
  • Properly configured service principal permissions for role-based access control
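
For orientation, the Client ID, Client secret, and Tenant ID listed above are the values an OAuth 2.0 client-credentials token request to the Microsoft identity platform is built from. The sketch below only assembles such a request and does not send it. The function name and sample values are hypothetical, and the `https://storage.azure.com/.default` scope is the usual Azure Storage resource scope, assumed here rather than taken from this page.

```python
from urllib.parse import urlencode

# Assumption: the standard Azure Storage resource scope for app-only tokens.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return (url, form_body) for a client-credentials token request."""
    if not all((tenant_id, client_id, client_secret)):
        raise ValueError("tenant_id, client_id and client_secret are all required")
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": STORAGE_SCOPE,
    })
    return url, body


if __name__ == "__main__":
    url, body = build_token_request("<tenant-id>", "<client-id>", "<client-secret>")
    print(url)
    # Only show the first field so the secret is not echoed.
    print(body.split("&")[0])
```

Unlike the user-facing OAuth flow, this request involves no browser redirect, which is consistent with the redirect URI being optional for service principal connections.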

To register a DataRobot application in the Azure portal and configure its parameters, follow the instructions in the Microsoft Entra documentation, applying the settings below:

Note

Configuring a redirect URI is optional for service principal connections.

  1. Under Supported account types, select Accounts in this organizational directory only. Note that you will need the name of the application to assign permissions.
  2. After the initial registration is complete, copy the Application ID (Client ID) and Directory ID (Tenant ID) on the Overview page.
  3. Assign a role to the application. Set the role name to Storage Blob Data Reader. If you want to set permissions at the storage account level, select the appropriate storage account and follow the instructions.
  4. Configure a client secret. In Certificates & secrets, select the Client secrets tab and click New client secret. Copy the client secret value (you won't be able to copy this later).

    Note

    Each client secret has an expiration date. To avoid OAuth outages, periodically create a new client secret. Once a new client secret is created, you must update all associated credentials.
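
Because an expired secret breaks every credential that uses it, it can help to track the expiration date shown when the secret is created. A minimal sketch, assuming you record that date yourself; the function names and the 30-day warning window are illustrative choices, not Azure or DataRobot settings.

```python
from datetime import date
from typing import Optional


def days_until_expiry(expires_on: date, today: Optional[date] = None) -> int:
    """Number of days left before the client secret expires (negative if past)."""
    today = today or date.today()
    return (expires_on - today).days


def needs_rotation(expires_on: date, today: Optional[date] = None,
                   warn_days: int = 30) -> bool:
    """True when the secret expires within warn_days or has already expired."""
    return days_until_expiry(expires_on, today) <= warn_days


if __name__ == "__main__":
    # Hypothetical expiration date copied from the Azure portal.
    expiry = date(2030, 1, 31)
    print(days_until_expiry(expiry), needs_rotation(expiry))
```

After creating a replacement secret, remember that every DataRobot credential using the old value has to be updated, as the note above states.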

Set up a connection in DataRobot

To connect to ADLS Gen2 in DataRobot (this example uses service principal):

  1. Open Workbench and select a Use Case.
  2. Follow the instructions for connecting to a data source.
  3. Enter the Azure Storage Account Name, which is the subdomain of your storage account's URL.
  4. Under Authentication, click New credentials and select an authentication method. Then, enter the required parameters retrieved in the previous sections and a unique display name. If you've previously added credentials for this data source, you can select them from your saved credentials.

  5. Click Save.
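
Step 3 asks for the storage account name, which is the first label of the storage endpoint hostname. The following sketch extracts it, assuming a public-cloud endpoint of the form `<account>.dfs.core.windows.net` or `<account>.blob.core.windows.net`; the function name and the sample URL are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: Azure public cloud; sovereign clouds use different suffixes.
ENDPOINT_SUFFIX = ".core.windows.net"


def storage_account_name(endpoint: str) -> str:
    """Extract the storage account name from an endpoint URL or hostname."""
    endpoint = endpoint.strip()
    # Accept a bare hostname by giving it a scheme so urlsplit can parse it.
    if "://" not in endpoint:
        endpoint = f"https://{endpoint}"
    host = urlsplit(endpoint).hostname or ""
    name = host.split(".", 1)[0]
    if not name or not host.endswith(ENDPOINT_SUFFIX):
        raise ValueError(f"not a recognized Azure Storage endpoint: {endpoint!r}")
    return name


if __name__ == "__main__":
    print(storage_account_name("https://mystorageacct.dfs.core.windows.net/data"))
```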

Required parameters

The tables below list the minimum required fields to establish a connection with ADLS Gen2 for each authentication method:

OAuth

| Required field | Description | Notes |
|---|---|---|
| Azure storage account name | A unique name for your Azure storage account, which contains all your Azure Storage data objects. | Microsoft documentation |
| Client ID | A unique value that identifies an application in the Microsoft identity platform. | Microsoft documentation |
| Client Secret | Credentials used by confidential client applications that access a web API. | Microsoft documentation |
| Scope | Permissions-based access to web API resources for authorized users and client apps that access the API. | Microsoft documentation |

Azure service principal

| Required field | Description | Notes |
|---|---|---|
| Azure storage account name | A unique name for your Azure storage account, which contains all your Azure Storage data objects. | Microsoft documentation |
| Client ID | A unique value that identifies an application. | Microsoft documentation |
| Client Secret | Credentials used by confidential client applications that access a web API. | Microsoft documentation |
| Azure Tenant ID | A unique identifier for your Microsoft Entra tenant, which represents an organization. | Microsoft documentation |
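
The required fields differ only in the last entry: Scope for OAuth and Azure Tenant ID for service principal. As an illustrative sketch, a pre-flight check for missing values could look like the following; the dictionary keys and method labels are hypothetical names, not DataRobot API parameters.

```python
# Hypothetical field names mirroring the required fields listed above.
REQUIRED_FIELDS = {
    "oauth": ("storage_account_name", "client_id", "client_secret", "scope"),
    "service_principal": ("storage_account_name", "client_id", "client_secret", "tenant_id"),
}


def missing_fields(method: str, params: dict) -> list:
    """Return the required field names that are absent or blank in params."""
    try:
        required = REQUIRED_FIELDS[method]
    except KeyError:
        raise ValueError(f"unknown authentication method: {method!r}") from None
    return [name for name in required if not str(params.get(name) or "").strip()]


if __name__ == "__main__":
    # Placeholder values only; a real secret should never be hard-coded.
    params = {"storage_account_name": "mystorageacct", "client_id": "<client-id>"}
    print(missing_fields("service_principal", params))
```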

Optional parameters

'File System Name' and 'Data Store Root Directory' are optional parameters. If specified, you can browse the files and folders within the specified file system or root directory directly from DataRobot.
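
To picture what these two values select, it may help to see how a file system (container) and a directory combine with the storage account name in the standard ABFS URI form, `abfss://<file_system>@<account>.dfs.core.windows.net/<path>`. This is a sketch for orientation only: how DataRobot uses the parameters internally is not described here, and the function name and sample values are hypothetical.

```python
def abfss_uri(account: str, file_system: str, root_directory: str = "") -> str:
    """Compose an ABFS URI from an account, a file system, and an optional directory."""
    if not account or not file_system:
        raise ValueError("account and file_system are required")
    # Normalize the directory so leading/trailing slashes do not double up.
    path = root_directory.strip("/")
    return f"abfss://{file_system}@{account}.dfs.core.windows.net/{path}"


if __name__ == "__main__":
    print(abfss_uri("mystorageacct", "analytics", "/raw/2024/"))
```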

Feature considerations

Consider the following when connecting to ADLS Gen2 in DataRobot.

  • The ADLS Gen2 connector does not support:

    • Data wrangling in Workbench
    • Batch predictions
    • Feature Discovery

Updated March 26, 2024