Add files from remote repos to custom models

After you add a model to the Custom Model Workshop, you can add files to it from several types of remote repository: Bitbucket Server, GitHub, GitHub Enterprise, S3, GitLab, and GitLab Enterprise. Once a repository is registered with DataRobot, you can pull files from it and include them in the custom model.

Add a remote repository

The following steps show how to add a remote repository so that you can pull files into a custom model.

  1. Select a custom model you wish to add files to, and navigate to Assemble > Add files > Remote repository.

  2. Click add new to integrate a new remote repository with DataRobot.

    See the sections below for the steps to register each type of repository.

Bitbucket Server repository

To register a Bitbucket Server repository:

  1. Select Bitbucket Server from the list of repositories to be added in step 2 of the Add a remote repository procedure.

  2. Complete the required fields:

    Field | Description
    Name | The name of the Bitbucket Server repository.
    Repository location | The URL for the Bitbucket Server repository that appears in the browser address bar when accessed. Alternatively, select Clone from the Bitbucket Server UI and paste the URL.
    Personal access token | The token used to grant DataRobot access to the Bitbucket Server repository. Generate this token from the Bitbucket Server UI by navigating to Profile > Manage account > Personal access tokens and selecting Create a token. Name the token, review the permissions, and once created, copy the token string to this field.
    Description | Optional. A description of the Bitbucket Server repository.
  3. Click Test to verify connection to the repository.

  4. Once you have verified the connection, click Add repository. The Bitbucket Server repository can now be used to pull files for custom models.

GitHub repository

To register a public GitHub repository:

  1. Select GitHub from the list of repositories to be added in step 2 of the Add a remote repository procedure.

  2. Authorize the GitHub app by clicking Authorize GitHub App and agreeing to grant DataRobot read-only access to your GitHub account's public repositories.

    Note

    You can also use repositories that are part of any GitHub organization you belong to.

    Tip

    You can Unauthorize the app at any time. Doing so revokes DataRobot's access to all of your registered GitHub repositories. The repositories remain registered in DataRobot, but DataRobot cannot access them until you re-authorize the app.

  3. Once authorized, complete the required fields:

    Field | Description
    Name | The name of the GitHub repository.
    Edit repository permissions | To use a private repository, you need to grant the GitHub app access.
    Repository | Enter the GitHub repository URL. Start typing the repository name and repositories will populate in the autocomplete dropdown.
    Description | Optional. A description of the GitHub repository.
  4. Click Test to verify the repository connection.

  5. When validated, select Add repository. You can now pull files from the repository to add to a custom model.

Edit GitHub repository permissions

To use a private repository, click Edit repository permissions in the Add GitHub repository window. This gives the GitHub app access to your private repositories. You can give access to:

  • All current and future private repositories
  • A selected list of repositories

After access is granted, the private repositories appear in the autocomplete dropdown for the Repository field.

External GitHub repositories

To use an external public GitHub repository that is not owned by you or your organization, navigate to the repository in GitHub and click Code. Copy the URL and paste it into the Repository field of the Add GitHub repository window.

GitHub organization repository access

If you belong to a GitHub organization, you can request access to an organization's repository for use with DataRobot. A request for access notifies the GitHub admin, who then approves or denies the request.

Note

If your admin approves a single user's access request, access is provided to all DataRobot users in that user's organization without any additional configuration. For more information, reference the GitHub documentation.

GitHub Enterprise repository

To register a GitHub Enterprise repository:

  1. Select GitHub Enterprise from the list of repositories to be added in step 2 of the Add a remote repository procedure.

  2. Complete the required fields:

    Field | Description
    Name | The name of the GitHub Enterprise repository.
    Repository location | The URL for the GitHub Enterprise repository that appears in the browser address bar when accessed. Alternatively, select Clone from the GitHub UI and paste the URL.
    Personal access token | The token used to grant DataRobot access to the GitHub Enterprise repository. Generate this token from the GitHub UI by selecting your user icon in the top right and navigating to Settings > Developer Settings and selecting Personal access tokens. Click Generate new token. Name the token and select "repo" for the scope of access. Once created, copy the token string to this field.
    Description | Optional. A description of the GitHub Enterprise repository.
  3. Click Test to verify connection to the repository.

  4. Once you have verified the connection, click Add repository. The GitHub Enterprise repository can now be used to pull files for custom models.
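
Before pasting a personal access token into the form, it can help to confirm that the token can read the repository. The following is a minimal sketch, assuming the GitHub Enterprise instance serves its REST API under /api/v3 on the same host as the web UI. The host github.example.com, the repository acme/models, and the helper name build_repo_check_request are hypothetical. The function only builds the request; sending it is left to the caller.

```python
from urllib.parse import urlparse
from urllib.request import Request


def build_repo_check_request(repo_url: str, token: str) -> Request:
    """Build (but do not send) a request that reads repository metadata.

    Assumes the API lives at <scheme>://<host>/api/v3, which is an
    assumption about the instance, not something DataRobot documents.
    """
    parsed = urlparse(repo_url)
    # The first two path segments of a repository URL are owner and repo.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError(f"expected <host>/<owner>/<repo>, got: {repo_url}")
    owner, repo = parts[0], parts[1]
    # Clone URLs end in .git; the API path does not.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    api_url = f"{parsed.scheme}://{parsed.netloc}/api/v3/repos/{owner}/{repo}"
    return Request(api_url, headers={"Authorization": f"token {token}"})


if __name__ == "__main__":
    # Hypothetical host, repository, and token.
    request = build_repo_check_request(
        "https://github.example.com/acme/models.git", "abc123"
    )
    print(request.full_url)
```

Passing the returned request to urllib.request.urlopen should succeed when the token has the "repo" scope and can see the repository; an HTTP 401 or 404 error suggests that the token or the URL needs attention.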

Git Large File Storage

Git Large File Storage (LFS) is supported by default for GitHub integrations. Reference the Git documentation to learn more. Git LFS support for GitHub always requires the GitHub application to be installed on the target repository, even if the repository is public. Unauthorized requests to the LFS API fail with an HTTP 403 error.

S3 repository

To register an S3 repository:

  1. Select S3 from the list of repositories to be added in step 2 of the Add a remote repository procedure.

  2. Complete the required fields. Note that AWS credentials are optional for public buckets.

    Field | Description
    Name | The name of the S3 repository.
    Bucket name | The name of the S3 bucket. If you are adding a public S3 repository, this is the only field you must complete.
    Access key ID | The key used to sign programmatic requests made to AWS. Use with the AWS Secret Access Key to authenticate requests to pull from the S3 repository. Required for private S3 repositories.
    Secret access key | The key used to sign programmatic requests made to AWS. Use with the AWS Access Key ID to authenticate requests to pull from the S3 repository. Required for private S3 repositories.
    Session token | Optional. A token that validates temporary security credentials when making a call to an S3 bucket.
    Description | Optional. A description of the S3 repository.
  3. Click Test to verify connection to the repository.

  4. Once you have verified the connection, click Add repository. The S3 repository can now be used to pull files for custom models.
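
The field rules above differ for public and private buckets, which is easy to get wrong when the form is filled in from a script or checklist. The following is a minimal sketch of those rules; the S3RepositoryForm class and the missing_fields helper are hypothetical stand-ins for the form and are not part of any DataRobot API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class S3RepositoryForm:
    """Hypothetical mirror of the fields in the S3 repository form."""

    name: str
    bucket_name: str
    access_key_id: Optional[str] = None
    secret_access_key: Optional[str] = None
    session_token: Optional[str] = None  # always optional
    description: Optional[str] = None  # always optional


def missing_fields(form: S3RepositoryForm, private_bucket: bool) -> List[str]:
    """Return the required fields that are still empty.

    Per the table above: the bucket name is always required, and the
    access key ID and secret access key are required for private buckets.
    """
    missing = []
    if not form.bucket_name:
        missing.append("bucket_name")
    if private_bucket:
        if not form.access_key_id:
            missing.append("access_key_id")
        if not form.secret_access_key:
            missing.append("secret_access_key")
    return missing


if __name__ == "__main__":
    form = S3RepositoryForm(name="demo", bucket_name="examplebucket")
    print(missing_fields(form, private_bucket=True))
```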

AWS S3 access configuration

DataRobot requires the AWS S3 ListBucket and GetObject permissions in order to ingest data. These permissions should be applied as an additional AWS IAM Policy for the AWS user or role the cluster uses for access. For example, to allow ingestion of data from a private bucket named examplebucket, apply the following policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket"],
          "Resource": ["arn:aws:s3:::examplebucket"]
        },
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject"],
          "Resource": ["arn:aws:s3:::examplebucket/*"]
        }
      ]
    }
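
To apply the same policy to a different bucket, only the two Resource entries change: s3:ListBucket targets the bucket ARN, and s3:GetObject targets the objects under it. The following is a minimal sketch that generates the policy document for a given bucket name; the helper name build_read_policy is hypothetical.

```python
import json


def build_read_policy(bucket_name: str) -> dict:
    """Return an IAM policy granting ListBucket and GetObject on one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_read_policy("examplebucket"), indent=2))
```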

Remove S3 credentials

You can remove any S3 credentials by editing the repository connection. Select the connection and click Clear Credentials.

GitLab (cloud) repository

To register a GitLab cloud repository:

  1. Select GitLab from the list of repositories to be added in step 2 of the Add a remote repository procedure.

  2. Authorize the DataRobot GitLab app by clicking Authorize GitLab app.

    Tip

    You can Unauthorize the app at any time. Doing so revokes DataRobot's access to all of your registered GitLab repositories. The repositories remain registered in DataRobot, but DataRobot cannot access them until you re-authorize the app.

  3. Once authorized, complete the required fields:

    Field | Description
    Name | The name of the GitLab repository.
    Edit repository permissions | To use a private repository, you need to grant the GitLab app access.
    Repository | Enter the GitLab repository URL. Start typing the repository name and repositories will populate in the autocomplete dropdown.
    Description | Optional. A description of the GitLab repository.
  4. Click Test to verify the repository connection.

  5. When validated, select Add repository. You can now pull files from the repository to add to a custom model.

GitLab Enterprise repository

To register a GitLab Enterprise repository:

  1. Select GitLab Enterprise from the list of repositories to be added in step 2 of the Add a remote repository procedure.

  2. Authorize the DataRobot GitLab app by clicking Authorize GitLab app.

    Tip

    You can Unauthorize the app at any time. Doing so revokes DataRobot's access to all of your registered GitLab repositories. The repositories remain registered in DataRobot, but DataRobot cannot access them until you re-authorize the app.

  3. Once authorized, complete the required fields:

    Field | Description
    Name | The name of the GitLab repository.
    Edit repository permissions | To use a private repository, you need to grant the GitLab app access.
    Repository location | Enter the GitLab repository URL. Start typing the repository name and repositories will populate in the autocomplete dropdown.
    Personal access token | Enter the token used to grant DataRobot access to the GitLab Enterprise repository. Generate this token from GitLab.
    Description | Optional. A description of the GitLab repository.
  4. Click Test to verify the repository connection.

  5. When validated, select Add repository. You can now pull files from the repository to add to a custom model.

Create a personal access token for GitLab Enterprise

To create a personal access token:

  1. Navigate to GitLab.

  2. Enter a name for the new token, set the mandatory scopes (read_api and read_repository), and click Create personal access token.

    The newly generated token appears at the top of the page.

  3. Enter the new token into the Personal access token field in the Add GitLab Enterprise repository window.
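
A token created without both mandatory scopes is a likely cause of a failed connection test. The following is a minimal sketch that checks a list of granted scopes against the two scopes named above; the helper name missing_scopes is hypothetical, and the granted scopes are assumed to be copied by hand from the GitLab token page.

```python
from typing import Iterable, List

# The two scopes the procedure above marks as mandatory.
REQUIRED_SCOPES = frozenset({"read_api", "read_repository"})


def missing_scopes(granted: Iterable[str]) -> List[str]:
    """Return the mandatory scopes absent from `granted`, sorted by name."""
    return sorted(REQUIRED_SCOPES - set(granted))


if __name__ == "__main__":
    # A token that was created with only one of the two scopes.
    print(missing_scopes(["read_api"]))
```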

Pull files from the repository

After you add a repository to DataRobot, you can pull files from it to build custom models. The following steps use a GitHub repository as an example:

  1. Navigate to Assemble > Add files > Remote repository.

  2. Click Select a remote repository and choose a repository from the list.

    For a GitHub repository:

  3. Enter the tag, branch, or commit hash from which you want to pull files.

  4. Specify the path to the files being pulled.

  5. Once specified, click Pull into model. The files populate under the Model header as part of the custom model.
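
The inputs to a pull are a repository, a reference (tag, branch, or commit hash), and a path. The following is a minimal sketch of how those inputs might be tidied before they are typed into the form. The PullSpec class and both helpers are hypothetical and do not call DataRobot, and the commit-hash check is only a heuristic, since a branch or tag can also consist of hexadecimal characters.

```python
import posixpath
import re
from dataclasses import dataclass

# 7 to 40 hexadecimal characters: an abbreviated or full SHA-1 hash.
_HEX_REF = re.compile(r"^[0-9a-fA-F]{7,40}$")


@dataclass(frozen=True)
class PullSpec:
    """Hypothetical record of the values entered when pulling files."""

    repository: str
    ref: str  # tag, branch, or commit hash
    path: str  # path inside the repository


def looks_like_commit_hash(ref: str) -> bool:
    """Heuristic only: True when the ref has the shape of a commit hash."""
    return bool(_HEX_REF.match(ref))


def make_pull_spec(repository: str, ref: str, path: str) -> PullSpec:
    """Trim whitespace, normalize the path, and reject obvious mistakes."""
    ref = ref.strip()
    if not ref:
        raise ValueError("a tag, branch, or commit hash is required")
    # Collapse "./" and duplicate slashes, then make the path repo-relative.
    cleaned = posixpath.normpath(path.strip() or ".").lstrip("/")
    if cleaned == ".." or cleaned.startswith("../"):
        raise ValueError("path must stay inside the repository")
    return PullSpec(repository=repository, ref=ref, path=cleaned)


if __name__ == "__main__":
    spec = make_pull_spec("my-github-repo", " main ", "/models/./v1/")
    print(spec, looks_like_commit_hash(spec.ref))
```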


Updated August 2, 2022