Add files with remote repositories

When you add a custom model to the Workshop, you can pull files from a GitHub, S3, GitHub Enterprise, or Bitbucket Server repository and include them in the model. You can also use repositories that are part of a GitHub organization you belong to.

Select a custom model you wish to add files to, and navigate to Assemble > Add files > Remote repository.

Click add new in the modal to integrate a new remote repository with DataRobot.

GitHub repository

To register a public GitHub repository, you must authorize the GitHub app. To proceed, click Authorize GitHub App and agree to grant DataRobot read-only access to your GitHub account's public repositories. You can also use repositories that are part of any GitHub organization you belong to. You can Unauthorize the app at any time. Doing so revokes DataRobot's access to all of your registered GitHub repositories. The registrations themselves are preserved in DataRobot, but DataRobot cannot access the underlying GitHub repositories until you re-authorize the app.

Once authorized, complete the following fields:

  • Name the repository.

  • Enter the GitHub repository URL. Start typing the repository name and repositories will populate in the autocomplete dropdown.

    To use an external public GitHub repository that is not owned by you or your organization, navigate to GitHub and click Code. Copy and paste the repository URL from GitHub.

    To use a private repository, click Edit repository permissions to give the GitHub App access to your private repositories. You can give access to:

    • all current and future private repositories
    • a selected list of repositories.

    Once access is granted, the private repositories appear in the autocomplete dropdown for the Repository field.

  • Optionally, provide a description of the repository.

Once fully configured, click Test to verify the repository connection. When validated, select Add repository. You can now pull files from the repository to add to a custom model.

GitHub organization repository

If you belong to a GitHub organization, you can request access to one of the organization's repositories for use with DataRobot. The request notifies the GitHub admin, who approves or denies it.

Note

If your admin approves a single user's access request, access is provided to all DataRobot users in the organization without any additional configuration. For more information, reference the GitHub documentation. (Log in to GitHub before clicking this link.)

Git Large File Storage

Git Large File Storage (LFS) is supported by default for GitHub integrations. Reference the Git documentation to learn more. Git LFS support for GitHub always requires the GitHub application to be installed on the target repository, even if the repository is public. Any unauthorized request to the LFS API fails with an HTTP 403 error.

S3 repository

To register an S3 repository, complete the required fields:

Field | Description
Name | The name of the S3 repository.
Bucket Name | The name of the S3 bucket. If you are adding a public S3 repository, this is the only field you must complete.
AWS Access Key ID | The identifier of the access key used to sign programmatic requests made to AWS. Use with the AWS Secret Access Key to authenticate requests to pull from the S3 repository. Required for private S3 repositories.
AWS Secret Access Key | The secret key used to sign programmatic requests made to AWS. Use with the AWS Access Key ID to authenticate requests to pull from the S3 repository. Required for private S3 repositories.
AWS Session Token | Optional. A token that validates temporary security credentials when making a call to an S3 bucket.
Description | Optional. A description of the S3 repository.

After completing the fields, click Test to verify connection to the repository. Once you have verified the connection, click Add repository. The S3 repository can now be used to pull files for custom models.
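The required-versus-optional rules in the table above can be sketched as a small pre-flight check. This is only an illustration, not part of DataRobot or its API: the class S3RepositoryFields and the function missing_fields are hypothetical names, and the check reads the table literally (Bucket Name is always required, and the key pair is required only for private buckets).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class S3RepositoryFields:
    """Hypothetical container mirroring the form fields in the table above."""
    name: str
    bucket_name: str
    aws_access_key_id: Optional[str] = None
    aws_secret_access_key: Optional[str] = None
    aws_session_token: Optional[str] = None  # optional, for temporary credentials
    description: Optional[str] = None        # optional


def missing_fields(fields: S3RepositoryFields, private: bool) -> List[str]:
    """Return the labels of required fields that are still empty."""
    missing = []
    # Bucket Name is the one field the table marks as mandatory in every case.
    if not fields.bucket_name:
        missing.append("Bucket Name")
    # Private buckets additionally need both halves of the AWS key pair.
    if private:
        if not fields.aws_access_key_id:
            missing.append("AWS Access Key ID")
        if not fields.aws_secret_access_key:
            missing.append("AWS Secret Access Key")
    return missing
```

Calling missing_fields with private=True on a configuration that has only a bucket name returns both key labels, which matches the "Required for private S3 repositories" notes in the table.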

GitHub Enterprise repository

To register a GitHub Enterprise repository, complete the required fields:

Field | Description
Name | The name of the GitHub Enterprise repository.
Repository location | The URL for the GitHub Enterprise repository that appears in the browser address bar when accessed. Alternatively, select Clone from the GitHub UI and paste the URL.
Personal access token | The token used to grant DataRobot access to the GitHub Enterprise repository. Generate this token from the GitHub UI by selecting your user icon in the top right and navigating to Settings > Developer Settings and selecting Personal access token. Name the token and select "repo" for the scope of access. Once created, copy the token string to this field.
Description | Optional. A description of the GitHub Enterprise repository.

After completing the fields, click Test to verify connection to the repository. Once you have verified the connection, click Add repository. The GitHub Enterprise repository can now be used to pull files for custom models.
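If the Test step fails, it can help to confirm outside DataRobot that the personal access token can read the repository. The sketch below only builds the HTTP request. It assumes the repository location has the form https://host/owner/repo and that the GitHub Enterprise Server REST API is served under /api/v3 on the same host. The function name build_repo_check_request is hypothetical, and this is not a description of what DataRobot's Test button does internally.

```python
from urllib.parse import urlparse
from urllib.request import Request


def build_repo_check_request(repository_location: str, token: str) -> Request:
    """Build a GET request for the repository's metadata endpoint.

    Assumes a location such as https://github.example.com/owner/repo
    (an optional trailing ".git" from a clone URL is removed).
    """
    parsed = urlparse(repository_location)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected a URL of the form https://host/owner/repo")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    # Assumption: the REST API lives under /api/v3 on the Enterprise host.
    api_url = f"{parsed.scheme}://{parsed.netloc}/api/v3/repos/{owner}/{repo}"
    return Request(api_url, headers={"Authorization": f"token {token}"})
```

Passing the returned request to urllib.request.urlopen sends it. A successful response suggests the token can read the repository, while an HTTP error suggests the token, its scope, or the URL needs another look.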

Bitbucket Server repository

To register a Bitbucket Server repository, complete the required fields:

Field | Description
Name | The name of the Bitbucket Server repository.
Repository location | The URL for the Bitbucket Server repository that appears in the browser address bar when accessed. Alternatively, select Clone from the Bitbucket Server UI and paste the URL.
Personal access token | The token used to grant DataRobot access to the Bitbucket Server repository. Generate this token from the Bitbucket Server UI by navigating to Profile > Manage account > Personal access tokens and selecting Create a token. Name the token, review the permissions, and once created copy the token string to this field.
Description | Optional. A description of the Bitbucket Server repository.

After completing the fields, click Test to verify connection to the repository. Once you have verified the connection, click Add repository. The Bitbucket Server repository can now be used to pull files for custom models.

AWS S3 access configuration

DataRobot requires the AWS S3 ListBucket and GetObject permissions in order to ingest data. These permissions should be applied as an additional AWS IAM Policy for the AWS user or role the cluster uses for access. For example, to allow ingestion of data from a private bucket named examplebucket, the following policy could be applied:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket"],
          "Resource": ["arn:aws:s3:::examplebucket"]
        },
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject"],
          "Resource": ["arn:aws:s3:::examplebucket/*"]
        }
      ]
    }
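When the same policy is needed for several buckets, it can be generated rather than edited by hand. The following is a minimal sketch that reproduces the JSON above for an arbitrary bucket name. The function name build_read_only_policy is an illustrative assumption, not something DataRobot or AWS provides.

```python
import json


def build_read_only_policy(bucket_name: str) -> dict:
    """Return an IAM policy document granting ListBucket and GetObject."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # GetObject applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_read_only_policy("examplebucket"), indent=2))
```

Note that the two statements target different resources: the bucket ARN for listing, and the bucket ARN followed by /* for reading objects.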

Remove S3 credentials

You can remove any S3 credentials by editing the repository connection. Select the connection and click Clear Credentials.

Pull files from a repository

When you have added a repository to DataRobot, you can pull files from it to build custom models. To do so, click Select a remote repository and choose a repository from the list.

For a GitHub repository:

  1. Enter the tag, branch, or commit hash from which you want to pull files.

  2. Specify the path to the files being pulled.

  3. Once specified, click Pull into model. The files populate under the Model header as part of the custom model.

For an S3 repository:

After selecting the S3 repository, specify the path to the files you want to pull and select Pull into model. The files populate under the Model header as part of the custom model.


Updated November 5, 2021