This section shows how to ingest data from an Amazon Web Services S3 bucket into the DataRobot AI Catalog so that you can use it for ML modeling.
To build an ML model based on an object saved in an S3 bucket:
1. Navigate to the dataset object in AWS S3 and copy the object’s URL.
2. Select the AI Catalog tab in DataRobot.
3. Click Add to catalog and select URL.
4. In the Add from URL window, paste the URL of the object and click Save.
   DataRobot automatically reads the data and infers data types and the schema of the data, as it does when you upload a CSV file from your local machine.
5. Now that your data has been successfully uploaded, click Create project in the upper right corner to start an ML project.
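The catalog upload in the steps above can likely also be scripted instead of clicked through. The following is a minimal sketch, not a definitive implementation: it assumes that the DataRobot public API accepts a POST to datasets/fromURL/ with a JSON body containing the URL, and that DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN hold your API base URL and API key. The function name add_url_to_catalog is hypothetical. Check the API reference for your DataRobot version before relying on the endpoint path or the response format.

```shell
# Assumed default API base URL; replace it with your own installation's value.
DATAROBOT_ENDPOINT="${DATAROBOT_ENDPOINT:-https://app.datarobot.com/api/v2}"

# Hypothetical helper: ask DataRobot to ingest the object at the given URL
# into the AI Catalog. The endpoint path is an assumption (see the text above).
add_url_to_catalog() {
  # $1 = URL of the object to ingest (must not contain double quotes)
  curl -sS -X POST "${DATAROBOT_ENDPOINT}/datasets/fromURL/" \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$1\"}"
}
```

With DATAROBOT_API_TOKEN exported, calling add_url_to_catalog with the object URL copied in step 1 sends the same URL that you would otherwise paste into the Add from URL window.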
You can also ingest data into DataRobot from private S3 buckets. For example, you can generate a pre-signed S3 URL, a temporary link that DataRobot can then use to retrieve the file.
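One way to produce such a link is the AWS CLI's aws s3 presign command. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that are allowed to read the object. The wrapper name presign_s3_object is hypothetical, and bucket-name and file.csv are placeholders.

```shell
# Hypothetical wrapper around `aws s3 presign`.
# Prints a pre-signed URL for a private S3 object.
presign_s3_object() {
  # $1 = bucket name, $2 = object key, $3 = link lifetime in seconds (default 600)
  aws s3 presign "s3://$1/$2" --expires-in "${3:-600}"
}
```

Running presign_s3_object bucket-name file.csv 600 prints a URL that you can paste into the Add from URL window.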
The URL produced in this example allows anyone who has it to read the private file, file.csv, from the private bucket, bucket-name. The expires-in parameter keeps the signed link valid for 600 seconds after it is created.
If you have your own DataRobot installation, you can also:
- Provide the application's DataRobot service account with IAM privileges to read private S3 buckets. DataRobot can then ingest from any S3 location that it has privileges to access.
- Implement S3 impersonation of the user logging in to DataRobot to limit access to S3 data. This requires LDAP for authentication, with authorized roles for the user specified within LDAP attributes.