Data connections (video)¶
Ingesting data is the critical first step in building predictive models. This tutorial series covers the process of setting up data connections to popular data sources using both the UI and the API. Among other important topics, it demonstrates exactly where, inside your own cloud account, you can find the JDBC URLs and credentials needed to set up the connections.
The videos show how to ingest data directly into DataRobot after creating a one-time data connection: how to create, test, and use the data connection to pull data from the following cloud providers.
Ingest from AWS S3¶
For organizations using the AWS ecosystem, it is common to use S3 as a data source because of the advantages of cloud object stores. Depending on the access method (URL-based or programmatic), you supply either S3 URLs or access credentials, as demonstrated in the video.
Read more¶
- End-to-end ML workflow with AWS (AI Accelerator)
- Importing from AWS S3 (DataRobot Classic)
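For URL-based access, the workflow reduces to supplying an HTTPS URL for the S3 object. A minimal sketch using the DataRobot Python client is below; the bucket name, object key, and region are placeholder assumptions, and the client call requires your own API token and endpoint to be configured.

```python
# Sketch of URL-based S3 ingest. The bucket, key, and region are
# placeholders; `ingest_from_s3` assumes DataRobot credentials are
# configured in the environment (DATAROBOT_API_TOKEN / DATAROBOT_ENDPOINT).

def s3_https_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Build the HTTPS form of an S3 object URL for URL-based ingest."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

def ingest_from_s3(bucket: str, key: str):
    """Register the object as a DataRobot dataset (requires `pip install datarobot`)."""
    import datarobot as dr
    dr.Client()  # reads token/endpoint from the environment or drconfig.yaml
    return dr.Dataset.create_from_url(s3_https_url(bucket, key))

print(s3_https_url("my-bucket", "loans/train.csv"))
```

For programmatic (credential-based) access, you would instead store your AWS access key pair as a DataRobot credential and reference it at ingest time, as shown in the video.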
Ingest from Azure Cloud¶
Organizations use relational databases like Azure SQL to store and retrieve structured data. This video shows how to use JDBC connectivity to connect to Azure SQL and pull data into DataRobot, either via a JDBC URL or via the connection parameters for the Azure SQL instance.
Read more¶
- End-to-end modeling workflow with Azure (AI Accelerator)
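The JDBC URL for an Azure SQL database follows the SQL Server driver's URL shape. The sketch below builds that URL and registers it as a data connection via the DataRobot Python client; the server and database names are placeholders, and `driver_id` must be looked up from the drivers available in your DataRobot installation.

```python
# Sketch: build an Azure SQL JDBC URL and register it as a DataRobot data
# connection. Server/database names are placeholders; `driver_id` comes
# from your installation's list of JDBC drivers.

def azure_sql_jdbc_url(server: str, database: str) -> str:
    """Standard JDBC URL shape for Azure SQL (SQL Server JDBC driver)."""
    return (f"jdbc:sqlserver://{server}.database.windows.net:1433;"
            f"database={database};encrypt=true;loginTimeout=30")

def register_azure_sql(jdbc_url: str, driver_id: str):
    """Create the data connection (requires `pip install datarobot` and an API token)."""
    import datarobot as dr
    dr.Client()
    return dr.DataStore.create(
        data_store_type="jdbc",
        canonical_name="Azure SQL (demo)",
        driver_id=driver_id,
        jdbc_url=jdbc_url,
    )

print(azure_sql_jdbc_url("myserver", "sales"))
```

The username and password for the instance are stored separately as a DataRobot credential rather than embedded in the URL.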
Ingest from Databricks¶
Organizations using Databricks for data processing can easily use DataRobot for machine learning activities on Databricks data. To ingest data from Databricks, you need the JDBC URL to the Databricks cluster. The video also shows how you can ingest data into DataRobot using Databricks notebooks.
Read more¶
- End-to-end modeling workflow with Databricks (AI Accelerator)
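The cluster's JDBC URL can be copied from the Databricks UI, or assembled from the workspace host and the cluster's HTTP path. The sketch below shows the general URL shape used by the Databricks JDBC driver; the host and HTTP path are placeholders, and the personal access token is supplied separately as a credential (`UID=token` with the token as the password) rather than embedded in the URL.

```python
# Sketch of the Databricks JDBC URL shape. Host and http_path are
# placeholders copied from the cluster's JDBC/ODBC tab; authentication
# (UID=token; PWD=<personal access token>) is stored as a credential.

def databricks_jdbc_url(host: str, http_path: str) -> str:
    return (f"jdbc:databricks://{host}:443/default;"
            f"transportMode=http;ssl=1;httpPath={http_path};AuthMech=3")

print(databricks_jdbc_url("adb-123456789.4.azuredatabricks.net",
                          "sql/protocolv1/o/123456789/0101-demo"))
```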
Ingest from GCP BigQuery¶
For organizations using the GCP ecosystem, DataRobot provides easy access to data residing in BigQuery. To allow DataRobot to authenticate with Google Cloud Platform and connect to BigQuery tables, you first need a service account in GCP and the credentials JSON file that comes with it.
Read more¶
- End-to-end ML workflow with Google Cloud Platform and BigQuery (AI Accelerator)
- Data connections (Workbench)
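Before uploading the service account key as a DataRobot credential, it is worth sanity-checking that the downloaded JSON has the fields BigQuery authentication relies on. A small sketch (the field names follow the standard GCP service account key format; the demo key values are placeholders):

```python
import json

# Sketch: verify a GCP service account key file before registering it as a
# DataRobot credential. Field names follow the standard service account
# JSON format; the demo key below is an inline placeholder.

REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_service_account_key(key: dict) -> str:
    """Raise if required fields are missing; otherwise return the project id."""
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"service account key is missing: {sorted(missing)}")
    return key["project_id"]

def load_key(path: str) -> dict:
    with open(path) as f:
        return json.load(f)

demo_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----...",
    "client_email": "robot@my-project.iam.gserviceaccount.com",
}
print(check_service_account_key(demo_key))  # prints the project id
```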
Ingest from SAP HANA¶
If you are using in-memory databases like SAP HANA, DataRobot provides quick and easy access to HANA datasets for machine learning. To ingest data from SAP HANA, you will use JDBC connectivity and need an SAP HANA endpoint.
Read more¶
- End-to-end modeling workflow with SAP HANA (AI Accelerator)
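The SAP HANA endpoint is a JDBC URL derived from the host and the instance number, using the conventional `3<NN>15` default SQL port for instance `NN`. A sketch, with placeholder host and database names:

```python
# Sketch: derive an SAP HANA JDBC endpoint from host and instance number.
# The 3<NN>15 port is the conventional default SQL port for instance NN;
# host and database names are placeholders.

def hana_jdbc_url(host: str, instance: int, database: str = "") -> str:
    port = f"3{instance:02d}15"  # e.g. instance 00 -> port 30015
    url = f"jdbc:sap://{host}:{port}"
    if database:
        url += f"/?databaseName={database}"  # tenant DB on a multi-container system
    return url

print(hana_jdbc_url("hana.example.com", 0, "HDB"))
```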
Ingest from Snowflake¶
Using data connections available in DataRobot, you can ingest data from Snowflake directly into DataRobot without the need for any external tools, which reduces turnaround time in machine learning projects. This video demonstrates using JDBC-based connectivity.
Read more¶
- End-to-end modeling workflow with Snowflake (AI Accelerator)
- Snowflake data ingest and project creation
- Data connections (Workbench)
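The JDBC-based flow has two parts: the Snowflake JDBC URL (built from your account identifier, warehouse, database, and schema) and a data source wrapping the query to pull. The sketch below uses the DataRobot Python client; the account, warehouse, database, and the `data_store_id`/`credential_id` values are all placeholders you would obtain from your own Snowflake and DataRobot accounts.

```python
# Sketch of JDBC-based Snowflake ingest: build the JDBC URL, then pull a
# query result into DataRobot as a dataset. Account/warehouse/database
# names and the data_store/credential ids are placeholders.

def snowflake_jdbc_url(account: str, db: str, warehouse: str,
                       schema: str = "PUBLIC") -> str:
    return (f"jdbc:snowflake://{account}.snowflakecomputing.com/"
            f"?warehouse={warehouse}&db={db}&schema={schema}")

def ingest_query(data_store_id: str, credential_id: str, query: str):
    """Create a JDBC data source for the query and materialize it as a dataset
    (requires `pip install datarobot` and a configured API token)."""
    import datarobot as dr
    dr.Client()
    source = dr.DataSource.create(
        data_source_type="jdbc",
        canonical_name="Snowflake demo query",
        params=dr.DataSourceParameters(data_store_id=data_store_id, query=query),
    )
    return dr.Dataset.create_from_data_source(source.id, credential_id=credential_id)

print(snowflake_jdbc_url("acme-xy12345", "ANALYTICS", "WH_SMALL"))
```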