100GB Ingest

Since version 11.0, DataRobot supports ingesting up to 100GB into Data Registry from different sources. This document describes the infrastructure requirements for enabling large data ingest.

Disk size

Large-file ingest through URL and data stages requires larger ephemeral disk volumes on nodes. For a reliable 100GB ingest workflow, increase the ephemeral node volume size to 600GB, which is the recommended value. The minimum required value is 250GB, but when the cluster also runs other workloads the minimum may be insufficient, and some ingests can fail when disk space runs out.

Note that internal storage is not used for ingest through JDBC drivers and native database connectors.

Minimum disk size: 250GB
Recommended disk size: 600GB

These requirements apply to AKS, EKS, GKE, and OpenShift.
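As a quick sanity check, the disk size thresholds above can be compared against a node's volume with a short script. The following is a minimal sketch, not a DataRobot tool: the function names `classify_volume` and `check_path` are hypothetical, the path to inspect depends on where the ephemeral volume is mounted in your environment, and it assumes decimal gigabytes (1GB = 10^9 bytes).

```python
import shutil

# Assumption: decimal gigabytes. Adjust if your provider reports GiB.
GB = 10**9

# Thresholds taken from this document.
MIN_DISK_GB = 250
RECOMMENDED_DISK_GB = 600


def classify_volume(total_bytes: int) -> str:
    """Classify a volume size against the 100GB ingest thresholds."""
    total_gb = total_bytes / GB
    if total_gb < MIN_DISK_GB:
        return "insufficient"
    if total_gb < RECOMMENDED_DISK_GB:
        return "minimum"
    return "recommended"


def check_path(path: str = "/") -> str:
    """Classify the volume that backs `path` (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_volume(usage.total)


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    print(f"total: {usage.total / GB:.0f}GB, free: {usage.free / GB:.0f}GB")
    print(f"classification: {check_path('/')}")
```

The script classifies by total volume size because that is what the requirements specify. It also prints free space, since other workloads on the same node can consume disk and cause ingests to fail even when the volume meets the minimum.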

Other notes

Local file uploads do not support 100GB files.