
100 GB Ingest

Starting with version 11.0, DataRobot supports ingesting up to 100 GB of data into the Data Registry from various sources. This document describes the infrastructure requirements for enabling large-data ingest.

Disk size

Ingesting large files through URLs and data stages requires larger disk volumes. For a smooth 100 GB ingest workflow, we recommend increasing the ephemeral node volume size to 600 GB. The minimum required size is 250 GB. However, if other workloads are running on the cluster, 250 GB may not be enough for a reliable workflow, and some ingests can fail because the node runs out of disk space.

Note that internal storage isn't used when ingesting through JDBC drivers or native database connectors.

Minimum disk size: 250 GB
Recommended disk size: 600 GB

These disk sizes apply to AKS, EKS, GKE, and OpenShift.
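If you want a quick sanity check of a node's volume against these thresholds, the following is a minimal sketch, not a DataRobot tool. It assumes you can run Python on the node (or in a pod that mounts the node's ephemeral volume), that the path you pass lives on that volume, and that sizes are decimal gigabytes (10^9 bytes). The function names and the default path are hypothetical.

```python
import shutil

# Thresholds from this page. Assumption: decimal GB (10**9 bytes);
# adjust BYTES_PER_GB if your platform sizes volumes in GiB.
MIN_DISK_GB = 250
RECOMMENDED_DISK_GB = 600
BYTES_PER_GB = 10**9


def classify_volume_size(total_bytes: int) -> str:
    """Classify a volume's total size against the minimum and recommended values."""
    total_gb = total_bytes / BYTES_PER_GB
    if total_gb < MIN_DISK_GB:
        return "below minimum"
    if total_gb < RECOMMENDED_DISK_GB:
        return "meets minimum only"
    return "meets recommendation"


def check_path(path: str = "/") -> str:
    """Check the filesystem containing `path` (hypothetical default: the root filesystem)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_volume_size(usage.total)


if __name__ == "__main__":
    print(check_path("/"))
```

The filesystem size reported by `shutil.disk_usage` is usually slightly smaller than the provisioned volume because of filesystem overhead, so a volume provisioned at exactly 250 GB may be reported as below the minimum.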

Other notes

Local file uploads don't support 100 GB ingest.