Intake options

Note

For a complete list of supported intake options, see the data sources supported for batch predictions.

For intake, you can use any of the options described below.

If you are using a custom CSV format, any intake option dealing with CSV will adhere to that format.

Local file streaming

Local file intake does not have any special options, but requires you to upload the job's scoring data using a PUT request to the URL specified in the csvUpload link in the job data. This starts the job (or queues it for processing if the prediction instance is already occupied).

If there is no other queued job for the selected prediction instance, scoring will start while you are still uploading.

Refer to this sample use case.

Note

If you forget to send scoring data, the job remains in the INITIALIZING state.
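For illustration, a minimal sketch of the upload step using the Python requests library is shown below. The API token is an assumption, and the upload URL is a placeholder for the csvUpload link returned in your job data:

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"  # assumption: a valid DataRobot API token

# Placeholder: use the csvUpload link returned in the job data when the
# batch prediction job was created.
upload_url = "<csvUpload link from the job data>"

# Stream the local scoring file to the csvUpload URL; this starts the job
# (or queues it if the prediction instance is already occupied).
with open("scoring.csv", "rb") as scoring_file:
    response = requests.put(
        upload_url,
        data=scoring_file,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "text/csv",
        },
    )
response.raise_for_status()
```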

S3 scoring

For larger files, S3 is the preferred method for intake. DataRobot can ingest files from both public and private buckets. To score from S3, you must set up credentials with DataRobot consisting of an access key (ID and key) and, optionally, a session token.

Parameter Example Description
type s3 Use S3 for intake.
url s3://bucket-name/datasets/scoring.csv An absolute URL for the file to be scored.
credentialId 5e4bc5555e6e763beb488dba Required if this URL requires explicit access credentials; otherwise optional. Refer to storing credentials securely.

AWS credentials are encrypted and are only decrypted when used to set up the client for communication with AWS during scoring.
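As an illustration, a minimal sketch of creating a job with S3 intake using the Python requests library follows. The API token and deployment ID are assumptions, the job is assumed to be created by POSTing to the batchPredictions endpoint, and outputSettings are omitted for brevity:

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"          # assumption: a valid DataRobot API token
DEPLOYMENT_ID = "YOUR_DEPLOYMENT_ID"  # assumption: the deployment to score against

# intakeSettings for S3 intake, using the example values from the table above.
intake_settings = {
    "type": "s3",
    "url": "s3://bucket-name/datasets/scoring.csv",
    "credentialId": "5e4bc5555e6e763beb488dba",  # omit for public buckets
}

# Assumption: the job is created by POSTing to the batchPredictions endpoint;
# outputSettings are omitted here for brevity.
response = requests.post(
    "https://app.datarobot.com/api/v2/batchPredictions/",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"deploymentId": DEPLOYMENT_ID, "intakeSettings": intake_settings},
)
response.raise_for_status()
```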

Note

If running a Private AI Cloud within AWS, it is possible to provide implicit credentials for your application instances using an IAM Instance Profile to access your S3 buckets without supplying explicit credentials in the job data. For more information, see the AWS documentation.

Azure Blob Storage scoring

Another scoring option for large files is Azure. To score from Azure Blob Storage, you must configure credentials with DataRobot using an Azure Connection String.

Parameter Example Description
type azure Use Azure Blob Storage for intake.
url https://myaccount.blob.core.windows.net/datasets/scoring.csv An absolute URL for the file to be scored.
credentialId 5e4bc5555e6e763beb488dba Required if this URL requires explicit access credentials; otherwise optional. Refer to storing credentials securely.

Azure credentials are encrypted and are only decrypted when used to set up the client for communication with Azure during scoring.
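For illustration, the corresponding intakeSettings payload might look like the following sketch, using the example values from the table above (the credential ID is a placeholder):

```python
# intakeSettings for Azure Blob Storage intake.
intake_settings = {
    "type": "azure",
    "url": "https://myaccount.blob.core.windows.net/datasets/scoring.csv",
    "credentialId": "5e4bc5555e6e763beb488dba",  # omit if explicit credentials are not needed
}
```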

Google Cloud Storage scoring

DataRobot also supports the Google Cloud Storage adapter. To score from Google Cloud Storage, you must set up credentials with DataRobot consisting of a JSON-formatted account key.

Parameter Example Description
type gcp Use Google Cloud Storage for intake.
url gcs://bucket-name/datasets/scoring.csv An absolute URL for the file to be scored.
credentialId 5e4bc5555e6e763beb488dba Required if this URL requires explicit access credentials; otherwise optional. Refer to storing credentials securely.

GCP credentials are encrypted and are only decrypted when used to set up the client for communication with GCP during scoring.
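For illustration, a sketch of the intakeSettings payload for this adapter, using the example values from the table above (the credential ID is a placeholder):

```python
# intakeSettings for Google Cloud Storage intake.
intake_settings = {
    "type": "gcp",
    "url": "gcs://bucket-name/datasets/scoring.csv",
    "credentialId": "5e4bc5555e6e763beb488dba",  # omit if explicit credentials are not needed
}
```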

HTTP scoring

In addition to the cloud storage adapters, you can also point batch predictions to a regular URL and DataRobot streams the data for scoring:

Parameter Example Description
type http Use HTTP for intake.
url https://example.com/datasets/scoring.csv An absolute URL for the file to be scored.

The URL can optionally contain a username and password such as: https://username:password@example.com/datasets/scoring.csv.

The http adapter can also be used to ingest data from pre-signed URLs for S3, Azure, or GCP.
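For illustration, a sketch of the intakeSettings payload for HTTP intake, using the example URL from the table above:

```python
# intakeSettings for HTTP intake; the URL may embed basic-auth credentials
# or be a pre-signed S3/Azure/GCP URL.
intake_settings = {
    "type": "http",
    "url": "https://example.com/datasets/scoring.csv",
}
```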

AI Catalog dataset scoring

To read input data from an AI Catalog dataset, the following options are available:

Parameter Example Description
datasetId 5e4bc5b35e6e763beb9db14a The AI Catalog dataset ID.
datasetVersionId 5e4bc5555e6e763beb488dba Optional. The AI Catalog dataset version ID.

If datasetVersionId is not specified, it will default to the latest version for the specified dataset.
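For illustration, a sketch of the intakeSettings payload for AI Catalog intake, using the example IDs from the table above. The adapter type name dataset is an assumption, since it is not listed in the table:

```python
# intakeSettings for AI Catalog intake.
intake_settings = {
    "type": "dataset",  # assumption: adapter type name for AI Catalog intake
    "datasetId": "5e4bc5b35e6e763beb9db14a",
    "datasetVersionId": "5e4bc5555e6e763beb488dba",  # optional; defaults to the latest version
}
```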

Note

For the specified AI Catalog dataset, the version to be scored must have been successfully ingested and it must be a snapshot.

JDBC scoring

DataRobot supports reading from any JDBC-compatible database for Batch Predictions by specifying jdbc as the intake type. Because no file upload (PUT request) is needed, scoring starts immediately, and the job transitions to RUNNING if preliminary validation succeeds.

To support this, the Batch Prediction API integrates with external data sources, using credentials securely stored in data credentials.

Supply data source details in the intakeSettings as follows:

Parameter Example Description
type jdbc Use a JDBC data store for intake.
dataStoreId 5e4bc5b35e6e763beb9db14a The ID of an external data source.
credentialId 5e4bc5555e6e763beb9db147 The ID of a stored credential containing username and password. Refer to storing credentials securely.
fetchSize (deprecated) 1000 Deprecated. fetchSize is now inferred dynamically for optimal throughput and is no longer needed. Previously an optional setting used to balance throughput and memory usage by controlling the number of rows read at a time; must be in the range [1, 100000], default 1000.
table scoring_data Optional. The name of the database table containing data to be scored.
schema public Optional. The name of the schema containing the table to be scored.
query SELECT feature1, feature2, feature3 AS readmitted FROM diabetes Optional. A custom query to run against the database.

Note

You must specify either table and schema or query.
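For illustration, a sketch of the intakeSettings payload for JDBC intake, using the example values from the table above (IDs are placeholders). Supply either table and schema or a query:

```python
# intakeSettings for JDBC intake using a table and schema.
intake_settings = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",
    "credentialId": "5e4bc5555e6e763beb9db147",
    "table": "scoring_data",
    "schema": "public",
}

# Alternatively, use a custom query instead of table/schema.
intake_settings_query = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",
    "credentialId": "5e4bc5555e6e763beb9db147",
    "query": "SELECT feature1, feature2, feature3 AS readmitted FROM diabetes",
}
```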

Refer to the example section for a complete example.

Source IP addresses for whitelisting

Any connection initiated from DataRobot originates from one of the following IP addresses:

Host: https://app.datarobot.com
100.26.66.209
54.204.171.181
54.145.89.18
54.147.212.247
18.235.157.68
3.211.11.187
3.214.131.132
3.89.169.252

Host: https://app.eu.datarobot.com
18.200.151.211
18.200.151.56
18.200.151.43
54.78.199.18
54.78.189.139
54.78.199.173

These are reserved for DataRobot use only.

Snowflake scoring

Using JDBC to transfer data can be costly in terms of IOPS (input/output operations per second) and expense for data warehouses. This adapter reduces the load on database engines during prediction scoring by using cloud storage and bulk insert to create a hybrid JDBC-cloud storage solution.

Supply data source details in the intakeSettings as follows:

Parameter Example Description
type snowflake Adapter type.
dataStoreId 5e4bc5b35e6e763beb9db14a ID of Snowflake data source.
externalStage my_s3_stage Name of the Snowflake external stage.
table SCORING_DATA Optional. Name of the Snowflake table containing data to be scored.
schema PUBLIC Optional. Name of the schema containing the table to be scored.
query SELECT feature1, feature2, feature3 FROM diabetes Optional. Custom query to run against the database.
credentialId 5e4bc5555e6e763beb9db147 ID of a stored credential containing username and password for Snowflake.
cloudStorageType s3 Type of cloud storage backend used in the Snowflake external stage. One of s3, azure, or gcp; the default is s3.
cloudStorageCredentialId 6e4bc5541e6e763beb9db15c ID of stored credentials for a storage backend (S3/Azure/GCS) used in Snowflake stage.
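For illustration, a sketch of the intakeSettings payload for Snowflake intake, using the example values from the table above (IDs are placeholders):

```python
# intakeSettings for Snowflake intake via an external stage backed by S3.
intake_settings = {
    "type": "snowflake",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",
    "externalStage": "my_s3_stage",
    "table": "SCORING_DATA",
    "schema": "PUBLIC",
    "credentialId": "5e4bc5555e6e763beb9db147",
    "cloudStorageType": "s3",
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",
}
```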

Refer to the example section for a complete example.

Synapse scoring

To use Synapse for scoring, supply data source details in the intakeSettings as follows:

Parameter Example Description
type synapse Adapter type.
dataStoreId 5e4bc5b35e6e763beb9db14a ID of Synapse data source.
externalDatasource my_data_source Name of the Synapse external data source.
table SCORING_DATA Optional. Name of the Synapse table containing data to be scored.
schema dbo Optional. Name of the schema containing the table to be scored.
query SELECT feature1, feature2, feature3 FROM diabetes Optional. Custom query to run against the database.
credentialId 5e4bc5555e6e763beb9db147 ID of a stored credential containing username and password for Synapse.
cloudStorageCredentialId 6e4bc5541e6e763beb9db15c ID of stored credentials for Azure Blob Storage.
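For illustration, a sketch of the intakeSettings payload for Synapse intake, using the example values from the table above (IDs are placeholders):

```python
# intakeSettings for Synapse intake via an external data source.
intake_settings = {
    "type": "synapse",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",
    "externalDatasource": "my_data_source",
    "table": "SCORING_DATA",
    "schema": "dbo",
    "credentialId": "5e4bc5555e6e763beb9db147",
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",
}
```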

Refer to the example section for a complete example.

Note

Synapse supports fewer collations than the default Microsoft SQL Server. For more information, reference the Synapse documentation.

BigQuery scoring

To use BigQuery for scoring, supply data source details in the intakeSettings as follows:

Parameter Example Description
type bigquery Use the BigQuery API to unload data to Google Cloud Storage and use it as intake.
dataset my_dataset The BigQuery dataset to use.
table my_table The BigQuery table or view from the dataset used as intake.
bucket my-bucket-in-gcs Bucket where data should be exported.
credentialId 5e4bc5555e6e763beb488dba Required if explicit access credentials for this bucket are required (otherwise optional). Refer to storing credentials securely.
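For illustration, a sketch of the intakeSettings payload for BigQuery intake, using the example values from the table above (the credential ID is a placeholder):

```python
# intakeSettings for BigQuery intake; data is unloaded to the GCS bucket
# and then ingested from there.
intake_settings = {
    "type": "bigquery",
    "dataset": "my_dataset",
    "table": "my_table",
    "bucket": "my-bucket-in-gcs",
    "credentialId": "5e4bc5555e6e763beb488dba",  # omit if explicit credentials are not needed
}
```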

Refer to this sample use case for a complete example.


Updated November 30, 2021