Output options

Note

For a complete list of supported output options, see the data sources supported for batch predictions.

For output, you can use local file streaming, S3, Azure Blob Storage, Google Cloud Storage, HTTP, a JDBC data source, Snowflake, Synapse, or BigQuery.

If you are using a custom CSV format, any output option dealing with CSV will adhere to that format. The columns that appear in the output are documented in the section on output format.

Local file streaming

If your job is configured with local file streaming as the output option, you can start downloading the scored data as soon as the job moves to a RUNNING state. The URL needed to make the download request is available as download in the links section of the job data:

```json
{
  "elapsedTimeSec": 97,
  "failedRows": 0,
  "jobIntakeSize": 1150602342,
  "jobOutputSize": 107791140,
  "jobSpec": {
    "deploymentId": "5dc1a6a9865d6c004dd881ef",
    "maxExplanations": 0,
    "numConcurrent": 4,
    "passthroughColumns": null,
    "passthroughColumnsSet": null,
    "predictionWarningEnabled": null,
    "thresholdHigh": null,
    "thresholdLow": null
  },
  "links": {
    "download": "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/download/",
    "self": "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"
  },
  "logs": [
    "Job created by user@example.org from 203.0.113.42 at 2019-11-07 18:11:36.870000",
    "Job started processing at 2019-11-07 18:11:49.781000",
    "Job done processing at 2019-11-07 18:13:14.533000"
  ],
  "percentageCompleted": 0.0,
  "scoredRows": 3000000,
  "status": "COMPLETED",
  "statusDetails": "Job done processing at 2019-11-07 18:13:14.533000"
}
```

If you are downloading faster than DataRobot can ingest and score your data, the download may appear sluggish. This is because DataRobot streams the scored data as soon as it arrives (in chunks).
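
As a minimal sketch of consuming the download link, the example below assumes Python with the requests library and a DataRobot API token sent in the Authorization header (the token and auth scheme are assumptions, not part of the job data above):

```python
import requests

# Placeholder values: substitute your own job's download URL and API token.
download_url = "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/download/"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # assumed auth scheme

# Stream the response so scored rows are written to disk as DataRobot produces them.
with requests.get(download_url, headers=headers, stream=True) as response:
    response.raise_for_status()
    with open("scored.csv", "wb") as out:
        for chunk in response.iter_content(chunk_size=8192):
            out.write(chunk)
```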

Refer to this sample use case for a complete example.

S3 write

DataRobot can save scored data to both public and private buckets. To write to S3, you must set up a credential with DataRobot consisting of an access key (ID and key) and, optionally, a session token.

| Parameter | Example | Description |
|---|---|---|
| type | s3 | Use S3 for output. |
| url | s3://bucket-name/results/scored.csv | An absolute URL for the file to be written. |
| credentialId | 5e4bc5555e6e763beb9db147 | Required if explicit access credentials for this URL are needed; otherwise optional. Refer to storing credentials securely. |

AWS credentials are encrypted and only decrypted when used to set up the client for communication with AWS during scoring.
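
As a sketch of how these parameters fit into a job, the snippet below assumes a job is created by POSTing a JSON payload with deploymentId, intakeSettings, and outputSettings to the batchPredictions endpoint shown in the job data above; the IDs, URLs, and token are placeholders:

```python
import requests

API_URL = "https://app.datarobot.com/api/v2/batchPredictions/"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # assumed auth scheme

job = {
    "deploymentId": "5dc1a6a9865d6c004dd881ef",  # placeholder deployment ID
    "intakeSettings": {
        "type": "s3",
        "url": "s3://bucket-name/input/to_score.csv",
        "credentialId": "5e4bc5555e6e763beb9db147",
    },
    "outputSettings": {
        "type": "s3",
        "url": "s3://bucket-name/results/scored.csv",
        "credentialId": "5e4bc5555e6e763beb9db147",
    },
}

response = requests.post(API_URL, json=job, headers=HEADERS)
response.raise_for_status()
print(response.json()["links"]["self"])  # poll this URL to track job status
```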

Note

If you are running a Private AI Cloud within AWS, you can provide implicit credentials for your application instances using an IAM instance profile, which allows access to your S3 buckets without supplying explicit credentials in the job data.

Azure Blob Storage write

Another option for scoring large files is Azure. To save a dataset to Azure Blob Storage, you must set up a credential with DataRobot consisting of an Azure connection string.

| Parameter | Example | Description |
|---|---|---|
| type | azure | Use Azure Blob Storage for output. |
| url | https://myaccount.blob.core.windows.net/datasets/scored.csv | An absolute URL for the file to be written. |
| credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this URL are needed; otherwise optional. Refer to storing credentials securely. |

Azure credentials are encrypted and only decrypted when used to set up the client for communication with Azure during scoring.
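
For reference, a minimal sketch of the corresponding outputSettings fragment, using a placeholder credential ID (the connection string itself lives in the stored credential):

```python
output_settings = {
    "type": "azure",
    "url": "https://myaccount.blob.core.windows.net/datasets/scored.csv",
    "credentialId": "5e4bc5555e6e763beb488dba",  # placeholder stored credential ID
}
```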

Google Cloud Storage write

DataRobot also supports the Google Cloud Storage adapter. To save a dataset to Google Cloud Storage, you must set up a credential with DataRobot consisting of a JSON-formatted account key.

| Parameter | Example | Description |
|---|---|---|
| type | gcp | Use Google Cloud Storage for output. |
| url | gcs://bucket-name/datasets/scored.csv | An absolute URL designating where the file is written. |
| credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this URL are needed; otherwise optional. Refer to storing credentials securely. |

GCP credentials are encrypted and are only decrypted when used to set up the client for communication with GCP during scoring.
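
A minimal sketch of the matching outputSettings fragment, with a placeholder credential ID:

```python
output_settings = {
    "type": "gcp",
    "url": "gcs://bucket-name/datasets/scored.csv",
    "credentialId": "5e4bc5555e6e763beb488dba",  # placeholder stored credential ID
}
```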

HTTP write

In addition to the cloud storage adapters, you can point Batch Predictions at a regular URL and DataRobot streams the scored data to it:

| Parameter | Example | Description |
|---|---|---|
| type | http | Use HTTP for output. |
| url | https://example.com/datasets/scored.csv | An absolute URL that designates where the file is written. |

The URL can optionally contain a username and password such as: https://username:password@example.com/datasets/scoring.csv.

The http adapter can also be used for writing data to pre-signed URLs from S3, Azure, or GCP.
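
A minimal sketch of an http outputSettings fragment; a pre-signed URL (or one with embedded credentials) needs no credentialId:

```python
output_settings = {
    "type": "http",
    # Could also be a pre-signed S3/Azure/GCS URL, or one with embedded credentials.
    "url": "https://example.com/datasets/scored.csv",
}
```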

JDBC write

DataRobot supports writing prediction results back to a JDBC data source. For this, the Batch Prediction API integrates with external data sources using securely stored credentials.

Supply details about your data source and how to write back the results in outputSettings as follows:

| Parameter | Example | Description |
|---|---|---|
| type | jdbc | Use a JDBC data store as output. |
| dataStoreId | 5e4bc5b35e6e763beb9db14a | The external data source ID. |
| credentialId | 5e4bc5555e6e763beb9db147 | The ID of a stored credential containing username and password. Refer to storing credentials securely. |
| table | scoring_data | Optional. The name of the database table where scored data will be written. |
| schema | public | Optional. The name of the schema where scored data will be written. |
| statementType | update | The statement type: insert, update, or insert_update. |
| createTableIfNotExists | true | Optional. If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter. |
| updateColumns | ['index'] | A list of strings containing the column names to be updated when statementType is set to update or insert_update. |
| whereColumns | ['refId'] | A list of strings containing the column names used to select rows when statementType is set to update or insert_update. |
| commitInterval | 600 | A time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation writes the entire job before committing. |
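
A minimal sketch of a JDBC outputSettings fragment that inserts new rows and updates existing ones, using placeholder IDs:

```python
output_settings = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",   # placeholder external data source ID
    "credentialId": "5e4bc5555e6e763beb9db147",  # placeholder stored credential ID
    "table": "scoring_data",
    "schema": "public",
    "statementType": "insert_update",
    "updateColumns": ["index"],   # columns written on update
    "whereColumns": ["refId"],    # columns used to match existing rows
    "createTableIfNotExists": True,
}
```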

Note

If your target database does not support the naming conventions of the generated output format column names, you can use the column name remapping functionality to rewrite output column names into a form your target database supports (for example, removing spaces from names).

Statement types

| Statement type | Description |
|---|---|
| insert | Scored data will be saved using INSERT queries. Suitable for writing to an empty table. |
| update | Scored data will be saved using UPDATE queries. Suitable for writing to an existing table. Only those columns identified in updateColumns will be updated. |
| insert_update | Scored data will be saved using either INSERT or UPDATE queries, depending on whether the columns in whereColumns have a matching row to update. |
| create_table | Deprecation warning: create_table is no longer recommended. Use a different option with createTableIfNotExists set to true. If used, scored data will be saved to a new table using INSERT queries. The table must not exist before scoring. |

Source IP addresses for whitelisting

Any connection initiated from DataRobot originates from one of the following IP addresses:

| Host: https://app.datarobot.com | Host: https://app.eu.datarobot.com |
|---|---|
| 100.26.66.209 | 18.200.151.211 |
| 54.204.171.181 | 18.200.151.56 |
| 54.145.89.18 | 18.200.151.43 |
| 54.147.212.247 | 54.78.199.18 |
| 18.235.157.68 | 54.78.189.139 |
| 3.211.11.187 | 54.78.199.173 |
| 3.214.131.132 | |
| 3.89.169.252 | |

These are reserved for DataRobot use only.

Snowflake write

Supply data source details in the outputSettings as follows:

| Parameter | Example | Description |
|---|---|---|
| type | snowflake | Adapter type. |
| dataStoreId | 5e4bc5b35e6e763beb9db14a | ID of the Snowflake data source. |
| externalStage | my_s3_stage | Name of the Snowflake external stage. |
| table | RESULTS | Name of the Snowflake table in which to store results. |
| schema | PUBLIC | Optional. Name of the schema containing the results table. |
| credentialId | 5e4bc5555e6e763beb9db147 | ID of a stored credential containing the username and password for Snowflake. |
| cloudStorageType | s3 | Type of cloud storage backend used in the Snowflake external stage. One of s3, azure, or gcp. Defaults to s3. |
| cloudStorageCredentialId | 6e4bc5541e6e763beb9db15c | ID of stored credentials for the storage backend (S3/Azure/GCS) used in the Snowflake stage. |
| createTableIfNotExists | true | Optional. If no existing table is detected, attempt to create one. |
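
A minimal sketch of a Snowflake outputSettings fragment, using placeholder IDs:

```python
output_settings = {
    "type": "snowflake",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",               # placeholder Snowflake data source ID
    "externalStage": "my_s3_stage",
    "table": "RESULTS",
    "schema": "PUBLIC",
    "credentialId": "5e4bc5555e6e763beb9db147",              # placeholder Snowflake credential ID
    "cloudStorageType": "s3",
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",  # placeholder storage credential ID
    "createTableIfNotExists": True,
}
```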

Refer to this sample use case for a complete example.

Synapse write

To use Synapse for scoring, supply data source details in the outputSettings as follows:

| Parameter | Example | Description |
|---|---|---|
| type | synapse | Adapter type. |
| dataStoreId | 5e4bc5b35e6e763beb9db14a | ID of the Synapse data source. |
| externalDatasource | my_data_source | Name of the Synapse external data source. |
| table | RESULTS | Name of the Synapse table in which to store results. |
| schema | dbo | Optional. Name of the schema containing the results table. |
| credentialId | 5e4bc5555e6e763beb9db147 | ID of a stored credential containing the username and password for Synapse. |
| cloudStorageCredentialId | 6e4bc5541e6e763beb9db15c | ID of a stored credential for Azure Blob Storage. |
| createTableIfNotExists | true | Optional. If no existing table is detected, attempt to create one. |
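
A minimal sketch of a Synapse outputSettings fragment, using placeholder IDs:

```python
output_settings = {
    "type": "synapse",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",               # placeholder Synapse data source ID
    "externalDatasource": "my_data_source",
    "table": "RESULTS",
    "schema": "dbo",
    "credentialId": "5e4bc5555e6e763beb9db147",              # placeholder Synapse credential ID
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",  # placeholder Azure Blob credential ID
    "createTableIfNotExists": True,
}
```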

Refer to this sample use case for a complete example.

Note

Synapse supports fewer collations than the default Microsoft SQL Server. For more information, reference the Synapse documentation.

BigQuery write

To use BigQuery for scoring, supply data source details in the outputSettings as follows:

| Parameter | Example | Description |
|---|---|---|
| type | bigquery | Use Google Cloud Storage for output, with a batch load job to ingest the data from GCS into a BigQuery table. |
| dataset | my_dataset | The BigQuery dataset to use. |
| table | my_table | The BigQuery table in the dataset to use for output. |
| bucket | my-bucket-in-gcs | The bucket from which data is loaded into BigQuery. |
| credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this bucket are needed; otherwise optional. Refer to storing credentials securely. |
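
A minimal sketch of a BigQuery outputSettings fragment, using a placeholder credential ID:

```python
output_settings = {
    "type": "bigquery",
    "dataset": "my_dataset",
    "table": "my_table",
    "bucket": "my-bucket-in-gcs",                # GCS bucket used to stage and load the scored data
    "credentialId": "5e4bc5555e6e763beb488dba",  # placeholder stored credential ID
}
```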

Refer to this sample use case for a complete example.

