Write prediction results back to a SAP Datasphere data source with data destination details supplied through a job definition or the Batch Prediction API.
Score data using Synapse with data destination details supplied through a job definition or the Batch Prediction API.
If you are using a custom CSV format, any output option dealing with CSV will adhere to that format. The columns that appear in the output are documented in the section on output format.
If your job is configured with local file streaming as the output option, you can start downloading the scored data as soon as the job moves to a RUNNING state. In the example job data JSON below, the URL needed to make the local file streaming request is available in the download key of the links object:
{"elapsedTimeSec":97,"failedRows":0,"jobIntakeSize":1150602342,"jobOutputSize":107791140,"jobSpec":{"deploymentId":"5dc1a6a9865d6c004dd881ef","maxExplanations":0,"numConcurrent":4,"passthroughColumns":null,"passthroughColumnsSet":null,"predictionWarningEnabled":null,"thresholdHigh":null,"thresholdLow":null},"links":{"download":"https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/download/","self":"https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"},"logs":["Job created by user@example.org from 203.0.113.42 at 2019-11-07 18:11:36.870000","Job started processing at 2019-11-07 18:11:49.781000","Job done processing at 2019-11-07 18:13:14.533000"],"percentageCompleted":0.0,"scoredRows":3000000,"status":"COMPLETED","statusDetails":"Job done processing at 2019-11-07 18:13:14.533000"}
If you download faster than DataRobot can ingest and score your data, the download may appear sluggish because DataRobot streams the scored data as soon as it arrives (in chunks).
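As a minimal sketch of the local file streaming flow (assuming bearer-token authentication and the third-party requests library; the API token and output filename are placeholders), you can read the download URL from the job data shown above and stream the scored rows to disk:

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder; use your own API token
JOB_URL = "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"

headers = {"Authorization": f"Bearer {API_TOKEN}"}

# Fetch the job data and pull the download link from the "links" object.
job = requests.get(JOB_URL, headers=headers).json()
download_url = job["links"]["download"]

# Stream the scored data to disk in chunks; this can start while the job
# is still RUNNING, so the stream may pause while rows are being scored.
with requests.get(download_url, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    with open("scored.csv", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)
```

Because the endpoint streams chunks as they are produced, the same code works whether the job is still RUNNING or already COMPLETED.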
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Schemas | schema | | (Optional) The name of the schema where scored data will be written. |
| Tables | table | scoring_data | The name of the database table where scored data will be written. |
| Database | catalog | output_data | (Optional) The name of the database catalog to write output data to. |
Write strategy options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Write strategy | statementType | update | The statement type: insert, update, or insertUpdate. |
| Create table if it does not exist (for Insert or Insert + Update) | create_table_if_not_exists | true | (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter. |
| Row identifier (for Update or Insert + Update) | updateColumns | ['index'] | (Optional) A list of strings containing the column names to be updated when statementType is set to update or insertUpdate. |
| Row identifier (for Update or Insert + Update) | where_columns | ['refId'] | (Optional) A list of strings containing the column names to be selected when statementType is set to update or insertUpdate. |

Advanced options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Commit interval | commitInterval | 600 | (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600 |
Note
If your target database doesn't support the column naming conventions of DataRobot's output format, you can use Column Name Remapping to rewrite the output column names to a format your target database supports (e.g., remove spaces from the names).
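For instance, a job payload could remap problematic column names before they reach the database. This is a sketch only: the columnNamesRemapping parameter is assumed here to accept a mapping of original output column names to replacements, and the column names shown are hypothetical.

```python
# Hypothetical payload fragment: rename output columns whose names the
# target database would reject (spaces, dots, mixed case, etc.).
job_payload = {
    "deploymentId": "5dc1a6a9865d6c004dd881ef",
    "columnNamesRemapping": {                         # assumed parameter name and shape
        "readmitted_1_PREDICTION": "readmitted_prediction",
        "Prediction Explanation 1": "explanation_1",  # removes the space
    },
    # intakeSettings / outputSettings omitted for brevity
}
```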
When configuring the Write strategy options, you can use the following statement types to write data, depending on the situation:

| Statement type | Description |
|----------------|-------------|
| insert | Scored data rows are inserted into the target database as new entries. Suitable for writing to an empty table. |
| update | Entries in the target database matching the row identifier of a result row are updated with the new result (columns identified in updateColumns). Suitable for writing to an existing table. |
| insertUpdate | Entries in the target database matching the row identifier of a result row (where_columns) are updated with the new result (update queries). All other result rows are inserted as new entries (insert queries). |
| createTable (deprecated) | DataRobot no longer recommends createTable; use another statement type with create_table_if_not_exists set to true. If used, scored data rows are saved to a new table using INSERT queries. The table must not exist before scoring. |
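Putting these options together, the output portion of a job payload for an insertUpdate strategy might look like the sketch below. The outputSettings key and jdbc type follow the Batch Prediction API conventions; the dataStoreId value is an assumed placeholder for the JDBC data connection, the schema name is hypothetical, and the remaining parameter names are copied from the tables above (verify the exact casing your API version expects).

```python
# Sketch of a JDBC output configuration using insertUpdate; IDs and the
# schema name are placeholders.
output_settings = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",   # assumed: JDBC data connection ID
    "credentialId": "5e4bc5555e6e763beb488dba",  # stored database credential
    "schema": "public",                          # hypothetical schema name
    "table": "scoring_data",
    "catalog": "output_data",
    "statementType": "insertUpdate",
    "updateColumns": ["index"],                  # columns to update on a match
    "where_columns": ["refId"],                  # row identifier, as listed above
    "create_table_if_not_exists": True,          # as listed above
    "commitInterval": 600,
}
```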
Azure Blob Storage is an option for writing large files. To save a dataset to Azure Blob Storage, you must set up a credential with DataRobot consisting of an Azure Connection String.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Format | format | csv | (Optional) Select CSV (csv) or Parquet (parquet). Default value: CSV |
| + Add credentials | credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this URL are necessary (otherwise optional). In the UI, enable the + Add credentials field by selecting This URL requires credentials. Refer to storing credentials securely. |
Azure credentials are encrypted and only decrypted when used to set up the client for communication with Azure during scoring.
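As a sketch, the output settings for Azure Blob Storage could look like the following (the azure type name is assumed, and the storage account URL and credential ID are placeholders):

```python
# Hypothetical Azure Blob Storage output settings; the account, container,
# and credential ID are placeholders.
output_settings = {
    "type": "azure",
    "url": "https://myaccount.blob.core.windows.net/results/scored.csv",
    "format": "csv",                             # or "parquet"
    "credentialId": "5e4bc5555e6e763beb488dba",  # stored Azure connection string
}
```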
DataRobot supports the Google Cloud Storage adapter. To save a dataset to Google Cloud Storage, you must set up a credential with DataRobot consisting of a JSON-formatted account key.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Destination type | type | gcp | Use Google Cloud Storage for output. |
| URL | url | gcs://bucket-name/datasets/scored.csv | An absolute URL designating where the file is written. |
| Format | format | csv | (Optional) Select CSV (csv) or Parquet (parquet). Default value: CSV |
| + Add credentials | credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this URL are necessary (otherwise optional). Refer to storing credentials securely. |
GCP credentials are encrypted and are only decrypted when used to set up the client for communication with GCP during scoring.
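Using the example values above, a corresponding output configuration sketch (assuming outputSettings as the job payload key; the credential ID is a placeholder):

```python
# Hypothetical Google Cloud Storage output settings.
output_settings = {
    "type": "gcp",
    "url": "gcs://bucket-name/datasets/scored.csv",
    "format": "csv",                             # or "parquet"
    "credentialId": "5e4bc5555e6e763beb488dba",  # stored JSON service account key
}
```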
DataRobot can save scored data to both public and private buckets. To write to S3, you must set up a credential with DataRobot consisting of an access key (ID and key) and optionally a session token.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Destination type | type | s3 | Use S3 for output. |
| URL | url | s3://bucket-name/results/scored.csv | An absolute URL for the file to be written. DataRobot only supports directory scoring when scoring from cloud to cloud: provide a directory in S3 (or another cloud provider) for the input and a directory ending with / for the output. With this configuration, all files in the input directory are scored and the results are written to the output directory with the original filenames. When a single file is specified for both the input and the output, the file is overwritten each time the job runs. If you do not want to overwrite the file, specify a filename template such as s3://bucket-name/results/scored_{{ current_run_time }}.csv. You can review template variable definitions in the documentation. |
| Format | format | csv | (Optional) Select CSV (csv) or Parquet (parquet). Default value: CSV |
| + Add credentials | credentialId | 5e4bc5555e6e763beb9db147 | Required if explicit access credentials for this URL are necessary. In the UI, enable the + Add credentials field by selecting This URL requires credentials. Refer to storing credentials securely. |

Advanced options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Endpoint URL | endpointUrl | https://s3.us-east-1.amazonaws.com | (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service. |
AWS credentials are encrypted and only decrypted when used to set up the client for communication with AWS during scoring.
Note
If running a Private AI Cloud within AWS, you can provide implicit credentials for your application instances using an IAM Instance Profile to access your S3 buckets without supplying explicit credentials in the job data. For more information, see the AWS article, Create an IAM Instance Profile.
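A sketch combining the S3 options above, including a filename template so repeated runs don't overwrite the output (the credential ID and endpoint are placeholders):

```python
# Hypothetical S3 output settings; bucket, credential ID, and endpoint
# are placeholders.
output_settings = {
    "type": "s3",
    "url": "s3://bucket-name/results/scored_{{ current_run_time }}.csv",
    "format": "csv",                                      # or "parquet"
    "credentialId": "5e4bc5555e6e763beb9db147",           # omit if relying on an IAM Instance Profile
    "endpointUrl": "https://s3.us-east-1.amazonaws.com",  # optional override
}
```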
BigQuery output uses Google Cloud Storage for output and a batch load job to ingest the data from GCS into a BigQuery table.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Dataset | dataset | my_dataset | The BigQuery dataset to use. |
| Table | table | my_table | The BigQuery table from the dataset to use for output. |
| Bucket name | bucket | my-bucket-in-gcs | The GCS bucket where data files are staged to be loaded into (or unloaded from) a BigQuery table. |
| + Add credentials | credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this bucket are necessary (otherwise optional). In the UI, enable the + Add credentials field by selecting This connection requires credentials. Refer to storing credentials securely. |
BigQuery output write strategy
The write strategy for BigQuery output is insert. First, the output adapter checks if a BigQuery table exists. If a table exists, the data is inserted. If a table doesn't exist, a table is created and then the data is inserted.
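Using the example values above, a BigQuery output sketch (the bigquery type name is assumed; the credential ID is a placeholder):

```python
# Hypothetical BigQuery output settings; the dataset, table, and bucket
# values are the examples from the table above.
output_settings = {
    "type": "bigquery",
    "dataset": "my_dataset",
    "table": "my_table",
    "bucket": "my-bucket-in-gcs",                # GCS staging bucket
    "credentialId": "5e4bc5555e6e763beb488dba",
}
```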
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| + Add credentials | credentialId | | (Optional) The ID of a stored credential for Snowflake. |
| Tables | table | RESULTS | The name of the Snowflake table to store results in. |
| Schemas | schema | PUBLIC | (Optional) The name of the schema containing the table to be scored. |
| Database | catalog | OUTPUT | (Optional) The name of the database catalog to write output data to. |
Use external stage options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Cloud storage type | cloudStorageType | s3 | (Optional) The type of cloud storage backend used in the Snowflake external stage; one of three cloud storage providers: s3 / azure / gcp. The default is s3. In the UI, select Use external stage to enable the Cloud storage type field. |
| External stage | externalStage | my_s3_stage | The name of the Snowflake external stage. In the UI, select Use external stage to enable the External stage field. |
| Endpoint URL (for S3 only) | endpointUrl | https://www.example.com/datasets/ | (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service. In the UI, for the S3 option in Cloud storage type, click Show advanced options to reveal the Endpoint URL field. |
| + Add credentials | cloudStorageCredentialId | 6e4bc5541e6e763beb9db15c | (Optional) The ID of stored credentials for the storage backend (S3/Azure/GCS) used in the Snowflake stage. In the UI, enable the + Add credentials field by selecting This URL requires credentials. |
If you're using a Snowflake external stage, the statementType is insert. However, in the UI you have two configuration options:

- If you haven't configured an external stage, the connection defaults to JDBC and you can select Insert or Update. If you select Update, you can provide a Row identifier.
- If you selected Use external stage, the Insert option is required.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Create table if it does not exist (for Insert) | create_table_if_not_exists | true | (Optional) If no existing table is detected, attempt to create one. |

Advanced options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Commit interval | commitInterval | 600 | (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600 |
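Combining the Snowflake options above, an external stage configuration might look like the following sketch. The snowflake type name and the dataStoreId parameter (identifying the Snowflake data connection) are assumptions, and all IDs are placeholders; the remaining names come from the tables above.

```python
# Hypothetical Snowflake output settings using an S3 external stage.
output_settings = {
    "type": "snowflake",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",    # assumed: Snowflake data connection ID
    "credentialId": "5e4bc5555e6e763beb488dba",   # stored Snowflake credential
    "table": "RESULTS",
    "schema": "PUBLIC",
    "catalog": "OUTPUT",
    "statementType": "insert",                    # required when an external stage is used
    "externalStage": "my_s3_stage",
    "cloudStorageType": "s3",
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",
}
```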
If you're using a Synapse external stage, the statementType is insert. However, in the UI you have two configuration options:

- If you haven't configured an external stage, the connection defaults to JDBC and you can select Insert, Update, or Insert + Update. If you select Update or Insert + Update, you can provide a Row identifier.
- If you selected Use external stage, the Insert option is required.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Create table if it does not exist (for Insert or Insert + Update) | create_table_if_not_exists | true | (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter. |
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Create table if it does not exist | create_table_if_not_exists | true | (Optional) Attempt to create the table first if no existing one is detected. |

Advanced options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Commit interval | commitInterval | 600 | (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600 |