Prediction output options

You can configure a prediction destination using the Predictions > Job Definitions tab or the Batch Prediction API. This topic describes both the UI and API output options.

Note

For a complete list of supported output options, see the data sources supported for batch predictions.

Output option Description
Local file streaming Stream scored data through a URL endpoint for immediate download when the job moves to a running state.
HTTP write Stream data to write to an absolute URL for scoring. This option can write data to pre-signed URLs for Amazon S3, Azure, and Google Cloud Platform.
Database connections
JDBC write Write prediction results back to a JDBC data source with data destination details supplied through a job definition or the Batch Prediction API.
SAP Datasphere write Write prediction results back to a SAP Datasphere data source with data destination details supplied through a job definition or the Batch Prediction API.
Cloud storage connections
Azure Blob Storage write Write scored data to Azure Blob Storage with a DataRobot credential consisting of an Azure Connection String.
Google Cloud Storage write Write scored data to Google Cloud Storage with a DataRobot credential consisting of a JSON-formatted account key.
Amazon S3 write Write scored data to public or private S3 buckets with a DataRobot credential consisting of an access key (ID and key) and, optionally, a session token.
Data warehouse connections
BigQuery write Score data using BigQuery with data destination details supplied through a job definition or the Batch Prediction API.
Snowflake write Score data using Snowflake with data destination details supplied through a job definition or the Batch Prediction API.
Azure Synapse write Score data using Synapse with data destination details supplied through a job definition or the Batch Prediction API.

If you are using a custom CSV format, any output option dealing with CSV will adhere to that format. The columns that appear in the output are documented in the section on output format.

Local file streaming

If your job is configured with local file streaming as the output option, you can start downloading the scored data as soon as the job moves to a RUNNING state. In the example job data JSON below, the URL needed to make the local file streaming request is available in the download key of the links object:

{
  "elapsedTimeSec": 97,
  "failedRows": 0,
  "jobIntakeSize": 1150602342,
  "jobOutputSize": 107791140,
  "jobSpec": {
    "deploymentId": "5dc1a6a9865d6c004dd881ef",
    "maxExplanations": 0,
    "numConcurrent": 4,
    "passthroughColumns": null,
    "passthroughColumnsSet": null,
    "predictionWarningEnabled": null,
    "thresholdHigh": null,
    "thresholdLow": null
  },
  "links": {
    "download": "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/download/",
    "self": "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"
  },
  "logs": [
    "Job created by user@example.org from 203.0.113.42 at 2019-11-07 18:11:36.870000",
    "Job started processing at 2019-11-07 18:11:49.781000",
    "Job done processing at 2019-11-07 18:13:14.533000"
  ],
  "percentageCompleted": 0.0,
  "scoredRows": 3000000,
  "status": "COMPLETED",
  "statusDetails": "Job done processing at 2019-11-07 18:13:14.533000"
}

If you download faster than DataRobot can ingest and score your data, the download may appear sluggish because DataRobot streams the scored data as soon as it arrives (in chunks).
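
For example, a client can poll the job and start streaming the result as soon as the download link is available. The sketch below is a minimal, hypothetical example: the job URL and download link come from the JSON above, while the Bearer authorization scheme, polling interval, output filename, and lack of error handling are assumptions you would adjust for your own setup.

import time

import requests

# Placeholder token; requests are authenticated with an Authorization header.
API_TOKEN = "YOUR_API_TOKEN"
JOB_URL = "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}  # auth scheme assumed; adjust to your deployment

# Poll until the job is RUNNING or COMPLETED (error handling omitted for brevity).
while True:
    job = requests.get(JOB_URL, headers=HEADERS).json()
    if job["status"] in ("RUNNING", "COMPLETED"):
        break
    time.sleep(5)

# Stream the scored data in chunks as DataRobot produces it.
with requests.get(job["links"]["download"], headers=HEADERS, stream=True) as resp:
    resp.raise_for_status()
    with open("scored.csv", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)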

Refer to this sample use case for a complete example.

HTTP write

You can point Batch Predictions at a regular URL, and DataRobot streams the data:

Parameter Example Description
type http Use HTTP for output.
url https://example.com/datasets/scored.csv An absolute URL that designates where the file is written.

The URL can optionally contain a username and password such as: https://username:password@example.com/datasets/scoring.csv.

The http adapter can be used to write data to pre-signed URLs for S3, Azure, or GCP.
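
As a minimal sketch, the outputSettings object for this adapter only needs the two parameters above; the URL below is a placeholder for your own endpoint or pre-signed upload URL.

# Hypothetical outputSettings for the http adapter; the URL is a placeholder
# (for example, a pre-signed upload URL, optionally with username:password embedded).
output_settings = {
    "type": "http",
    "url": "https://example.com/datasets/scored.csv",
}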

JDBC write

DataRobot supports writing prediction results back to a JDBC data source. For this, the Batch Prediction API integrates with external data sources using securely stored credentials.

Supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type jdbc Use a JDBC data store as output.
Data connection options
+ Select connection dataStoreId 5e4bc5b35e6e763beb9db14a The external data source ID.
Enter credentials credentialId 5e4bc5555e6e763beb9db147 (Optional) The ID of a stored credential. Refer to storing credentials securely.
Schemas schema public (Optional) The name of the schema where scored data will be written.
Tables table scoring_data The name of the database table where scored data will be written.
Database catalog output_data (Optional) The name of the specified database catalog to write output data to.
Write strategy options
Write strategy statementType update The statement type, insert, update, or insertUpdate.
Create table if it does not exist (for Insert or Insert + Update) create_table_if_not_exists true (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.
Row identifier (for Update or Insert + Update) updateColumns ['index'] (Optional) A list of strings containing the column names to be updated when statementType is set to update or insertUpdate.
Row identifier (for Update or Insert + Update) where_columns ['refId'] (Optional) A list of strings containing the column names to be selected when statementType is set to update or insertUpdate.
Advanced options
Commit interval commitInterval 600 (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600
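
Assembled as a payload, a minimal outputSettings sketch for a JDBC destination might look like the following. The IDs and names are the placeholder values from the table above; confirm the exact parameter casing against the Batch Prediction API reference or your client.

# Hypothetical outputSettings for a JDBC destination, using the placeholder values above.
# Parameter names follow the table; confirm exact casing for your API client.
output_settings = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",   # external data source ID
    "credentialId": "5e4bc5555e6e763beb9db147",  # stored credential ID
    "schema": "public",
    "table": "scoring_data",
    "statementType": "insert",                   # insert, update, or insertUpdate
    "create_table_if_not_exists": True,          # create the table before writing if missing
    "commitInterval": 600,                       # seconds between commits (0 = commit once at the end)
}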

Note

If your target database doesn't support the column naming conventions of DataRobot's output format, you can use Column Name Remapping to re-write the output column names to a format your target database supports (e.g., remove spaces from the name).

Statement types

When dealing with Write strategy options, you can use the following statement types to write data, depending on the situation:

Statement type Description
insert Scored data rows are inserted in the target database as a new entry. Suitable for writing to an empty table.
update Scored data entries in the target database matching the row identifier of a result row are updated with the new result (columns identified in updateColumns). Suitable for writing to an existing table.
insertUpdate Entries in the target database matching the row identifier of a result row (where_columns) are updated with the new result (update queries). All other result rows are inserted as new entries (insert queries).
createTable (deprecated) DataRobot no longer recommends createTable. Instead, use one of the other statement types with create_table_if_not_exists set to true. If used, scored data rows are saved to a new table using INSERT queries. The table must not exist before scoring.
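
For example, an insertUpdate job needs both the row identifier used to match existing rows and the columns to overwrite. A hypothetical configuration, reusing the example values from the tables above, is sketched below.

# Hypothetical insertUpdate configuration, reusing the example column names above.
output_settings = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",
    "credentialId": "5e4bc5555e6e763beb9db147",
    "schema": "public",
    "table": "scoring_data",
    "statementType": "insertUpdate",
    "where_columns": ["refId"],    # row identifier used to match existing rows
    "updateColumns": ["index"],    # columns overwritten when a matching row is found
}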

Allowed source IP addresses

Any connection initiated from DataRobot originates from one of the following IP addresses:

Host: https://app.datarobot.com
100.26.66.209, 54.204.171.181, 54.145.89.18, 54.147.212.247, 18.235.157.68, 3.211.11.187, 52.1.228.155, 3.224.51.250, 44.208.234.185, 3.214.131.132, 3.89.169.252, 3.220.7.239, 52.44.188.255, 3.217.246.191

Host: https://app.eu.datarobot.com
18.200.151.211, 18.200.151.56, 18.200.151.43, 54.78.199.18, 54.78.189.139, 54.78.199.173, 18.200.127.104, 34.247.41.18, 99.80.243.135, 63.34.68.62, 34.246.241.45, 52.48.20.136

Host: https://app.jp.datarobot.com
52.199.145.51, 52.198.240.166, 52.197.6.249

Note

These IP addresses are reserved for DataRobot use only.

SAP Datasphere write

Premium

Support for SAP Datasphere is off by default. Contact your DataRobot representative or administrator for information on enabling the feature.

Feature flag(s): Enable SAP Datasphere Connector, Enable SAP Datasphere Batch Predictions Integration

To use SAP Datasphere, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type datasphere Use a SAP Datasphere database for output.
Data connection options
+ Select connection dataStoreId 5e4bc5b35e6e763beb9db14a The ID of an external data source. In the UI, select a data connection or click add a new data connection. Refer to the SAP Datasphere connection documentation.
Enter credentials credentialId 5e4bc5555e6e763beb9db147 The ID of a stored credential for Datasphere. Refer to storing credentials securely.
catalog / The name of the database catalog containing the table to write to.
Schemas schema public The name of the database schema containing the table to write to.
Tables table scoring_data The name of the database table to write scored data to. In the UI, select a table or click Create a table.
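
A minimal outputSettings sketch for a Datasphere destination, using the placeholder values from the table above, might look like this:

# Hypothetical outputSettings for a Datasphere destination, using the placeholder values above.
output_settings = {
    "type": "datasphere",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",   # Datasphere data connection ID
    "credentialId": "5e4bc5555e6e763beb9db147",  # stored Datasphere credential
    "catalog": "/",                              # example catalog value from the table
    "schema": "public",
    "table": "scoring_data",
}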

Azure Blob Storage write

Azure Blob Storage is an option for writing large files. To save a dataset to Azure Blob Storage, you must set up a credential with DataRobot consisting of an Azure Connection String.

UI field Parameter Example Description
Destination type type azure Use Azure Blob Storage for output.
URL url https://myaccount.blob.core.windows.net/datasets/scored.csv An absolute URL for the file to be written.
Format format csv (Optional) Select CSV (csv) or Parquet (parquet).
Default value: CSV
+ Add credentials credentialId 5e4bc5555e6e763beb488dba In the UI, enable the + Add credentials field by selecting This URL requires credentials. Required if explicit access credentials are necessary for this URL (optional otherwise). Refer to storing credentials securely.

Azure credentials are encrypted and only decrypted when used to set up the client for communication with Azure during scoring.
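
As a sketch, an Azure Blob Storage outputSettings payload built from the parameters above could look like the following; the storage account, container path, and credential ID are placeholders.

# Hypothetical outputSettings for Azure Blob Storage; account, container, and credential are placeholders.
output_settings = {
    "type": "azure",
    "url": "https://myaccount.blob.core.windows.net/datasets/scored.csv",
    "format": "csv",                             # or "parquet"
    "credentialId": "5e4bc5555e6e763beb488dba",  # credential holding the Azure Connection String
}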

Google Cloud Storage write

DataRobot supports the Google Cloud Storage adapter. To save a dataset to Google Cloud Storage, you must set up a credential with DataRobot consisting of a JSON-formatted account key.

UI field Parameter Example Description
Destination type type gcp Use Google Cloud Storage for output.
URL url gcs://bucket-name/datasets/scored.csv An absolute URL designating where the file is written.
Format format csv (Optional) Select CSV (csv) or Parquet (parquet).
Default value: CSV
+ Add credentials credentialId 5e4bc5555e6e763beb488dba Required if explicit access credentials are necessary for this URL (optional otherwise). Refer to storing credentials securely.

GCP credentials are encrypted and are only decrypted when used to set up the client for communication with GCP during scoring.
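
A comparable sketch for Google Cloud Storage, again with placeholder bucket, path, and credential values:

# Hypothetical outputSettings for Google Cloud Storage; bucket, path, and credential are placeholders.
output_settings = {
    "type": "gcp",
    "url": "gcs://bucket-name/datasets/scored.csv",
    "format": "csv",                             # or "parquet"
    "credentialId": "5e4bc5555e6e763beb488dba",  # credential holding the JSON-formatted account key
}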

Amazon S3 write

DataRobot can save scored data to both public and private buckets. To write to S3, you must set up a credential with DataRobot consisting of an access key (ID and key) and optionally a session token.

UI field Parameter Example Description
Destination type type s3 Use S3 for output.
URL url s3://bucket-name/results/scored.csv An absolute URL for the file to be written. DataRobot only supports directory scoring when scoring from cloud to cloud. Provide a directory in S3 (or another cloud provider) for the input and a directory ending with / for the output. Using this configuration, all files in the input directory are scored and the results are written to the output directory with the original filenames. When a single file is specified for both the input and the output, the file is overwritten each time the job runs. If you do not wish to overwrite the file, specify a filename template such as s3://bucket-name/results/scored_{{ current_run_time }}.csv. You can review template variable definitions in the documentation.
Format format csv (Optional) Select CSV (csv) or Parquet (parquet).
Default value: CSV
+ Add credentials credentialId 5e4bc5555e6e763beb9db147 In the UI, enable the + Add credentials field by selecting This URL requires credentials. Required if explicit access credentials are necessary for this URL (optional otherwise). Refer to storing credentials securely.
Advanced options
Endpoint URL endpointUrl https://s3.us-east-1.amazonaws.com (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service.

AWS credentials are encrypted and only decrypted when used to set up the client for communication with AWS during scoring.
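
Putting the parameters together, a hypothetical S3 outputSettings payload might look like the following; the bucket, templated filename, and endpoint override are placeholders you would replace or drop as needed.

# Hypothetical outputSettings for S3; bucket, path, and endpoint override are placeholders.
output_settings = {
    "type": "s3",
    # A templated filename avoids overwriting the output on every run.
    "url": "s3://bucket-name/results/scored_{{ current_run_time }}.csv",
    "format": "csv",                                      # or "parquet"
    "credentialId": "5e4bc5555e6e763beb9db147",           # access key ID/secret, plus optional session token
    "endpointUrl": "https://s3.us-east-1.amazonaws.com",  # optional; only for S3-compatible endpoints
}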

Note

If running a Private AI Cloud within AWS, you can provide implicit credentials for your application instances using an IAM Instance Profile to access your S3 buckets without supplying explicit credentials in the job data. For more information, see the AWS article, Create an IAM Instance Profile.

BigQuery write

To use BigQuery, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type bigquery Use Google Cloud Storage for output and a BigQuery batch load job to ingest the data from GCS into a BigQuery table.
Dataset dataset my_dataset The BigQuery dataset to use.
Table table my_table The BigQuery table from the dataset to use for output.
Bucket name bucket my-bucket-in-gcs The GCP bucket where data files are stored to be loaded into or unloaded from a BigQuery table.
+ Add credentials credentialId 5e4bc5555e6e763beb488dba Required if explicit access credentials for this bucket are necessary (otherwise optional). In the UI, enable the + Add credentials field by selecting This connection requires credentials. Refer to storing credentials securely.

BigQuery output write strategy

The write strategy for BigQuery output is insert. First, the output adapter checks if a BigQuery table exists. If a table exists, the data is inserted. If a table doesn't exist, a table is created and then the data is inserted.
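
A minimal outputSettings sketch for BigQuery, using the placeholder values from the table above:

# Hypothetical outputSettings for BigQuery, using the placeholder values above.
output_settings = {
    "type": "bigquery",
    "dataset": "my_dataset",
    "table": "my_table",
    "bucket": "my-bucket-in-gcs",                # GCS bucket used to stage the load job
    "credentialId": "5e4bc5555e6e763beb488dba",  # needed if the bucket requires explicit credentials
}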

Refer to the example section for a complete API example.

Snowflake write

To use Snowflake, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type snowflake Adapter type.
Data connection options
+ Select connection dataStoreId 5e4bc5b35e6e763beb9db14a ID of Snowflake data source.
Enter credentials credentialId 5e4bc5555e6e763beb9db147 (Optional) The ID of a stored credential for Snowflake.
Tables table RESULTS Name of the Snowflake table to store results.
Schemas schema PUBLIC (Optional) The name of the schema containing the results table.
Database catalog OUTPUT (Optional) The name of the specified database catalog to write output data to.
Use external stage options
Cloud storage type cloudStorageType s3 (Optional) Type of cloud storage backend used in Snowflake external stage. Can be one of 3 cloud storage providers: s3/azure/gcp. The default is s3. In the UI, select Use external stage to enable the Cloud storage type field.
External stage externalStage my_s3_stage Snowflake external stage. In the UI, select Use external stage to enable the External stage field.
Endpoint URL (for S3 only) endpointUrl https://www.example.com/datasets/ (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service. In the UI, for the S3 option in Cloud storage type click Show advanced options to reveal the Endpoint URL field.
+ Add credentials cloudStorageCredentialId 6e4bc5541e6e763beb9db15c (Optional) ID of stored credentials for a storage backend (S3/Azure/GCS) used in Snowflake stage. In the UI, enable the + Add credentials field by selecting This URL requires credentials.
Write strategy options (for fallback JDBC connection)
Write strategy statementType insert If you're using a Snowflake external stage the statementType is insert. However, in the UI you have two configuration options:
  • If you haven't configured an external stage, the connection defaults to JDBC and you can select Insert or Update. If you select Update, you can provide a Row identifier.
  • If you selected Use external stage, the Insert option is required.
Create table if it does not exist (for Insert) create_table_if_not_exists true (Optional) If no existing table is detected, attempt to create one.
Advanced options
Commit interval commitInterval 600 (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600
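
For example, a hypothetical Snowflake outputSettings payload using an S3-backed external stage could look like this; all IDs and object names are the placeholder values from the table above.

# Hypothetical outputSettings for Snowflake with an S3-backed external stage,
# using the placeholder values above.
output_settings = {
    "type": "snowflake",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",               # Snowflake data connection ID
    "credentialId": "5e4bc5555e6e763beb9db147",              # Snowflake credential
    "catalog": "OUTPUT",
    "schema": "PUBLIC",
    "table": "RESULTS",
    "externalStage": "my_s3_stage",                          # Snowflake external stage
    "cloudStorageType": "s3",                                # s3, azure, or gcp
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",  # credential for the stage's storage backend
    "statementType": "insert",                               # required when using an external stage
}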

Refer to the example section for a complete API example.

Azure Synapse write

To use Azure Synapse, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type synapse Adapter type.
Data connection options
+ Select connection dataStoreId 5e4bc5b35e6e763beb9db14a ID of Synapse data source.
Enter credentials credentialId 5e4bc5555e6e763beb9db147 (Optional) The ID of a stored credential for Synapse.
Tables table RESULTS Name of the Synapse table to keep results in.
Schemas schema dbo (Optional) Name of the schema containing the results table.
Use external stage options
External data source externalDatasource my_data_source Name of the identifier created in Synapse for the external data source.
+ Add credentials cloudStorageCredentialId 6e4bc5541e6e763beb9db15c (Optional) ID of a stored credential for Azure Blob storage.
Write strategy options (for fallback JDBC connection)
Write strategy statementType insert If you're using a Synapse external stage the statementType is insert. However, in the UI you have two configuration options:
  • If you haven't configured an external stage, the connection defaults to JDBC and you can select Insert, Update, or Insert + Update. If you select Update or Insert + Update, you can provide a Row identifier.
  • If you selected Use external stage, the Insert option is required.
Create table if it does not exist (for Insert or Insert + Update) create_table_if_not_exists true (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.
Advanced options
Commit interval commitInterval 600 (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600
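
Similarly, a hypothetical Synapse outputSettings payload using an external data source might look like the following; the IDs and names are the placeholders from the table above.

# Hypothetical outputSettings for Azure Synapse with an external data source,
# using the placeholder values above.
output_settings = {
    "type": "synapse",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",               # Synapse data connection ID
    "credentialId": "5e4bc5555e6e763beb9db147",              # Synapse credential
    "schema": "dbo",
    "table": "RESULTS",
    "externalDatasource": "my_data_source",                  # external data source identifier created in Synapse
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",  # Azure Blob storage credential
    "statementType": "insert",                               # required when using an external stage
}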

Refer to the example section for a complete API example.


Updated October 18, 2024