Prediction output options

You can configure a prediction destination using the Predictions > Job Definitions tab or the Batch Prediction API. This topic describes both the UI and API output options.

Note

For a complete list of supported output options, see the data sources supported for batch predictions.

Output option Description
Local file streaming Stream scored data through a URL endpoint for immediate download when the job moves to a running state.
HTTP write Stream data to write to an absolute URL for scoring. This option can write data to pre-signed URLs for Amazon S3, Azure, and Google Cloud Platform.
Database connections
JDBC write Write prediction results back to a JDBC data source with data destination details supplied through a job definition or the Batch Prediction API.
Cloud storage connections
Amazon S3 write Write scored data to public or private S3 buckets with a DataRobot credential consisting of an access key (ID and key) and, optionally, a session token.
Azure Blob Storage write Write scored data to Azure Blob Storage with a DataRobot credential consisting of an Azure Connection String.
Google Cloud Storage write Write scored data to Google Cloud Storage with a DataRobot credential consisting of a JSON-formatted account key.
Data warehouse connections
BigQuery write Score data using BigQuery with data destination details supplied through a job definition or the Batch Prediction API.
Snowflake write Score data using Snowflake with data destination details supplied through a job definition or the Batch Prediction API.
Azure Synapse write Score data using Synapse with data destination details supplied through a job definition or the Batch Prediction API.
Other connections
Tableau write Score data using Tableau with data destination details supplied through the Batch Prediction API.

If you are using a custom CSV format, any output option dealing with CSV will adhere to that format. The columns that appear in the output are documented in the section on output format.

Local file streaming

If your job is configured with local file streaming as the output option, you can start downloading the scored data as soon as the job moves to a RUNNING state. In the example job data JSON below, the URL needed to make the local file streaming request is available in the download key of the links object:

{
  "elapsedTimeSec": 97,
  "failedRows": 0,
  "jobIntakeSize": 1150602342,
  "jobOutputSize": 107791140,
  "jobSpec": {
    "deploymentId": "5dc1a6a9865d6c004dd881ef",
    "maxExplanations": 0,
    "numConcurrent": 4,
    "passthroughColumns": null,
    "passthroughColumnsSet": null,
    "predictionWarningEnabled": null,
    "thresholdHigh": null,
    "thresholdLow": null
  },
  "links": {
    "download": "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/download/",
    "self": "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"
  },
  "logs": [
    "Job created by user@example.org from 203.0.113.42 at 2019-11-07 18:11:36.870000",
    "Job started processing at 2019-11-07 18:11:49.781000",
    "Job done processing at 2019-11-07 18:13:14.533000"
  ],
  "percentageCompleted": 0.0,
  "scoredRows": 3000000,
  "status": "COMPLETED",
  "statusDetails": "Job done processing at 2019-11-07 18:13:14.533000"
}

If you download faster than DataRobot can ingest and score your data, the download may appear sluggish because DataRobot streams the scored data as soon as it arrives (in chunks).

Refer to this sample use case for a complete example.
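
As an illustration only, a minimal download loop using the Python requests library might look like the sketch below. The download URL comes from the download key of the links object above; the API token environment variable and the output file name are assumptions made for this example.

import os
import requests

# Download URL taken from the "download" key of the job's "links" object (see above).
download_url = "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/download/"
headers = {"Authorization": "Bearer " + os.environ["DATAROBOT_API_TOKEN"]}  # assumed token variable

# Stream the scored data to disk chunk by chunk; the stream may pause while
# DataRobot is still ingesting and scoring the remaining rows.
with requests.get(download_url, headers=headers, stream=True) as response:
    response.raise_for_status()
    with open("scored.csv", "wb") as out:
        for chunk in response.iter_content(chunk_size=8192):
            out.write(chunk)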

HTTP write

You can point Batch Predictions at a regular URL, and DataRobot streams the scored data to that location:

Parameter Example Description
type http Use HTTP for output.
url https://example.com/datasets/scored.csv An absolute URL that designates where the file is written.

The URL can optionally contain a username and password such as: https://username:password@example.com/datasets/scoring.csv.

The http adapter can be used to write data to pre-signed URLs for S3, Azure, or GCP.
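
As a hedged, end-to-end sketch, the payload below submits a Batch Prediction job whose outputSettings use the http adapter. The deployment ID, credential ID, and both URLs are placeholders, and the S3 intake settings are only an assumption used to make the example complete; intake options are documented separately.

import os
import requests

api_root = "https://app.datarobot.com/api/v2"
headers = {"Authorization": "Bearer " + os.environ["DATAROBOT_API_TOKEN"]}  # assumed token variable

job = {
    "deploymentId": "5dc1a6a9865d6c004dd881ef",  # placeholder deployment ID
    # Intake is documented separately; an S3 source is assumed here for completeness.
    "intakeSettings": {
        "type": "s3",
        "url": "s3://bucket-name/datasets/to_score.csv",
        "credentialId": "5e4bc5555e6e763beb9db147",
    },
    # Output settings for the http adapter, using the parameters in the table above.
    "outputSettings": {
        "type": "http",
        "url": "https://example.com/datasets/scored.csv",  # pre-signed or credentialed URL
    },
}

response = requests.post(api_root + "/batchPredictions/", headers=headers, json=job)
response.raise_for_status()
print(response.json()["links"]["self"])  # poll this URL to track job status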

JDBC write

DataRobot supports writing prediction results back to a JDBC data source. For this, the Batch Prediction API integrates with external data sources using securely stored credentials.

Supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type jdbc Use a JDBC data store as output.
+ Select connection dataStoreId 5e4bc5b35e6e763beb9db14a The external data source ID.
Enter credentials credentialId 5e4bc5555e6e763beb9db147 (Optional) The ID of a stored credential containing username and password. Refer to storing credentials securely.
Tables table scoring_data The name of the database table where scored data will be written.
Schemas schema public (Optional) The name of the schema where scored data will be written.
Database catalog catalog output_data (Optional) The name of the database catalog to write output data to.
Write strategy options
Write strategy statementType update The statement type: insert, update, or insertUpdate.
Create table if it does not exist
(for Insert or Insert + Update)
create_table_if_not_exists true (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.
Row identifier
(for Update or Insert + Update)
updateColumns ['index'] (Optional) A list of strings containing the column names to be updated when statementType is set to update or insertUpdate.
Row identifier
(for Update or Insert + Update)
where_columns ['refId'] (Optional) A list of strings containing the column names to be selected when statementType is set to update or insertUpdate.
Advanced options
Commit interval commitInterval 600 (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600

Note

If your target database doesn't support the column naming conventions of DataRobot's output format, you can use Column Name Remapping to rewrite the output column names to a format your target database supports (e.g., remove spaces from the name).
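
For reference, a sketch of the outputSettings portion of a job payload for a JDBC destination is shown below. The field names follow the Parameter column above; the IDs, table, and schema names are placeholders.

# outputSettings sketch for a JDBC destination (IDs and names are placeholders).
output_settings = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",   # external data source ID
    "credentialId": "5e4bc5555e6e763beb9db147",  # stored username/password credential
    "table": "scoring_data",
    "schema": "public",
    "statementType": "insert",
    "create_table_if_not_exists": True,          # create the table if it is missing
    "commitInterval": 600,                       # seconds between commits (0 = single commit)
}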

Statement types

When dealing with Write strategy options, you can use the following statement types to write data, depending on the situation:

Statement type Description
insert Scored data rows are inserted in the target database as a new entry. Suitable for writing to an empty table.
update Scored data entries in the target database matching the row identifier of a result row are updated with the new result (columns identified in updateColumns). Suitable for writing to an existing table.
insertUpdate Entries in the target database matching the row identifier of a result row (where_columns) are updated with the new result (update queries). All other result rows are inserted as new entries (insert queries).
createTable (deprecated) DataRobot no longer recommends createTable. Use a different option with create_table_if_not_exists set to True. If used, scored data rows are saved to a new table using INSERT queries. The table must not exist before scoring.
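
To sketch an update-style strategy, the JDBC settings above can be adjusted as follows; the column names are the illustrative values from the table, not required names.

# insertUpdate variant of the JDBC sketch above: match rows on where_columns,
# rewrite the columns listed in updateColumns, and insert everything else.
output_settings.update({
    "statementType": "insertUpdate",
    "updateColumns": ["index"],   # columns rewritten when a matching row is found
    "where_columns": ["refId"],   # columns used to match existing rows
})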

Allowed source IP addresses

Any connection initiated from DataRobot originates from one of the following IP addresses:

Host: https://app.datarobot.com
100.26.66.209
54.204.171.181
54.145.89.18
54.147.212.247
18.235.157.68
3.211.11.187
52.1.228.155
3.224.51.250
44.208.234.185
3.214.131.132
3.89.169.252
3.220.7.239
52.44.188.255
3.217.246.191

Host: https://app.eu.datarobot.com
18.200.151.211
18.200.151.56
18.200.151.43
54.78.199.18
54.78.189.139
54.78.199.173
18.200.127.104
34.247.41.18
99.80.243.135
63.34.68.62
34.246.241.45
52.48.20.136

Note

These IP addresses are reserved for DataRobot use only.

Amazon S3 write

DataRobot can save scored data to both public and private buckets. To write to S3, you must set up a credential with DataRobot consisting of an access key (ID and key) and optionally a session token.

UI field Parameter Example Description
Destination type type s3 Use S3 for output.
URL url s3://bucket-name/results/scored.csv An absolute URL for the file to be written.
Format format csv CSV (default) or Parquet.
+ Add credentials credentialId 5e4bc5555e6e763beb9db147 Required only if explicit access credentials are needed for this URL (otherwise optional). In the UI, enable the + Add credentials field by selecting This URL requires credentials. Refer to storing credentials securely.
Advanced options
Endpoint URL endpointUrl https://s3.us-east-1.amazonaws.com (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service.

AWS credentials are encrypted and only decrypted when used to set up the client for communication with AWS during scoring.

Note

If running a Private AI Cloud within AWS, you can provide implicit credentials for your application instances using an IAM Instance Profile to access your S3 buckets without supplying explicit credentials in the job data. For more information, see the AWS article, Create an IAM Instance Profile.
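
A sketch of the outputSettings portion for an S3 destination might look as follows; the bucket URL, credential ID, and endpoint override are placeholders, and endpointUrl can be omitted for standard S3.

# outputSettings sketch for an S3 destination (values are placeholders).
output_settings = {
    "type": "s3",
    "url": "s3://bucket-name/results/scored.csv",
    "format": "csv",                                      # or "parquet"
    "credentialId": "5e4bc5555e6e763beb9db147",           # access key ID/secret (+ optional session token)
    "endpointUrl": "https://s3.us-east-1.amazonaws.com",  # optional S3-compatible endpoint override
}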

Azure Blob Storage write

Azure Blob Storage is an option for scoring large files. To save a dataset to Azure Blob Storage, you must set up a credential with DataRobot consisting of an Azure Connection String.

UI field Parameter Example Description
Destination type type azure Use Azure Blob Storage for output.
URL url https://myaccount.blob.core.windows.net/datasets/scored.csv An absolute URL for the file to be written.
Format format csv (Optional) CSV (default) or Parquet.
+ Add credentials credentialId 5e4bc5555e6e763beb488dba Required only if explicit access credentials are needed for this URL (otherwise optional). In the UI, enable the + Add credentials field by selecting This URL requires credentials. Refer to storing credentials securely.

Azure credentials are encrypted and only decrypted when used to set up the client for communication with Azure during scoring.
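
A corresponding outputSettings sketch for Azure Blob Storage, with placeholder account, container, and credential values:

# outputSettings sketch for an Azure Blob Storage destination (values are placeholders).
output_settings = {
    "type": "azure",
    "url": "https://myaccount.blob.core.windows.net/datasets/scored.csv",
    "format": "csv",                             # optional; CSV is the default
    "credentialId": "5e4bc5555e6e763beb488dba",  # credential holding the Azure connection string
}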

Google Cloud Storage write

DataRobot also supports the Google Cloud Storage adapter. To save a dataset to Google Cloud Storage, you must set up a credential with DataRobot consisting of a JSON-formatted account key.

UI field Parameter Example Description
Destination type type gcp Use Google Cloud Storage for output.
URL url gcs://bucket-name/datasets/scored.csv An absolute URL designating where the file is written.
Format format csv (Optional) CSV (default) or Parquet.
+ Add credentials credentialId 5e4bc5555e6e763beb488dba Required only if explicit access credentials are needed for this URL (otherwise optional). Refer to storing credentials securely.

GCP credentials are encrypted and are only decrypted when used to set up the client for communication with GCP during scoring.
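
A matching outputSettings sketch for Google Cloud Storage, again with placeholder values:

# outputSettings sketch for a Google Cloud Storage destination (values are placeholders).
output_settings = {
    "type": "gcp",
    "url": "gcs://bucket-name/datasets/scored.csv",
    "format": "csv",                             # optional; CSV is the default
    "credentialId": "5e4bc5555e6e763beb488dba",  # credential holding the JSON-formatted account key
}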

BigQuery write

To use BigQuery for scoring, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type bigquery Use Google Cloud Storage for output, followed by a batch load job that loads the data from GCS into a BigQuery table.
Dataset dataset my_dataset The BigQuery dataset to use.
Table table my_table The BigQuery table from the dataset to use for output.
Bucket name bucket my-bucket-in-gcs The GCP bucket where data files are stored to be loaded into or unloaded from a BigQuery table.
+ Add credentials credentialId 5e4bc5555e6e763beb488dba Required if explicit access credentials for this bucket are necessary (otherwise optional). In the UI, enable the + Add credentials field by selecting This connection requires credentials. Refer to storing credentials securely.

Refer to the example section for a complete API example.
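
As a rough sketch, the outputSettings portion for a BigQuery destination could look like this, using the parameters above with placeholder names:

# outputSettings sketch for a BigQuery destination (names and IDs are placeholders).
output_settings = {
    "type": "bigquery",
    "dataset": "my_dataset",
    "table": "my_table",
    "bucket": "my-bucket-in-gcs",                # GCS bucket used to stage the load job
    "credentialId": "5e4bc5555e6e763beb488dba",
}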

Snowflake write

To use Snowflake for scoring, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type snowflake Adapter type.
Connection options
+ Select connection dataStoreId 5e4bc5b35e6e763beb9db14a ID of Snowflake data source.
Enter credentials credentialId 5e4bc5555e6e763beb9db147 (Optional) ID of a stored credential containing username and password for Snowflake.
Tables table RESULTS Name of the Snowflake table to store results.
Schemas schema PUBLIC (Optional) The name of the schema containing the output table.
Database catalog catalog OUTPUT (Optional) The name of the database catalog to write output data to.
Use external stage options
Cloud storage type cloudStorageType s3 (Optional) Type of cloud storage backend used in Snowflake external stage. Can be one of 3 cloud storage providers: s3/azure/gcp. The default is s3. In the UI, select Use external stage to enable the Cloud storage type field.
External stage externalStage my_s3_stage Snowflake external stage. In the UI, select Use external stage to enable the External stage field.
Endpoint URL (for S3 only) endpointUrl https://www.example.com/datasets/ (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service. In the UI, for the S3 option in Cloud storage type click Show advanced options to reveal the Endpoint URL field.
+ Add credentials cloudStorageCredentialId 6e4bc5541e6e763beb9db15c (Optional) ID of stored credentials for a storage backend (S3/Azure/GCS) used in Snowflake stage. In the UI, enable the + Add credentials field by selecting This URL requires credentials.
Write strategy options (for fallback JDBC connection)
Write strategy statementType insert If you're using a Snowflake external stage, the statementType is insert. However, in the UI you have two configuration options:
  • If you haven't configured an external stage, the connection defaults to JDBC and you can select Insert or Update. If you select Update, you can provide a Row identifier.
  • If you selected Use external stage, the Insert option is required.
Create table if it does not exist
(for Insert)
create_table_if_not_exists true (Optional) If no existing table is detected, attempt to create one.
Advanced options
Commit interval commitInterval 600 (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600

Refer to the example section for a complete API example.
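
As a rough sketch, a Snowflake destination using an S3-backed external stage could be configured like this; the IDs, stage, and table names are placeholders taken from the table above:

# outputSettings sketch for a Snowflake destination with an S3 external stage.
output_settings = {
    "type": "snowflake",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",               # Snowflake data source ID
    "credentialId": "5e4bc5555e6e763beb9db147",              # Snowflake username/password credential
    "table": "RESULTS",
    "schema": "PUBLIC",
    "externalStage": "my_s3_stage",                          # Snowflake external stage
    "cloudStorageType": "s3",                                # s3 (default), azure, or gcp
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",  # credential for the stage's storage backend
    "statementType": "insert",                               # required when an external stage is used
}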

Azure Synapse write

To use Synapse for scoring, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type synapse Adapter type.
Connection options
+ Select connection dataStoreId 5e4bc5b35e6e763beb9db14a ID of Synapse data source.
Enter credentials credentialId 5e4bc5555e6e763beb9db147 (Optional) ID of a stored credential containing username and password for Synapse.
Tables table RESULTS Name of the Synapse table to keep results in.
Schemas schema dbo (Optional) Name of the schema containing the output table.
Use external stage options
External data source externalDatasource my_data_source Name of the identifier created in Synapse for the external data source.
+ Add credentials cloudStorageCredentialId 6e4bc5541e6e763beb9db15c (Optional) ID of a stored credential for Azure Blob storage.
Write strategy options (for fallback JDBC connection)
Write strategy statementType insert If you're using a Synapse external stage, the statementType is insert. However, in the UI you have two configuration options:
  • If you haven't configured an external stage, the connection defaults to JDBC and you can select Insert, Update, or Insert + Update. If you select Update or Insert + Update, you can provide a Row identifier.
  • If you selected Use external stage, the Insert option is required.
Create table if it does not exist
(for Insert or Insert + Update)
create_table_if_not_exists true (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.
Advanced options
Commit interval commitInterval 600 (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600

Refer to the example section for a complete API example.
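
As a rough sketch, a Synapse destination using an external data source could be configured like this, with placeholder IDs and names:

# outputSettings sketch for an Azure Synapse destination with an external data source.
output_settings = {
    "type": "synapse",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",               # Synapse data source ID
    "credentialId": "5e4bc5555e6e763beb9db147",              # Synapse username/password credential
    "table": "RESULTS",
    "schema": "dbo",
    "externalDatasource": "my_data_source",                  # external data source defined in Synapse
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",  # credential for Azure Blob storage
    "statementType": "insert",                               # required when an external stage is used
}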

Note

Azure Synapse supports fewer collations than the default Microsoft SQL Server. For more information, reference the Azure Synapse documentation.

Tableau write

To use Tableau for scoring, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (outputSettings) as described in the table below.

UI field Parameter Example Description
Destination type type tableau Use Tableau for output.
Tableau URL URL https://xxxx.online.tableau.com The URL to your online Tableau server.
Site Name siteName datarobottrial Your Tableau site name.
+ Add credentials credentialId 5e4bc5555e6e763beb488dba Use the specified credential to access the Tableau URL. In the UI, enable the + Add credentials field by selecting This connection requires credentials. Refer to storing credentials securely.
+ Select Tableau project and data source dataSourceId 0e470cc1-8178-4e8d-b159-6ae1db202394 The ID of your Tableau data source. In the UI, select Create a new data source and add a new Data source name. Alternatively, select Use existing data source.
Output options overwrite true Specify true to overwrite the dataset or false to append to the dataset. In the UI, select Create a new data source and select Overwrite or Append.
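
A rough sketch of the outputSettings portion for a Tableau destination is shown below; the server URL, site name, and IDs are placeholders, and the lowercase url key is an assumption here (the table above lists the parameter as URL).

# outputSettings sketch for a Tableau destination (values are placeholders).
output_settings = {
    "type": "tableau",
    "url": "https://xxxx.online.tableau.com",                # Tableau server URL
    "siteName": "datarobottrial",
    "credentialId": "5e4bc5555e6e763beb488dba",
    "dataSourceId": "0e470cc1-8178-4e8d-b159-6ae1db202394",
    "overwrite": True,                                       # True to overwrite, False to append
}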

Updated February 16, 2024