# Prediction output options
You can configure a prediction destination using the Predictions > Job Definitions tab or the Batch Prediction API. This topic describes both the UI and API output options.
Note
For a complete list of supported output options, see the data sources supported for batch predictions.
Output option | Description |
---|---|
Local file streaming | Stream scored data through a URL endpoint for immediate download when the job moves to a running state. |
HTTP write | Stream data to write to an absolute URL for scoring. This option can write data to pre-signed URLs for Amazon S3, Azure, and Google Cloud Platform. |
Database connections | |
JDBC write | Write prediction results back to a JDBC data source with data destination details supplied through a job definition or the Batch Prediction API. |
SAP Datasphere write | Write prediction results back to a SAP Datasphere data source with data destination details supplied through a job definition or the Batch Prediction API. |
Cloud storage connections | |
Azure Blob Storage write | Write scored data to Azure Blob Storage with a DataRobot credential consisting of an Azure Connection String. |
Google Cloud Storage write | Write scored data to Google Cloud Storage with a DataRobot credential consisting of a JSON-formatted account key. |
Amazon S3 write | Write scored data to public or private S3 buckets with a DataRobot credential consisting of an access key (ID and key) and, optionally, a session token. |
Data warehouse connections | |
BigQuery write | Score data using BigQuery with data destination details supplied through a job definition or the Batch Prediction API. |
Snowflake write | Score data using Snowflake with data destination details supplied through a job definition or the Batch Prediction API. |
Azure Synapse write | Score data using Synapse with data destination details supplied through a job definition or the Batch Prediction API. |
If you are using a custom CSV format, any output option dealing with CSV will adhere to that format. The columns that appear in the output are documented in the section on output format.
## Local file streaming

If your job is configured with local file streaming as the output option, you can start downloading the scored data as soon as the job moves to a RUNNING state. In the example job data JSON below, the URL needed to make the local file streaming request is available in the `download` key of the `links` object:
{
"elapsedTimeSec": 97,
"failedRows": 0,
"jobIntakeSize": 1150602342,
"jobOutputSize": 107791140,
"jobSpec": {
"deploymentId": "5dc1a6a9865d6c004dd881ef",
"maxExplanations": 0,
"numConcurrent": 4,
"passthroughColumns": null,
"passthroughColumnsSet": null,
"predictionWarningEnabled": null,
"thresholdHigh": null,
"thresholdLow": null
},
"links": {
"download": "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/download/",
"self": "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"
},
"logs": [
"Job created by user@example.org from 203.0.113.42 at 2019-11-07 18:11:36.870000",
"Job started processing at 2019-11-07 18:11:49.781000",
"Job done processing at 2019-11-07 18:13:14.533000"
],
"percentageCompleted": 0.0,
"scoredRows": 3000000,
"status": "COMPLETED",
"statusDetails": "Job done processing at 2019-11-07 18:13:14.533000"
}
If you download faster than DataRobot can ingest and score your data, the download may appear sluggish because DataRobot streams the scored data as soon as it arrives (in chunks).
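For illustration, here is a minimal Python sketch of polling the job status and streaming the results to disk with the `requests` library; the job URL and API token are placeholders, and the authentication header format may differ for your DataRobot installation.

```python
import time

import requests

JOB_URL = "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"  # example job URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # adjust to your authentication scheme

# Poll the job until the download stream becomes available (add your own
# handling for aborted or failed jobs).
while True:
    job = requests.get(JOB_URL, headers=HEADERS).json()
    if job["status"] in ("RUNNING", "COMPLETED"):
        break
    time.sleep(5)

# Stream the scored data in chunks; DataRobot serves rows as soon as they are scored.
with requests.get(job["links"]["download"], headers=HEADERS, stream=True) as response:
    response.raise_for_status()
    with open("scored.csv", "wb") as scored_file:
        for chunk in response.iter_content(chunk_size=8192):
            scored_file.write(chunk)
```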
Refer to this sample use case for a complete example.
## HTTP write
You can point Batch Predictions at a regular URL, and DataRobot streams the data:
Parameter | Example | Description |
---|---|---|
`type` | `http` | Use HTTP for output. |
`url` | `https://example.com/datasets/scored.csv` | An absolute URL that designates where the file is written. |
The URL can optionally contain a username and password, for example: `https://username:password@example.com/datasets/scoring.csv`.

The `http` adapter can write data to pre-signed URLs from S3, Azure, or GCP.
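As a rough sketch, the payload for a job that writes results to a pre-signed URL through the `http` adapter could look like the following; the deployment ID, token, intake settings, and pre-signed URL are placeholders (generate the URL with your cloud provider's SDK), and only the parameters documented above are assumed.

```python
import requests

API_ROOT = "https://app.datarobot.com/api/v2"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # adjust to your authentication scheme

payload = {
    "deploymentId": "5dc1a6a9865d6c004dd881ef",  # example deployment ID
    # An S3 intake is assumed here for brevity; intake options are documented separately.
    "intakeSettings": {
        "type": "s3",
        "url": "s3://bucket-name/datasets/to_score.csv",
        "credentialId": "5e4bc5555e6e763beb9db147",
    },
    "outputSettings": {
        "type": "http",  # HTTP output adapter
        # Hypothetical pre-signed URL; generate it with your cloud provider's SDK.
        "url": "https://example-bucket.s3.amazonaws.com/scored.csv?X-Amz-Signature=abc123",
    },
}

response = requests.post(f"{API_ROOT}/batchPredictions/", json=payload, headers=HEADERS)
response.raise_for_status()
print(response.json()["links"]["self"])  # URL for checking job status
```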
## JDBC write
DataRobot supports writing prediction results back to a JDBC data source. For this, the Batch Prediction API integrates with external data sources using securely stored credentials.
Supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (`outputSettings`) as described in the table below.
UI field | Parameter | Example | Description |
---|---|---|---|
Destination type | `type` | `jdbc` | Use a JDBC data store as output. |
Data connection options | | | |
+ Select connection | `dataStoreId` | `5e4bc5b35e6e763beb9db14a` | The external data source ID. |
Enter credentials | `credentialId` | `5e4bc5555e6e763beb9db147` | (Optional) The ID of a stored credential. Refer to storing credentials securely. |
Schemas | `schema` | `public` | (Optional) The name of the schema where scored data will be written. |
Tables | `table` | `scoring_data` | The name of the database table where scored data will be written. |
Database | `catalog` | `output_data` | (Optional) The name of the database catalog to write output data to. |
Write strategy options | | | |
Write strategy | `statementType` | `update` | The statement type: `insert`, `update`, or `insertUpdate`. |
Create table if it does not exist (for Insert or Insert + Update) | `create_table_if_not_exists` | `true` | (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the `statementType` parameter. |
Row identifier (for Update or Insert + Update) | `updateColumns` | `['index']` | (Optional) A list of strings containing the column names to be updated when `statementType` is set to `update` or `insertUpdate`. |
Row identifier (for Update or Insert + Update) | `where_columns` | `['refId']` | (Optional) A list of strings containing the column names to be selected when `statementType` is set to `update` or `insertUpdate`. |
Advanced options | | | |
Commit interval | `commitInterval` | `600` | (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to `0`, the batch prediction operation will write the entire job before committing. Default: `600`. |
Note
If your target database doesn't support the column naming conventions of DataRobot's output format, you can use Column Name Remapping to rewrite the output column names to a format your target database supports (e.g., remove spaces from the names).
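For reference, a hedged sketch of the `outputSettings` portion of a job payload for a JDBC destination, built only from the parameters in the table above (the IDs and table names are placeholders):

```python
import json

# JDBC output settings; this dict plugs into the "outputSettings" key of a
# batch prediction job payload or job definition.
output_settings = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",   # external data source ID (placeholder)
    "credentialId": "5e4bc5555e6e763beb9db147",  # stored credential ID (placeholder)
    "schema": "public",                          # optional schema for the results table
    "table": "scoring_data",                     # table the scored rows are written to
    "statementType": "insert",                   # insert, update, or insertUpdate
    "create_table_if_not_exists": True,          # create the table if it is missing
}
print(json.dumps(output_settings, indent=2))
```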
### Statement types
When configuring Write strategy options, you can use the following statement types to write data, depending on the situation:

Statement type | Description |
---|---|
`insert` | Scored data rows are inserted into the target database as new entries. Suitable for writing to an empty table. |
`update` | Scored data entries in the target database matching the row identifier of a result row are updated with the new result (columns identified in `updateColumns`). Suitable for writing to an existing table. |
`insertUpdate` | Entries in the target database matching the row identifier of a result row (`where_columns`) are updated with the new result (update queries). All other result rows are inserted as new entries (insert queries). |
`createTable` (deprecated) | DataRobot no longer recommends `createTable`. Use a different option with `create_table_if_not_exists` set to `True`. If used, scored data rows are saved to a new table using INSERT queries. The table must not exist before scoring. |
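For example, an `insertUpdate` write strategy combines `updateColumns` and `where_columns` roughly as in this sketch (the IDs and column names are illustrative only):

```python
# Update rows whose "refId" matches an existing entry; insert all other rows.
output_settings = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",
    "credentialId": "5e4bc5555e6e763beb9db147",
    "table": "scoring_data",
    "statementType": "insertUpdate",
    "updateColumns": ["index"],   # columns rewritten with the new result
    "where_columns": ["refId"],   # columns used to match existing rows
}
```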
### Allowed source IP addresses
Any connection initiated from DataRobot originates from one of the following IP addresses:
Host: https://app.datarobot.com | Host: https://app.eu.datarobot.com | Host: https://app.jp.datarobot.com |
---|---|---|
100.26.66.209 | 18.200.151.211 | 52.199.145.51 |
54.204.171.181 | 18.200.151.56 | 52.198.240.166 |
54.145.89.18 | 18.200.151.43 | 52.197.6.249 |
54.147.212.247 | 54.78.199.18 | |
18.235.157.68 | 54.78.189.139 | |
3.211.11.187 | 54.78.199.173 | |
52.1.228.155 | 18.200.127.104 | |
3.224.51.250 | 34.247.41.18 | |
44.208.234.185 | 99.80.243.135 | |
3.214.131.132 | 63.34.68.62 | |
3.89.169.252 | 34.246.241.45 | |
3.220.7.239 | 52.48.20.136 | |
52.44.188.255 | | |
3.217.246.191 | | |
Note
These IP addresses are reserved for DataRobot use only.
## SAP Datasphere write
Premium
Support for SAP Datasphere is off by default. Contact your DataRobot representative or administrator for information on enabling the feature.
Feature flag(s): Enable SAP Datasphere Connector, Enable SAP Datasphere Batch Predictions Integration
To use SAP Datasphere, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (`outputSettings`) as described in the table below.
UI field | Parameter | Example | Description |
---|---|---|---|
Destination type | `type` | `datasphere` | Use a SAP Datasphere database for output. |
Data connection options | | | |
+ Select connection | `dataStoreId` | `5e4bc5b35e6e763beb9db14a` | The ID of an external data source. In the UI, select a data connection or click add a new data connection. Refer to the SAP Datasphere connection documentation. |
Enter credentials | `credentialId` | `5e4bc5555e6e763beb9db147` | The ID of a stored credential for Datasphere. Refer to storing credentials securely. |
 | `catalog` | `/` | The name of the database catalog containing the table to write to. |
Schemas | `schema` | `public` | The name of the database schema containing the table to write to. |
Tables | `table` | `scoring_data` | The name of the database table to write data to. In the UI, select a table or click Create a table. |
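A hedged sketch of the corresponding `outputSettings` object, using only the parameters listed above (all IDs and names are placeholders):

```python
# SAP Datasphere output settings (IDs and names are placeholders).
output_settings = {
    "type": "datasphere",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",   # SAP Datasphere data connection ID
    "credentialId": "5e4bc5555e6e763beb9db147",  # stored Datasphere credential ID
    "catalog": "/",                              # database catalog containing the table
    "schema": "public",                          # schema containing the table
    "table": "scoring_data",                     # table the results are written to
}
```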
## Azure Blob Storage write
Azure Blob Storage is an option for writing large files. To save a dataset to Azure Blob Storage, you must set up a credential with DataRobot consisting of an Azure Connection String.
UI field | Parameter | Example | Description |
---|---|---|---|
Destination type | `type` | `azure` | Use Azure Blob Storage for output. |
URL | `url` | `https://myaccount.blob.core.windows.net/datasets/scored.csv` | An absolute URL for the file to be written. |
Format | `format` | `csv` | (Optional) Select CSV (`csv`) or Parquet (`parquet`). Default value: CSV. |
+ Add credentials | `credentialId` | `5e4bc5555e6e763beb488dba` | (Optional) Required if explicit access credentials for this URL are necessary. In the UI, enable the + Add credentials field by selecting This URL requires credentials. Refer to storing credentials securely. |
Azure credentials are encrypted and only decrypted when used to set up the client for communication with Azure during scoring.
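As a sketch, the `outputSettings` object for an Azure Blob Storage destination might look like this (the account URL and credential ID are placeholders):

```python
# Azure Blob Storage output settings (account URL and credential ID are placeholders).
output_settings = {
    "type": "azure",
    "url": "https://myaccount.blob.core.windows.net/datasets/scored.csv",
    "format": "csv",                             # or "parquet"
    "credentialId": "5e4bc5555e6e763beb488dba",  # credential holding the Azure connection string
}
```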
## Google Cloud Storage write
DataRobot supports the Google Cloud Storage adapter. To save a dataset to Google Cloud Storage, you must set up a credential with DataRobot consisting of a JSON-formatted account key.
UI field | Parameter | Example | Description |
---|---|---|---|
Destination type | `type` | `gcp` | Use Google Cloud Storage for output. |
URL | `url` | `gcs://bucket-name/datasets/scored.csv` | An absolute URL designating where the file is written. |
Format | `format` | `csv` | (Optional) Select CSV (`csv`) or Parquet (`parquet`). Default value: CSV. |
+ Add credentials | `credentialId` | `5e4bc5555e6e763beb488dba` | (Optional) Required if explicit access credentials for this URL are necessary. Refer to storing credentials securely. |
GCP credentials are encrypted and are only decrypted when used to set up the client for communication with GCP during scoring.
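Similarly, a minimal `outputSettings` sketch for a Google Cloud Storage destination (bucket path and credential ID are placeholders):

```python
# Google Cloud Storage output settings (bucket path and credential ID are placeholders).
output_settings = {
    "type": "gcp",
    "url": "gcs://bucket-name/datasets/scored.csv",
    "format": "parquet",                         # CSV is the default; Parquet is also supported
    "credentialId": "5e4bc5555e6e763beb488dba",  # credential holding the JSON service account key
}
```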
## Amazon S3 write
DataRobot can save scored data to both public and private buckets. To write to S3, you must set up a credential with DataRobot consisting of an access key (ID and key) and optionally a session token.
UI field | Parameter | Example | Description |
---|---|---|---|
Destination type | `type` | `s3` | Use S3 for output. |
URL | `url` | `s3://bucket-name/results/scored.csv` | An absolute URL for the file to be written. DataRobot only supports directory scoring when scoring from cloud to cloud: provide a directory in S3 (or another cloud provider) for the input and a directory ending with `/` for the output. With this configuration, all files in the input directory are scored and the results are written to the output directory with the original filenames. When a single file is specified for both the input and the output, the file is overwritten each time the job runs. If you do not wish to overwrite the file, specify a filename template such as `s3://bucket-name/results/scored_{{ current_run_time }}.csv`. You can review template variable definitions in the documentation. |
Format | `format` | `csv` | (Optional) Select CSV (`csv`) or Parquet (`parquet`). Default value: CSV. |
+ Add credentials | `credentialId` | `5e4bc5555e6e763beb9db147` | Required if explicit access credentials for this URL are necessary. In the UI, enable the + Add credentials field by selecting This URL requires credentials. Refer to storing credentials securely. |
Advanced options | | | |
Endpoint URL | `endpointUrl` | `https://s3.us-east-1.amazonaws.com` | (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service. |
AWS credentials are encrypted and only decrypted when used to set up the client for communication with AWS during scoring.
Note
If running a Private AI Cloud within AWS, you can provide implicit credentials for your application instances using an IAM Instance Profile to access your S3 buckets without supplying explicit credentials in the job data. For more information, see the AWS article, Create an IAM Instance Profile.
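Putting the S3 parameters together, a hedged `outputSettings` sketch with a templated filename and an optional endpoint override might look like the following (bucket, credential ID, and endpoint are placeholders):

```python
# Amazon S3 output settings; the templated filename avoids overwriting results on
# each run (bucket, credential ID, and endpoint are placeholders).
output_settings = {
    "type": "s3",
    "url": "s3://bucket-name/results/scored_{{ current_run_time }}.csv",
    "format": "csv",
    "credentialId": "5e4bc5555e6e763beb9db147",
    "endpointUrl": "https://s3.us-east-1.amazonaws.com",  # optional; for S3-compatible stores
}
```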
## BigQuery write
To use BigQuery, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (`outputSettings`) as described in the table below.
UI field | Parameter | Example | Description |
---|---|---|---|
Destination type | `type` | `bigquery` | Use Google Cloud Storage for output, then use a BigQuery batch load job to ingest the data from GCS into a BigQuery table. |
Dataset | `dataset` | `my_dataset` | The BigQuery dataset to use. |
Table | `table` | `my_table` | The BigQuery table from the dataset to use for output. |
Bucket name | `bucket` | `my-bucket-in-gcs` | The GCS bucket where data files are staged before being loaded into the BigQuery table. |
+ Add credentials | `credentialId` | `5e4bc5555e6e763beb488dba` | Required if explicit access credentials for this bucket are necessary (otherwise optional). In the UI, enable the + Add credentials field by selecting This connection requires credentials. Refer to storing credentials securely. |
BigQuery output write strategy
The write strategy for BigQuery output is `insert`. First, the output adapter checks if a BigQuery table exists. If the table exists, the data is inserted. If the table doesn't exist, a table is created and then the data is inserted.
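A minimal `outputSettings` sketch for a BigQuery destination, using only the parameters documented above (all names and the credential ID are placeholders):

```python
# BigQuery output settings: results are staged in the GCS bucket and then loaded
# into the table with a BigQuery load job (all names are placeholders).
output_settings = {
    "type": "bigquery",
    "dataset": "my_dataset",
    "table": "my_table",
    "bucket": "my-bucket-in-gcs",
    "credentialId": "5e4bc5555e6e763beb488dba",
}
```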
Refer to the example section for a complete API example.
## Snowflake write
To use Snowflake, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (`outputSettings`) as described in the table below.
UI field | Parameter | Example | Description |
---|---|---|---|
Destination type | `type` | `snowflake` | Adapter type. |
Data connection options | | | |
+ Select connection | `dataStoreId` | `5e4bc5b35e6e763beb9db14a` | The ID of the Snowflake data source. |
Enter credentials | `credentialId` | `5e4bc5555e6e763beb9db147` | (Optional) The ID of a stored credential for Snowflake. |
Tables | `table` | `RESULTS` | The name of the Snowflake table to store results in. |
Schemas | `schema` | `PUBLIC` | (Optional) The name of the schema containing the results table. |
Database | `catalog` | `OUTPUT` | (Optional) The name of the database catalog to write output data to. |
Use external stage options | | | |
Cloud storage type | `cloudStorageType` | `s3` | (Optional) The type of cloud storage backend used in the Snowflake external stage: `s3`, `azure`, or `gcp`. The default is `s3`. In the UI, select Use external stage to enable the Cloud storage type field. |
External stage | `externalStage` | `my_s3_stage` | The Snowflake external stage. In the UI, select Use external stage to enable the External stage field. |
Endpoint URL (for S3 only) | `endpointUrl` | `https://www.example.com/datasets/` | (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service. In the UI, for the S3 option in Cloud storage type, click Show advanced options to reveal the Endpoint URL field. |
+ Add credentials | `cloudStorageCredentialId` | `6e4bc5541e6e763beb9db15c` | (Optional) The ID of stored credentials for the storage backend (S3/Azure/GCS) used in the Snowflake stage. In the UI, enable the + Add credentials field by selecting This URL requires credentials. |
Write strategy options (for fallback JDBC connection) | | | |
Write strategy | `statementType` | `insert` | If you're using a Snowflake external stage, the `statementType` is `insert`. However, in the UI you have two configuration options. |
Create table if it does not exist (for Insert) | `create_table_if_not_exists` | `true` | (Optional) If no existing table is detected, attempt to create one. |
Advanced options | | | |
Commit interval | `commitInterval` | `600` | (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to `0`, the batch prediction operation will write the entire job before committing. Default: `600`. |
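Combining the parameters above, a hedged `outputSettings` sketch for Snowflake with an S3-backed external stage might look like this (all IDs, stage, and table names are placeholders):

```python
# Snowflake output settings using an S3-backed external stage
# (IDs, stage, and table names are placeholders).
output_settings = {
    "type": "snowflake",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",               # Snowflake data connection ID
    "credentialId": "5e4bc5555e6e763beb9db147",              # Snowflake credential ID
    "table": "RESULTS",
    "schema": "PUBLIC",
    "externalStage": "my_s3_stage",                          # Snowflake external stage
    "cloudStorageType": "s3",                                # backend behind the external stage
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",  # credential for the storage backend
    "statementType": "insert",                               # used by the fallback JDBC connection
}
```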
Refer to the example section for a complete API example.
## Azure Synapse write
To use Azure Synapse, supply data destination details using the Predictions > Job Definitions tab or the Batch Prediction API (`outputSettings`) as described in the table below.
UI field | Parameter | Example | Description |
---|---|---|---|
Destination type | `type` | `synapse` | Adapter type. |
Data connection options | | | |
+ Select connection | `dataStoreId` | `5e4bc5b35e6e763beb9db14a` | The ID of the Synapse data source. |
Enter credentials | `credentialId` | `5e4bc5555e6e763beb9db147` | (Optional) The ID of a stored credential for Synapse. |
Tables | `table` | `RESULTS` | The name of the Synapse table to store results in. |
Schemas | `schema` | `dbo` | (Optional) The name of the schema containing the results table. |
Use external stage options | | | |
External data source | `externalDatasource` | `my_data_source` | The name of the identifier created in Synapse for the external data source. |
+ Add credentials | `cloudStorageCredentialId` | `6e4bc5541e6e763beb9db15c` | (Optional) The ID of a stored credential for Azure Blob storage. |
Write strategy options (for fallback JDBC connection) | | | |
Write strategy | `statementType` | `insert` | If you're using a Synapse external stage, the `statementType` is `insert`. However, in the UI you have two configuration options. |
Create table if it does not exist (for Insert or Insert + Update) | `create_table_if_not_exists` | `true` | (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the `statementType` parameter. |
Advanced options | | | |
Commit interval | `commitInterval` | `600` | (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to `0`, the batch prediction operation will write the entire job before committing. Default: `600`. |
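Likewise, a hedged `outputSettings` sketch for Synapse with an external data source (all IDs and identifiers are placeholders):

```python
# Azure Synapse output settings using an external data source
# (IDs and identifiers are placeholders).
output_settings = {
    "type": "synapse",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",               # Synapse data connection ID
    "credentialId": "5e4bc5555e6e763beb9db147",              # Synapse credential ID
    "table": "RESULTS",
    "schema": "dbo",
    "externalDatasource": "my_data_source",                  # external data source identifier in Synapse
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",  # Azure Blob storage credential ID
    "statementType": "insert",                               # used by the fallback JDBC connection
    "create_table_if_not_exists": True,
}
```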
Refer to the example section for a complete API example.