Write prediction results back to a SAP Datasphere data source with data destination details supplied through a job definition or the Batch Prediction API.
Score data using Synapse with data destination details supplied through a job definition or the Batch Prediction API.
If you are using a custom CSV format, any output option dealing with CSV will adhere to that format. The columns that appear in the output are documented in the section on output format.
If your job is configured with local file streaming as the output option, you can start downloading the scored data as soon as the job moves to a RUNNING state. In the example job data JSON below, the URL needed to make the local file streaming request is available in the download key of the links object:
{"elapsedTimeSec":97,"failedRows":0,"jobIntakeSize":1150602342,"jobOutputSize":107791140,"jobSpec":{"deploymentId":"5dc1a6a9865d6c004dd881ef","maxExplanations":0,"numConcurrent":4,"passthroughColumns":null,"passthroughColumnsSet":null,"predictionWarningEnabled":null,"thresholdHigh":null,"thresholdLow":null},"links":{"download":"https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/download/","self":"https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"},"logs":["Job created by user@example.org from 203.0.113.42 at 2019-11-07 18:11:36.870000","Job started processing at 2019-11-07 18:11:49.781000","Job done processing at 2019-11-07 18:13:14.533000"],"percentageCompleted":0.0,"scoredRows":3000000,"status":"COMPLETED","statusDetails":"Job done processing at 2019-11-07 18:13:14.533000"}
If you download faster than DataRobot can ingest and score your data, the download may appear sluggish because DataRobot streams the scored data as soon as it arrives (in chunks).
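As a minimal sketch of the local file streaming flow (assuming bearer-token authentication and the third-party requests library; the API token and output filename are placeholders), you can read the download URL from the job data shown above and stream the scored rows to disk:

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder; use your own API token
JOB_URL = "https://app.datarobot.com/api/v2/batchPredictions/5dc45e583c36a100e45276da/"

headers = {"Authorization": f"Bearer {API_TOKEN}"}

# Fetch the job data and pull the download link from the "links" object.
job = requests.get(JOB_URL, headers=headers).json()
download_url = job["links"]["download"]

# Stream the scored data to disk in chunks; this can start while the job
# is still RUNNING, so the stream may pause while rows are being scored.
with requests.get(download_url, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    with open("scored.csv", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)
```

Because the endpoint streams chunks as they are produced, the same code works whether the job is still RUNNING or already COMPLETED.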
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Schemas | schema | | (Optional) The name of the schema where scored data will be written. |
| Tables | table | scoring_data | The name of the database table where scored data will be written. |
| Database | catalog | output_data | (Optional) The name of the database catalog to write output data to. |
Write strategy options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Write strategy | statementType | update | The statement type: insert, update, or insertUpdate. |
| Create table if it does not exist (for Insert or Insert + Update) | create_table_if_not_exists | true | (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter. |
| Row identifier (for Update or Insert + Update) | updateColumns | ['index'] | (Optional) A list of strings containing the column names to be updated when statementType is set to update or insertUpdate. |
| Row identifier (for Update or Insert + Update) | where_columns | ['refId'] | (Optional) A list of strings containing the column names to be selected when statementType is set to update or insertUpdate. |

Advanced options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Commit interval | commitInterval | 600 | (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600 |
Note
If your target database doesn't support the column naming conventions of DataRobot's output format, you can use Column Name Remapping to rewrite the output column names to a format your target database supports (e.g., remove spaces from the names).
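For instance, a job payload could remap problematic column names before they reach the database. This is a sketch only: the columnNamesRemapping parameter is assumed here to accept a mapping of original output column names to replacements, and the column names shown are hypothetical.

```python
# Hypothetical payload fragment: rename output columns whose names the
# target database would reject (spaces, dots, mixed case, etc.).
job_payload = {
    "deploymentId": "5dc1a6a9865d6c004dd881ef",
    "columnNamesRemapping": {                         # assumed parameter name and shape
        "readmitted_1_PREDICTION": "readmitted_prediction",
        "Prediction Explanation 1": "explanation_1",  # removes the space
    },
    # intakeSettings / outputSettings omitted for brevity
}
```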
When configuring the Write strategy options, you can use the following statement types to write data, depending on the situation:

| Statement type | Description |
|----------------|-------------|
| insert | Scored data rows are inserted into the target database as new entries. Suitable for writing to an empty table. |
| update | Entries in the target database matching the row identifier of a result row are updated with the new result (columns identified in updateColumns). Suitable for writing to an existing table. |
| insertUpdate | Entries in the target database matching the row identifier of a result row (where_columns) are updated with the new result (update queries). All other result rows are inserted as new entries (insert queries). |
| createTable (deprecated) | DataRobot no longer recommends createTable; use another statement type with create_table_if_not_exists set to true. If used, scored data rows are saved to a new table using INSERT queries. The table must not exist before scoring. |
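Putting these options together, the output portion of a job payload for an insertUpdate strategy might look like the sketch below. The outputSettings key and jdbc type follow the Batch Prediction API conventions; the dataStoreId value is an assumed placeholder for the JDBC data connection, the schema name is hypothetical, and the remaining parameter names are copied from the tables above (verify the exact casing your API version expects).

```python
# Sketch of a JDBC output configuration using insertUpdate; IDs and the
# schema name are placeholders.
output_settings = {
    "type": "jdbc",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",   # assumed: JDBC data connection ID
    "credentialId": "5e4bc5555e6e763beb488dba",  # stored database credential
    "schema": "public",                          # hypothetical schema name
    "table": "scoring_data",
    "catalog": "output_data",
    "statementType": "insertUpdate",
    "updateColumns": ["index"],                  # columns to update on a match
    "where_columns": ["refId"],                  # row identifier, as listed above
    "create_table_if_not_exists": True,          # as listed above
    "commitInterval": 600,
}
```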
Azure Blob Storage is an option for writing large files. To save a dataset to Azure Blob Storage, you must set up a credential with DataRobot consisting of an Azure Connection String.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Format | format | csv | (Optional) Select CSV (csv) or Parquet (parquet). Default value: CSV |
| + Add credentials | credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this URL are necessary (otherwise optional). In the UI, enable the + Add credentials field by selecting This URL requires credentials. Refer to storing credentials securely. |
Azure credentials are encrypted and only decrypted when used to set up the client for communication with Azure during scoring.
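As a sketch, the output settings for Azure Blob Storage could look like the following (the azure type name is assumed, and the storage account URL and credential ID are placeholders):

```python
# Hypothetical Azure Blob Storage output settings; the account, container,
# and credential ID are placeholders.
output_settings = {
    "type": "azure",
    "url": "https://myaccount.blob.core.windows.net/results/scored.csv",
    "format": "csv",                             # or "parquet"
    "credentialId": "5e4bc5555e6e763beb488dba",  # stored Azure connection string
}
```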
DataRobot supports the Google Cloud Storage adapter. To save a dataset to Google Cloud Storage, you must set up a credential with DataRobot consisting of a JSON-formatted account key.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Destination type | type | gcp | Use Google Cloud Storage for output. |
| URL | url | gcs://bucket-name/datasets/scored.csv | An absolute URL designating where the file is written. |
| Format | format | csv | (Optional) Select CSV (csv) or Parquet (parquet). Default value: CSV |
| + Add credentials | credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this URL are necessary (otherwise optional). Refer to storing credentials securely. |
GCP credentials are encrypted and are only decrypted when used to set up the client for communication with GCP during scoring.
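Using the example values above, a corresponding output configuration sketch (assuming outputSettings as the job payload key; the credential ID is a placeholder):

```python
# Hypothetical Google Cloud Storage output settings.
output_settings = {
    "type": "gcp",
    "url": "gcs://bucket-name/datasets/scored.csv",
    "format": "csv",                             # or "parquet"
    "credentialId": "5e4bc5555e6e763beb488dba",  # stored JSON service account key
}
```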
DataRobot can save scored data to both public and private buckets. To write to S3, you must set up a credential with DataRobot consisting of an access key (ID and key) and optionally a session token.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Destination type | type | s3 | Use S3 for output. |
| URL | url | s3://bucket-name/results/scored.csv | An absolute URL for the file to be written. DataRobot only supports directory scoring when scoring from cloud to cloud: provide a directory in S3 (or another cloud provider) for the input and a directory ending with / for the output. With this configuration, all files in the input directory are scored and the results are written to the output directory with the original filenames. When a single file is specified for both the input and the output, the file is overwritten each time the job runs. If you do not want to overwrite the file, specify a filename template such as s3://bucket-name/results/scored_{{ current_run_time }}.csv. You can review template variable definitions in the documentation. |
| Format | format | csv | (Optional) Select CSV (csv) or Parquet (parquet). Default value: CSV |
| + Add credentials | credentialId | 5e4bc5555e6e763beb9db147 | Required if explicit access credentials for this URL are necessary. In the UI, enable the + Add credentials field by selecting This URL requires credentials. Refer to storing credentials securely. |

Advanced options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Endpoint URL | endpointUrl | https://s3.us-east-1.amazonaws.com | (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service. |
AWS credentials are encrypted and only decrypted when used to set up the client for communication with AWS during scoring.
Note
If running a Private AI Cloud within AWS, you can provide implicit credentials for your application instances using an IAM Instance Profile to access your S3 buckets without supplying explicit credentials in the job data. For more information, see the AWS article, Create an IAM Instance Profile.
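A sketch combining the S3 options above, including a filename template so repeated runs don't overwrite the output (the credential ID and endpoint are placeholders):

```python
# Hypothetical S3 output settings; bucket, credential ID, and endpoint
# are placeholders.
output_settings = {
    "type": "s3",
    "url": "s3://bucket-name/results/scored_{{ current_run_time }}.csv",
    "format": "csv",                                      # or "parquet"
    "credentialId": "5e4bc5555e6e763beb9db147",           # omit if relying on an IAM Instance Profile
    "endpointUrl": "https://s3.us-east-1.amazonaws.com",  # optional override
}
```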
BigQuery output uses Google Cloud Storage for output and a batch load job to ingest the data from GCS into a BigQuery table.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Dataset | dataset | my_dataset | The BigQuery dataset to use. |
| Table | table | my_table | The BigQuery table from the dataset to use for output. |
| Bucket name | bucket | my-bucket-in-gcs | The GCS bucket where data files are staged to be loaded into (or unloaded from) a BigQuery table. |
| + Add credentials | credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this bucket are necessary (otherwise optional). In the UI, enable the + Add credentials field by selecting This connection requires credentials. Refer to storing credentials securely. |
BigQuery output write strategy
The write strategy for BigQuery output is insert. First, the output adapter checks if a BigQuery table exists. If a table exists, the data is inserted. If a table doesn't exist, a table is created and then the data is inserted.
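Using the example values above, a BigQuery output sketch (the bigquery type name is assumed; the credential ID is a placeholder):

```python
# Hypothetical BigQuery output settings; the dataset, table, and bucket
# values are the examples from the table above.
output_settings = {
    "type": "bigquery",
    "dataset": "my_dataset",
    "table": "my_table",
    "bucket": "my-bucket-in-gcs",                # GCS staging bucket
    "credentialId": "5e4bc5555e6e763beb488dba",
}
```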
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| + Add credentials | credentialId | | (Optional) The ID of a stored credential for Snowflake. |
| Tables | table | RESULTS | The name of the Snowflake table to store results in. |
| Schemas | schema | PUBLIC | (Optional) The name of the schema containing the table to be scored. |
| Database | catalog | OUTPUT | (Optional) The name of the database catalog to write output data to. |
Use external stage options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Cloud storage type | cloudStorageType | s3 | (Optional) The type of cloud storage backend used in the Snowflake external stage; one of three cloud storage providers: s3 / azure / gcp. The default is s3. In the UI, select Use external stage to enable the Cloud storage type field. |
| External stage | externalStage | my_s3_stage | The name of the Snowflake external stage. In the UI, select Use external stage to enable the External stage field. |
| Endpoint URL (for S3 only) | endpointUrl | https://www.example.com/datasets/ | (Optional) Override the endpoint used to connect to S3, for example, to use an API gateway or another S3-compatible storage service. In the UI, for the S3 option in Cloud storage type, click Show advanced options to reveal the Endpoint URL field. |
| + Add credentials | cloudStorageCredentialId | 6e4bc5541e6e763beb9db15c | (Optional) The ID of stored credentials for the storage backend (S3/Azure/GCS) used in the Snowflake stage. In the UI, enable the + Add credentials field by selecting This URL requires credentials. |
If you're using a Snowflake external stage, the statementType is insert. However, in the UI you have two configuration options:

- If you haven't configured an external stage, the connection defaults to JDBC and you can select Insert or Update. If you select Update, you can provide a Row identifier.
- If you selected Use external stage, the Insert option is required.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Create table if it does not exist (for Insert) | create_table_if_not_exists | true | (Optional) If no existing table is detected, attempt to create one. |

Advanced options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Commit interval | commitInterval | 600 | (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600 |
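Combining the Snowflake options above, an external stage configuration might look like the following sketch. The snowflake type name and the dataStoreId parameter (identifying the Snowflake data connection) are assumptions, and all IDs are placeholders; the remaining names come from the tables above.

```python
# Hypothetical Snowflake output settings using an S3 external stage.
output_settings = {
    "type": "snowflake",
    "dataStoreId": "5e4bc5b35e6e763beb9db14a",    # assumed: Snowflake data connection ID
    "credentialId": "5e4bc5555e6e763beb488dba",   # stored Snowflake credential
    "table": "RESULTS",
    "schema": "PUBLIC",
    "catalog": "OUTPUT",
    "statementType": "insert",                    # required when an external stage is used
    "externalStage": "my_s3_stage",
    "cloudStorageType": "s3",
    "cloudStorageCredentialId": "6e4bc5541e6e763beb9db15c",
}
```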
If you're using a Synapse external stage, the statementType is insert. However, in the UI you have two configuration options:

- If you haven't configured an external stage, the connection defaults to JDBC and you can select Insert, Update, or Insert + Update. If you select Update or Insert + Update, you can provide a Row identifier.
- If you selected Use external stage, the Insert option is required.
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Create table if it does not exist (for Insert or Insert + Update) | create_table_if_not_exists | true | (Optional) If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter. |
| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Create table if it does not exist | create_table_if_not_exists | true | (Optional) Attempt to create the table first if no existing one is detected. |

Advanced options

| UI field | Parameter | Example | Description |
|----------|-----------|---------|-------------|
| Commit interval | commitInterval | 600 | (Optional) Defines a time interval, in seconds, between commits to the JDBC source. If set to 0, the batch prediction operation will write the entire job before committing. Default: 600 |