The Portable Prediction Server is a premium feature exclusive to DataRobot MLOps. Contact your DataRobot representative or administrator for information on enabling this feature.
Portable batch predictions (PBP) let you score large amounts of data on disconnected environments. Before you can use portable batch predictions, you need to configure the Portable Prediction Server (PPS), a DataRobot execution environment for DataRobot model packages (.mlpkg files) distributed as a self-contained Docker image. Portable batch predictions use the same Docker image as the PPS but run it in a different mode.
You can define jobs using a JSON config file that describes the prediction_endpoint, intake_settings,
output_settings, timeseries_settings (optional, for time series scoring), and jdbc_settings (optional, for JDBC scoring).
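For illustration, a minimal job definition for scoring a file on the local filesystem might look like the following sketch. The endpoint, deployment ID, and file paths are placeholders, and the filename keys are assumptions; each intake and output type accepts its own set of options (see the parameter tables below).

```json
{
    "prediction_endpoint": "http://localhost:8080",
    "deployment_id": "61f05aaf5f6525f43ed79751",
    "intake_settings": {
        "type": "localFile",
        "filename": "/tmp/input.csv"
    },
    "output_settings": {
        "type": "localFile",
        "filename": "/tmp/output.csv"
    }
}
```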
Self-Managed AI Platform only: Prediction endpoint SSL configuration
If you need to disable SSL verification for the prediction_endpoint, you can set ALLOW_SELF_SIGNED_CERTS to True. This configuration disables SSL certificate verification for requests made by the application to the web server. This is useful if you have SSL encryption enabled on your cluster and are using certificates that are not signed by a globally trusted Certificate Authority (self-signed).
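As a sketch, assuming the variable is supplied to the PBP container as an environment variable (the image tag, host paths, and job definition name are placeholders):

```bash
# Assumption: ALLOW_SELF_SIGNED_CERTS is passed as a container environment variable
docker run --rm \
    -e ALLOW_SELF_SIGNED_CERTS=True \
    -v /host/filesystem/path/job_definition.json:/tmp/job_definition.json \
    datarobot/datarobot-portable-prediction-api:<version> batch /tmp/job_definition.json
```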
The prediction_endpoint describes how to access the PPS and is constructed as <schema>://<hostname>:<port>, where you define the following parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| schema | string | http or https |
| hostname | string | The hostname of the instance where your PPS is running. |
| port | string | The port of the prediction API running inside the PPS. |
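For example, a PPS running locally on port 8080 (the port mapping used in the end-to-end script later in this section) with SSL disabled would be addressed as follows; localhost is a placeholder for your own PPS host:

```json
{ "prediction_endpoint": "http://localhost:8080" }
```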
The jdbc_settings object has the following attributes:

| Parameter | Type | Description |
|-----------|------|-------------|
| url | string | The URL to connect to via the JDBC interface. |
| class_name | string | The class name used as an entry point for JDBC communication. |
| driver_path | string | The path to the JDBC driver on your filesystem (available inside the PBP container). |
| template_name | string | The name of the template to use for write-back. To obtain the names of the supported templates, contact your DataRobot representative. |
The other parameters are similar to those available for standard batch predictions; note, however, that they are written in snake_case rather than camelCase:
| Parameter | Type | Description |
|-----------|------|-------------|
| abort_on_error | boolean | Enable or disable canceling the portable batch prediction job if an error occurs. Example: true |
| chunk_size | string | Chunk the dataset for scoring in sequence as asynchronous tasks. In most cases, the default value produces the best performance; larger chunks suit very fast models and smaller chunks suit very slow models. Example: "auto" |
| column_names_remapping | array | Rename or remove columns from the output for this job. Set a column's output_name to null or false to remove it. Example: [{'input_name': 'isbadbuy_1_PREDICTION', 'output_name': 'prediction'}, {'input_name': 'isbadbuy_0_PREDICTION', 'output_name': null}] |
| csv_settings | object | Set the delimiter, character encoding, and quote character for comma-separated value (CSV) files. Example: { "delimiter": ",", "encoding": "utf-8", "quotechar": "\"" } |
| deployment_id | string | Define the ID of the deployment associated with the portable batch predictions. Example: 61f05aaf5f6525f43ed79751 |
| disable_row_level_error_handling | boolean | Enable or disable error handling by prediction row. Example: false |
| include_prediction_status | boolean | Enable or disable including the prediction_status column in the output; defaults to false. Example: false |
| include_probabilities | boolean | Enable or disable returning probabilities for all classes. Example: true |
| include_probabilities_classes | array | Define the classes to provide class probabilities for. Example: [ 'setosa', 'versicolor', 'virginica' ] |
| intake_settings | object | Set the intake options required for the input type. Example: { "type": "localFile" } |
| num_concurrent | integer | Set the maximum number of chunks to score concurrently on the prediction instance specified by the deployment. Example: 1 |
| output_settings | object | Set the output options required for the output type. Example: { "credential_id": "string", "format": "csv", "partitionColumns": [ "string" ], "type": "azure", "url": "string" } |
| passthrough_columns | array | Define the scoring dataset columns to include in the prediction response. This option is mutually exclusive with passthrough_columns_set. Example: [ "column1", "column2" ] |
| passthrough_columns_set | string | Include all scoring dataset columns in the prediction response. The only option is all. This option is mutually exclusive with passthrough_columns. Example: "all" |
| prediction_warning_enabled | boolean | Enable or disable prediction warnings. Example: true |
| skip_drift_tracking | boolean | Enable or disable drift tracking for this batch of predictions. This allows you to make test predictions without affecting deployment stats. Example: false |
| timeseries_settings | object | Define the settings required for time series predictions. Example: { "forecast_point": "2019-08-24T14:15:22Z", "relax_known_in_advance_features_check": false, "type": "forecast" } |
| explanation_class_names | array | Define the class names to explain for each row. This setting is only applicable to XEMP Prediction Explanations for multiclass models and is mutually exclusive with explanation_num_top_classes. Example: [ "class1", "class2" ] |
| explanation_num_top_classes | integer | Set the number of top predicted classes, by prediction value, to explain for each row. This setting is only applicable to XEMP Prediction Explanations for multiclass models and is mutually exclusive with explanation_class_names. Example: 1 |
| threshold_low | float | Set the lower threshold for requiring a Prediction Explanation. Predictions must be below this value (or above the threshold_high value) for Prediction Explanations to compute. Example: 0.678 |
| threshold_high | float | Set the upper threshold for requiring a Prediction Explanation. Predictions must be above this value (or below the threshold_low value) for Prediction Explanations to compute. Example: 0.345 |
The following outlines a JDBC example that scores to and from Snowflake using a single-mode PPS running locally; it can be saved as a job_definition_jdbc.json file:
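The values below are illustrative: the Snowflake account URL, warehouse, database, table and schema names, driver path, and template name are placeholders to adapt to your environment (confirm the supported template names with your DataRobot representative). The intake and output option keys follow the standard batch prediction JDBC options written in snake_case, as noted above.

```json
{
    "prediction_endpoint": "http://localhost:8080",
    "deployment_id": "61f05aaf5f6525f43ed79751",
    "intake_settings": {
        "type": "jdbc",
        "table": "SCORING_INPUT",
        "schema": "PUBLIC"
    },
    "output_settings": {
        "type": "jdbc",
        "table": "SCORING_OUTPUT",
        "schema": "PUBLIC",
        "statement_type": "insert"
    },
    "jdbc_settings": {
        "url": "jdbc:snowflake://<account>.snowflakecomputing.com/?warehouse=<warehouse>&db=<database>",
        "class_name": "net.snowflake.client.jdbc.SnowflakeDriver",
        "driver_path": "/tmp/jdbc/snowflake-jdbc-<version>.jar",
        "template_name": "Snowflake"
    }
}
```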
If you are using JDBC or private containers in cloud storage, you can specify the required
credentials as environment variables. The following table lists the variable names used:
| Name | Type | Description |
|------|------|-------------|
| AWS_ACCESS_KEY_ID | string | AWS access key ID |
| AWS_SECRET_ACCESS_KEY | string | AWS secret access key |
| AWS_SESSION_TOKEN | string | AWS session token |
| GOOGLE_STORAGE_KEYFILE_PATH | string | Path to GCP credentials file |
| AZURE_CONNECTION_STRING | string | Azure connection string |
| JDBC_USERNAME | string | Username for JDBC |
| JDBC_PASSWORD | string | Password for JDBC |
| SNOWFLAKE_USERNAME | string | Username for Snowflake |
| SNOWFLAKE_PASSWORD | string | Password for Snowflake |
| SYNAPSE_USERNAME | string | Username for Azure Synapse |
| SYNAPSE_PASSWORD | string | Password for Azure Synapse |
Here's an example of the credentials.env file used for JDBC scoring:
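The values below are placeholders; use the variable names from the table above that match your data source:

```
# credentials.env -- placeholder JDBC credentials for the Snowflake example
JDBC_USERNAME=TEST_USER
JDBC_PASSWORD=SECRET
```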
Portable batch predictions run inside a Docker container, so you need to mount the job definition, the credentials file, and any datasets into the container (if you score from the host filesystem, set the corresponding path inside the container). Using the JDBC job definition and credentials from the previous examples, the following outlines a complete example of how to start a portable batch prediction job that scores to and from Snowflake.
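A sketch of such a command, reusing the job_definition_jdbc.json and credentials.env files from the previous examples (the host paths and image tag are placeholders):

```bash
# Mount the job definition and pass the credentials, then run the image in batch mode
docker run --rm \
    -v /host/filesystem/path/job_definition_jdbc.json:/tmp/job_definition.json \
    --network host \
    --env-file /host/filesystem/path/credentials.env \
    datarobot/datarobot-portable-prediction-api:<version> batch /tmp/job_definition.json
```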
Here is another example of how to run a complete end-to-end flow, including the PPS service and writing job status back to the DataRobot platform for monitoring progress.
```bash
#!/bin/bash
# This snippet starts both the PPS service and PBP job using the same PPS docker image
# available from Developer Tools.

#################
# Configuration #
#################

# Specify path to directory with mlpkg(s) which you can download from deployment
MLPKG_DIR='/host/filesystem/path/mlpkgs'
# Specify job definition path
JOB_DEFINITION_PATH='/host/filesystem/path/job_definition.json'
# Specify path to file with credentials if needed (for cloud storage adapters or JDBC)
CREDENTIALS_PATH='/host/filesystem/path/credentials.env'
# For DataRobot integration, specify API host and Token
API_HOST='https://app.datarobot.com'
API_TOKEN='XXXXXXXX'

# Run PPS service in the background
PPS_CONTAINER_ID=$(docker run --rm -d -p 127.0.0.1:8080:8080 \
    -v $MLPKG_DIR:/opt/ml/model \
    datarobot/datarobot-portable-prediction-api:<version>)

# Wait some time before PPS starts up
sleep 15

# Run PPS in batch mode to start PBP job
docker run --rm -v $JOB_DEFINITION_PATH:/tmp/job_definition.json \
    --network host \
    --env-file $CREDENTIALS_PATH \
    datarobot/datarobot-portable-prediction-api:<version> batch /tmp/job_definition.json \
    --api_host $API_HOST --api_token $API_TOKEN

# Stop PPS service
docker stop $PPS_CONTAINER_ID
```