Batch Scoring Script¶
The Python batch scoring script has been deprecated and replaced with the Batch Prediction Scripts. The script can still function in some environments, but some commands won't work because the legacy Prediction API routes on the managed AI Platform's prediction servers are disabled.
The Python batch scoring script is designed to efficiently score large files using the Prediction API. The batch-scoring script only runs against dedicated prediction workers (for managed AI Platform deployments) or a dedicated prediction cluster (for Self-Managed AI Platform users). It achieves greater speed by splitting a CSV input file into optimally sized batches and submitting these concurrently to the prediction server. Batches can be scored much more quickly than individual rows. The script handles queueing, resource management, and concurrent request management, requiring no user intervention. Concurrent requests greatly increase the efficiency of the process by using multiple processors to make predictions. However, you should not use a value for <n>_concurrency greater than your number of prediction cores. Consult DataRobot Support if you are unsure of how many cores you have.
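The batching and concurrency idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script's actual implementation: `score_batch` is a hypothetical stand-in for one request to the prediction server, and `batch_size` and `n_concurrent` are illustrative parameter names.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def score_batch(batch):
    """Hypothetical stand-in for one prediction request: one value per row."""
    return [len(row) for row in batch]


def score_csv(csv_text, batch_size=2, n_concurrent=2):
    """Score every data row of csv_text and return results in input order."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if any
    batches = list(split_into_batches(reader, batch_size))
    # Keep the worker count at or below the number of prediction cores.
    with ThreadPoolExecutor(max_workers=n_concurrent) as pool:
        # map() returns results in the same order as the submitted batches.
        results = list(pool.map(score_batch, batches))
    return [value for batch_result in results for value in batch_result]
```

Because `map()` preserves submission order, the combined output lines up with the input rows even though batches may finish out of order.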
A known bug in Python 2.7.8 and later 2.7.x versions causes SSL connections to fail, so those versions are not supported. This script supports Python 2.7.7, but Python 3.4 and later are recommended for better speed and text decoding. You can use Anaconda 2.2.0 or later to install the datarobot_batch_scoring script. If you do not have Internet access for downloading dependencies, DataRobot Support can provide a bundle that includes everything needed to install offline.
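The version guidance above can be expressed as a small check. This is a sketch only: the function name and the status labels are hypothetical, and versions the text does not mention (for example 2.7.6 or 3.3) are reported as "unknown" rather than guessed.

```python
import sys


def python_support_status(version=None):
    """Classify a (major, minor, micro) version according to the notes above."""
    if version is None:
        version = sys.version_info
    major, minor, micro = tuple(version)[:3]
    if (major, minor) == (2, 7):
        if micro >= 8:
            return "unsupported"  # SSL connections fail on 2.7.8 and later 2.7.x
        if micro == 7:
            return "supported"
    elif (major, minor) >= (3, 4):
        return "recommended"
    # Assumption: anything not covered by the text is left unclassified.
    return "unknown"
```

Calling `python_support_status()` with no argument checks the running interpreter.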
Download and install the DataRobot batch scoring package for Python 2 and 3 using the following command:
pip install -U datarobot_batch_scoring
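After installing, one way to confirm that the package is present is to query the installed distribution metadata. The sketch below assumes Python 3.8 or later (for `importlib.metadata`) and that the distribution is registered under the same name used in the pip command; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution="datarobot_batch_scoring"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

A return value of `None` suggests that the install did not succeed in the current environment.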
Alternative install methods¶
DataRobot provides two alternative install methods on the project releases page. (Log in to GitHub before clicking this link.) These can help when you do not have:
- Internet access
- administrative privileges
- the Python package manager (pip) installed
- the correct version of Python installed (in this case, use only PyInstaller, option 2 below)
In any of the above situations, use:
1. offlinebundle: For performing installations in environments where Python 2.7 or Python 3+ is available. Works on Linux, OSX, or Windows. These files have "offlinebundle" in their name on the release page. The install directions are included in the zip or tar file.
2. PyInstaller: Using PyInstaller, DataRobot builds a single-file executable that does not depend on Python. It can be installed without administrative privileges. These files on the release page have "executables" in their name, as well as the version and platform (Linux, Windows, or OSX). The install directions are included in the zip or tar file. Note that the PyInstaller builds for Linux work on distros equal to or newer than CentOS 5. Contact DataRobot Support if you have questions or problems getting a build to work on your system.
Syntax, examples, and usage notes¶
For complete and up-to-date scoring script syntax and information, visit the DataRobot batch-scoring GitHub page. Log in to GitHub before clicking this link.
The --verbose output of the script provides information about the progress of the scoring procedure. Some particularly informative sections of that output are described below.
- --host="https://datarobot-xxxxx.datarobot.com": Hostname of the prediction API endpoint (the location of the data to use for predictions)
- 'user': 'firstname.lastname@example.org', 'api_token': 'ABCD1234XYZ7890', ... 'datarobot_key': 'xxxxxxxxxxxxxxxxx', ... 'deployment_id': 'yyyyyyyyyyyyyyyyyy': User name and corresponding API key, DataRobot key, and deployment ID
- batch_scoring v1.16.4: Script name and version number
- Multiple checks of encoding and dialect with response timing
- Authorization has succeeded: Verification that login credentials are valid
- MainProcess [WARNING] File output.csv exists. Do you want to remove output.csv (Yes/No)> y: Notification that a file already exists with the specified output name
- 1 responses sent | time elapsed 0.545090913773s: Time to score submission
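If you capture the verbose output in a log, the progress line can be extracted with a regular expression. The pattern below is inferred only from the single sample line shown above, so treat it as an assumption that may not hold for other script versions; `parse_progress` is a hypothetical helper name.

```python
import re

# Pattern inferred from the sample "1 responses sent | time elapsed 0.545090913773s".
PROGRESS_RE = re.compile(
    r"(?P<responses>\d+) responses sent \| time elapsed (?P<seconds>\d+(?:\.\d+)?)s"
)


def parse_progress(line):
    """Return (responses_sent, elapsed_seconds), or None if the line does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return int(match.group("responses")), float(match.group("seconds"))
```

Lines that do not match, such as the authorization message, return `None` and can be skipped.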