
Scoring at the command line

The following sections provide syntax for both current and backward-compatible scoring at the command line.

Scoring with the embedded CLI

Following is complete syntax for using the binary scoring code JAR to score a CSV file:

$ java -jar <scoring code jar file name>.jar csv --input=<input file> --output=<output file>
[--help] [--encoding=<value>] [--delimiter=<value>]
[--passthrough_columns=<value>] [--chunk_size=<value>]
[--workers_number=<value>] [--log_level=<value>] [--fail_fast]
[--pred_name=<value>] [--timeout=<value>] [--buffer_size=<value>]
[--model_id=<value>] [--config=<value>] [--with_explanations]
[--max_codes=<value>] [--threshold_low=<value>] [--threshold_high=<value>]
[--enable_mlops] [--dr_token=<value>] [--disable_agent]

For example:

$ java -jar 5cd071deef881f011a334c2f.jar csv --input=Iris.csv --output=Iris_out.csv

Returns:

$ head Iris_out.csv
Iris-setosa,Iris-virginica,Iris-versicolor
0.9996371740832738,1.8977798830979584E-4,1.7304792841625776E-4
0.9996352462865297,1.9170611877686303E-4,1.730475946939417E-4
0.9996373523223016,1.8970270284380858E-4,1.729449748545291E-4

Backward-compatible external tool

Scoring Code models generated by DataRobot prior to version 5.2 do not have an embedded command line interface; however, you can use an external command line tool to score CSV files.

Note

To download the external tool, you must download the JAR with dependencies. To do so, select the version of the tool in the MVN repository and click Files -> View All. Download the file scoring-code-standalone-tool-<version>-jar-with-dependencies.jar, where <version> is the version you selected.

Once the external command line tool is downloaded, score a .csv file using the following template command:

$ java -cp scoring-code-standalone-tool.jar:<scoring code jar file name>.jar
com.datarobot.prediction.Main csv --model_id=<model id> --input=<input file>
--output=<output file> [--help] [--encoding=<value>] [--delimiter=<value>]
[--passthrough_columns=<value>] [--chunk_size=<value>]
[--workers_number=<value>] [--log_level=<value>] [--fail_fast]
[--pred_name=<value>] [--timeout=<value>] [--buffer_size=<value>]
[--model_id=<value>] [--config=<value>] [--with_explanations]
[--max_codes=<value>] [--threshold_low=<value>] [--threshold_high=<value>]
[--enable_mlops] [--dr_token=<value>] [--disable_agent]

For <model id>, specify the file name of the downloaded Scoring Code JAR without the .jar extension. For example, the model ID of 5d2db404dbaff900441cbc7c.jar is 5d2db404dbaff900441cbc7c.
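Putting the template together, a complete invocation might look like the sketch below. Both JAR file names are illustrative (the standalone tool JAR from the note above and the example model JAR), so substitute your own downloads:

```shell
# Both JAR file names are examples; substitute your own downloads.
MODEL_JAR=5d2db404dbaff900441cbc7c.jar
TOOL_JAR=scoring-code-standalone-tool.jar

# Assemble the invocation. The model ID is the model JAR's file name
# without the .jar extension. On Windows, replace the ':' classpath
# separator with ';'.
CMD="java -cp ${TOOL_JAR}:${MODEL_JAR} com.datarobot.prediction.Main csv \
  --model_id=${MODEL_JAR%.jar} --input=Iris.csv --output=Iris_out.csv"

# Run it only if both JARs are actually present.
if [ -f "$TOOL_JAR" ] && [ -f "$MODEL_JAR" ]; then
  eval "$CMD"
fi
echo "$CMD"
```

Note that the classpath separator is platform-specific: `:` on Linux and macOS, `;` on Windows.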

Increasing Java heap memory

Depending on the model's binary size, you may need to increase the Java virtual machine (JVM) heap size. If you receive an OutOfMemoryError: Java heap space message while scoring your model, increase the heap with the -Xmx flag (for example, java -Xmx1024m), adjusting the value as needed to allocate sufficient memory for the process. To guarantee a consistent scoring result and a non-zero exit code when an out-of-memory error occurs, run the application with the -XX:+ExitOnOutOfMemoryError flag.

The following example increases heap memory to 2GB:

$ java -XX:+ExitOnOutOfMemoryError -Xmx2g -jar 5cd071deef881f011a334c2f.jar csv --input=Iris.csv --output=Iris_out.csv

Command line parameters

Each parameter below lists whether it is required, its default value, and a description.

--input={value}
  Required: Yes. Default: none.
  Defines the source of the input data. Valid values are:
  • --input=- to read the input from standard input.
  • --input=/path/to/input/csv/input.csv to read the input from a file.

--output={value}
  Required: Yes. Default: none.
  Sets where results are written. Valid values are:
  • --output=- to write the results to standard output.
  • --output=/path/to/output/csv/output.csv to save results to a file. The output file always contains the same number of rows as the original file, in the same order. Note that for files smaller than 1GB, you can specify the output file to be the same as the input file, which replaces the input with the scored file.

--encoding={value}
  Required: No. Default: the system encoding.
  Sets the charset encoding used to read file content. Use one of the canonical names for the java.io and java.lang APIs. If the option is not set, the tool can detect UTF-8 and UTF-16 byte order marks (BOMs).

--delimiter={value}
  Required: No. Default: , (comma).
  Specifies the delimiter used in .csv files to separate column values. Note: use --delimiter=";" to set the semicolon as the delimiter (; is a reserved symbol in bash/shell, so it must be quoted).

--passthrough_columns={value}
  Required: No. Default: none.
  Sets the input columns to include in the results file. For example, if the flag contains the set column1,column2, the output contains only the prediction column(s) plus column1 and column2. To include all original columns, use All. The resulting file contains the columns in the same order, uses the same format, and uses the delimiter set by the --delimiter parameter. If this parameter is not specified, the command returns only the prediction column(s).

--chunk_size={value}
  Required: No. Default: min(1MB, {file_size}/{cores_number}).
  "Slices" the initial dataset into chunks that are scored in sequence as separate asynchronous tasks. In most cases, the default value produces the best performance. Use bigger chunks for very fast models and smaller chunks for very slow models.

--workers_number={value}
  Required: No. Default: the number of logical cores.
  Specifies the number of workers that can process chunks of work concurrently. The default matches the number of logical cores and produces the best performance.

--help
  Required: No. Default: disabled.
  Prints all of the available options as well as some model metadata.

--log_level={value}
  Required: No. Default: INFO.
  Sets the level of information output to the console. Available options are INFO, DEBUG, and TRACE.

--fail_fast
  Required: No. Default: disabled.
  Sets error handling:
  • If not set, the tool outputs an empty string in place of the predictions when a row does not contain the correct number of fields or a field's type does not match the type expected by the model.
  • If set, the tool exits on the first error and returns 1 as the exit code. In all cases, the tool writes all errors to standard error output (console). The --fail_fast option does not guarantee a non-zero exit code in the case of an OutOfMemoryError; if that is required, use the -XX:+ExitOnOutOfMemoryError flag.

--pred_name={value}
  Required: No. Default: DR_Score.
  For regression projects, sets the name of the prediction column in the output file. In classification projects, the prediction labels are the same as the class labels.

--buffer_size={value}
  Required: No. Default: 1000.
  Controls the size of the asynchronous task queue. Set it to a smaller value if you experience OutOfMemoryError errors while using this tool. This is an advanced parameter.

--config={value}
  Required: No. Default: the .jar file's directory.
  Sets the location of the batch.properties file, which collects all configuration parameters in a single file. If the file is in the same directory as the .jar, you do not need to set this parameter; otherwise, set the value to the path of the directory containing the file.

--with_explanations
  Required: No. Default: disabled.
  Turns on prediction explanation computation.

--max_codes={value}
  Required: No. Default: 3.
  Sets the maximum number of explanations to compute.

--threshold_low={value}
  Required: No. Default: null.
  Sets the low threshold for prediction rows to be included in the explanations.

--threshold_high={value}
  Required: No. Default: null.
  Sets the high threshold for prediction rows to be included in the explanations.

--enable_mlops
  Required: No. Default: enabled.
  Initializes an MLOps instance for tracking scores.

--dr_token={value}
  Required: if --enable_mlops is set. Default: none.
  Specifies the authorization token for tracking agent requests.

--disable_agent
  Required: No. Default: enabled.
  When --enable_mlops is enabled, sets whether to allow offline tracking.
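Several of these options combine naturally with shell pipes via --input=- and --output=-. A sketch, reusing the model JAR name from the earlier embedded-CLI example; the JAR name and the passthrough column names are illustrative, so substitute your own:

```shell
# Illustrative model JAR name; substitute your own download.
MODEL_JAR=5cd071deef881f011a334c2f.jar

# Read a semicolon-delimited CSV from stdin, keep two (hypothetical)
# input columns next to the predictions, and write scored rows to stdout.
CMD="java -jar ${MODEL_JAR} csv --input=- --output=- \
  --delimiter=\";\" --passthrough_columns=SepalLength,SepalWidth"

# Run it only if the model JAR is actually present.
if [ -f "$MODEL_JAR" ]; then
  cat Iris.csv | eval "$CMD" > Iris_out.csv
fi
echo "$CMD"
```

Because the row order and row count of the output always match the input, this form slots cleanly into larger pipelines that expect one scored row per input row.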

Updated October 27, 2021