Scoring at the command line¶
The following sections provide syntax for scoring at the command line.
Command line options¶
Option | Required / Default | Description |
---|---|---|
`--help` | No; Default: Disabled | Prints all of the available options as well as some model metadata. |
`--input=<value>` | Yes; Default: None | Defines the source of the input data (for example, a path to a CSV file). |
`--output=<value>` | Yes; Default: None | Sets the destination for the results (for example, a path to a CSV file). |
`--encoding=<value>` | No; Default: Default system encoding | Sets the charset encoding used to read file content. Use one of the canonical names from the `java.io` and `java.lang` APIs. If this option is not set, the tool detects UTF-8 and UTF-16 byte order marks (BOMs) automatically. |
`--delimiter=<value>` | No; Default: `,` (comma) | Specifies the delimiter used in CSV files to split values between columns. Note: use `--delimiter=";"` to set the semicolon as a delimiter (`;` is a reserved symbol in bash/shell). |
`--passthrough_columns` | No; Default: None | Sets the input columns to include in the results file. For example, if the flag contains a set of columns (e.g., `column1,column2`), the output will contain only the prediction column(s) plus columns 1 and 2. To include all original columns, use `All`. The resulting file keeps the columns in their original order and uses the same format and the delimiter set by `--delimiter`. If this parameter is not specified, the command returns only the prediction column(s). |
`--chunk_size=<value>` | No; Default: `min(1MB, {file_size}/{cores_number})` | "Slices" the initial dataset into chunks that are scored in sequence as separate asynchronous tasks. In most cases, the default value produces the best performance. Use bigger chunks to score very fast models and smaller chunks to score very slow models. |
`--workers_number=<value>` | No; Default: Number of logical cores | Specifies the number of workers that can process chunks concurrently. The default matches the number of logical cores and typically produces the best performance. |
`--log_level=<value>` | No; Default: `INFO` | Sets the level of information output to the console. Available options are `INFO`, `DEBUG`, and `TRACE`. |
`--pred_name=<value>` | No; Default: `DR_Score` | For regression projects, sets the name of the prediction column in the output file. In classification projects, the prediction labels are the same as the class labels. |
`--buffer_size=<value>` | No; Default: 1000 | Controls the size of the asynchronous task queue. This is an advanced parameter; set it to a smaller value if you experience `OutOfMemoryException` errors while using this tool. |
`--config=<value>` | No; Default: The `.jar` file directory | Sets the location of the `batch.properties` file, which consolidates all configuration parameters in a single file. If you place the file in the same directory as the `.jar`, you do not need to set this parameter; otherwise, set the value to the path of the directory containing `batch.properties`. |
`--with_explanations` | No; Default: Disabled | Turns on prediction explanation computations. |
`--max_codes=<value>` | No; Default: 3 | Sets the maximum number of explanations to compute. |
`--threshold_low=<value>` | No; Default: Null | Sets the low threshold for prediction rows to be included in the explanations. |
`--threshold_high=<value>` | No; Default: Null | Sets the high threshold for prediction rows to be included in the explanations. |
`--enable_mlops` | No; Default: Enabled | Initializes an MLOps instance for tracking scores. |
`--dr_token=<value>` | Yes, if `--enable_mlops` is set; Default: None | Specifies the authorization token for monitoring agent requests. |
`--disable_agent` | No; Default: Enabled | When `--enable_mlops` is enabled, sets whether to allow offline tracking. |
Time series options | | |
`--forecast_point=<value>` | No; Default: None | The formatted date from which to forecast. |
`--date_format=<value>` | No; Default: None | The date format to use for output. |
`--predictions_start_date=<value>` | No; Default: None | The timestamp that indicates when to start calculating predictions. |
`--predictions_end_date=<value>` | No; Default: None | The timestamp that indicates when to stop calculating predictions. |
`--with_intervals` | No; Default: None | Turns on prediction interval calculations. |
`--interval_length=<value>` | No; Default: None | The interval length, as an integer value from 1 to 99. |
`--time_series_batch_processing` | No; Default: Disabled | Enables performance-optimized batch processing for time series models. |
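For example, the following invocation combines several of these options to score a file with explanations enabled and two passthrough columns. The `.jar` file, dataset, and column names below are placeholders for your own model and data:

```
# Score data.csv with 4 concurrent workers, keep two of the original
# columns in the output, and compute up to 5 explanations per row.
java -jar model.jar csv \
    --input=data.csv \
    --output=out.csv \
    --workers_number=4 \
    --passthrough_columns=column1,column2 \
    --with_explanations \
    --max_codes=5
```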
Note
For more information, see Scoring Code usage examples.
Batch properties file¶
You can configure the `batch.properties` file to change the default values of the command line options above, simplifying command line scoring: a bash command with too many options can be difficult to read. In addition, some command line options depend on your scoring environment and would otherwise be duplicated across commands; to avoid this duplication, you can save those parameters to the `batch.properties` file and reuse them.
The following properties are available in the `batch.properties` file, each mapping to the listed command line option:
Batch property | Option mapping |
---|---|
`com.datarobot.predictions.batch.encoding` | `--encoding` |
`com.datarobot.predictions.batch.passthrough.columns` | `--passthrough_columns` |
`com.datarobot.predictions.batch.chunk.size=150` | `--chunk_size` |
`com.datarobot.predictions.batch.workers.number=` | `--workers_number` |
`com.datarobot.predictions.batch.log.level=INFO` | `--log_level` |
`com.datarobot.predictions.batch.pred.name=PREDICTION` | `--pred_name` |
`com.datarobot.predictions.batch.buffer.size=1000` | `--buffer_size` |
`com.datarobot.predictions.batch.enable.mlops=false` | `--enable_mlops` |
`com.datarobot.predictions.batch.disable.agent` | `--disable_agent` |
`com.datarobot.predictions.batch.max.file.size=1000000000` | No option mapping. To read from and write to the same file, this property sets the maximum size of the original file so that the command line interface can read it entirely into memory before scoring. |
Time series parameters | |
`com.datarobot.predictions.batch.forecast.point=` | `--forecast_point` |
`com.datarobot.predictions.batch.date.format=yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'` | `--date_format` |
`com.datarobot.predictions.batch.start.timestamp=` | `--predictions_start_date` |
`com.datarobot.predictions.batch.end.timestamp=` | `--predictions_end_date` |
`com.datarobot.predictions.batch.with.interval` | `--with_intervals` |
`com.datarobot.predictions.batch.interval_length` | `--interval_length` |
`com.datarobot.predictions.batch.time.series.batch.proccessing` | `--time_series_batch_processing` |
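As a sketch, a `batch.properties` file that pins the environment-specific options from the table above might look like the following; the values shown are illustrative, not recommendations:

```
# batch.properties -- each entry replaces the matching command line option
com.datarobot.predictions.batch.encoding=UTF-8
com.datarobot.predictions.batch.workers.number=4
com.datarobot.predictions.batch.log.level=INFO
com.datarobot.predictions.batch.pred.name=prediction
```

If the file is not stored alongside the `.jar`, pass the directory containing it via `--config=<value>` as described above.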
Increase Java heap memory¶
Depending on the model's binary size, you may have to increase the Java virtual machine (JVM) heap memory size. If you receive an `OutOfMemoryError: Java heap space` message when scoring your model, increase the Java heap size with the `-Xmx` flag (for example, `java -Xmx1024m`), adjusting the value as necessary to allocate sufficient memory for the process.
To guarantee consistent scoring results and a non-zero exit code in the event of an error, run the application with the `-XX:+ExitOnOutOfMemoryError` flag.
The following example increases heap memory to 2GB:
java -XX:+ExitOnOutOfMemoryError -Xmx2g -Dlog4j2.formatMsgNoLookups=true -jar 5cd071deef881f011a334c2f.jar csv --input=Iris.csv --output=Iris_out.csv
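Because `-XX:+ExitOnOutOfMemoryError` guarantees a non-zero exit code when the heap is exhausted, a wrapper script can detect the failure and retry with a larger heap. A minimal sketch, assuming a hypothetical `model.jar` and the Iris files above:

```
# Run scoring; on failure (for example, heap exhaustion), retry once
# with a doubled heap.
java -XX:+ExitOnOutOfMemoryError -Xmx2g -jar model.jar csv \
    --input=Iris.csv --output=Iris_out.csv
if [ $? -ne 0 ]; then
    echo "Scoring failed; retrying with a 4GB heap" >&2
    java -XX:+ExitOnOutOfMemoryError -Xmx4g -jar model.jar csv \
        --input=Iris.csv --output=Iris_out.csv
fi
```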