JAR structure
Once you have downloaded the Scoring Code JAR package to your machine, you'll see that it has a well-organized structure:
Root directory
The root directory contains a set of .so and .jnilib files. These contain compiled Java Native Interface (JNI) code for the LAPACK and BLAS libraries. When a JAR is launched, it first attempts to locate these libraries in the OS. If they are found, model scoring is significantly faster. If they are not found, Scoring Code falls back to a slower Java implementation.
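The lookup-and-fallback behavior described above follows a common Java pattern: try to load a native library, and catch UnsatisfiedLinkError if it is missing. The sketch below is a hypothetical illustration of that pattern, not the code shipped in the JAR. The DotProduct, JavaDotProduct, and BackendSelector types are assumptions made for the example.

```java
import java.util.function.Supplier;

// Hypothetical minimal "BLAS-like" operation used to illustrate backend selection.
interface DotProduct {
    double dot(double[] x, double[] y);
}

// Pure-Java fallback: always available, but typically slower than an optimized native BLAS.
class JavaDotProduct implements DotProduct {
    @Override
    public double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }
}

// Hypothetical selector: not part of Scoring Code.
class BackendSelector {
    // True if the named library can be loaded from java.library.path.
    // System.mapLibraryName(name) shows the platform-specific file name that is searched for.
    static boolean nativeAvailable(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    // Uses the native implementation when its library loads; otherwise falls back to Java.
    static DotProduct select(String libraryName, Supplier<DotProduct> nativeImpl) {
        return nativeAvailable(libraryName) ? nativeImpl.get() : new JavaDotProduct();
    }
}
```

Passing the native implementation as a Supplier means a JNI-backed class is only constructed after its library has loaded successfully.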
com.github.fommil package
The com.github.fommil package contains the Java side of the LAPACK and BLAS native interfaces.
dr<model_ID> package
The dr<model_ID> package contains a set of binary files with parameters for the individual nodes of a DataRobot model (blueprint). Although these parameters are not human-readable, you can still get their values by debugging the readParameters(DRDataInputStream dis) methods in the classes that implement the model's nodes. These classes are located in the com.datarobot.prediction.dr<model_ID> package.
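As a rough illustration of what such a method does, the sketch below decodes a node's parameters with the standard java.io.DataInputStream, standing in for DRDataInputStream. The LinearNode class and its binary layout (an int count followed by that many doubles) are hypothetical; the real parameter format is not described here. The point is that once readParameters returns, the values sit in ordinary fields that a debugger breakpoint at the end of the method will show.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical model node. Its binary layout (an int count followed by that many doubles)
// is an assumption for illustration, not the actual Scoring Code parameter format.
class LinearNode {
    double[] coefficients;

    void readParameters(DataInputStream dis) throws IOException {
        int n = dis.readInt();
        coefficients = new double[n];
        for (int i = 0; i < n; i++) {
            coefficients[i] = dis.readDouble();
        }
        // A breakpoint here shows the decoded values in coefficients.
    }

    // Helper that writes parameters in the same hypothetical layout.
    static byte[] encode(double... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        LinearNode node = new LinearNode();
        node.readParameters(new DataInputStream(new ByteArrayInputStream(encode(0.5, -1.25))));
        System.out.println(Arrays.toString(node.coefficients)); // [0.5, -1.25]
    }
}
```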
com.datarobot.prediction package
The com.datarobot.prediction package contains the Java interfaces commonly used inside a Scoring Code JAR. To maintain backward compatibility, it includes both current and deprecated versions of these interfaces. The deprecated interfaces are Predictor, MulticlassPredictor, and Row.
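One common way to keep deprecated interfaces working alongside current ones is an adapter: the old interface is implemented by delegating to the new one, so existing client code keeps compiling while only one scoring path is maintained. The sketch below shows that pattern in general terms. RowScorer, LegacyScorer, and LegacyScorerAdapter are hypothetical names; they do not reproduce the signatures of Predictor, MulticlassPredictor, Row, or their replacements.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical current interface: scores one row given as column name -> value.
interface RowScorer {
    double score(Map<String, ?> row);
}

// Hypothetical deprecated interface, kept so that older client code still compiles.
@Deprecated
interface LegacyScorer {
    double score(String[] columnNames, Object[] values);
}

// Adapter: satisfies the old contract by delegating to the current interface,
// so only one scoring implementation has to be maintained.
@SuppressWarnings("deprecation")
class LegacyScorerAdapter implements LegacyScorer {
    private final RowScorer delegate;

    LegacyScorerAdapter(RowScorer delegate) {
        this.delegate = delegate;
    }

    @Override
    public double score(String[] columnNames, Object[] values) {
        Map<String, Object> row = new HashMap<>();
        for (int i = 0; i < columnNames.length; i++) {
            row.put(columnNames[i], values[i]);
        }
        return delegate.score(row);
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        // Toy current-style scorer: doubles column x.
        RowScorer current = row -> ((Number) row.get("x")).doubleValue() * 2.0;
        LegacyScorerAdapter legacy = new LegacyScorerAdapter(current);
        System.out.println(legacy.score(new String[] {"x"}, new Object[] {3.0})); // 6.0
    }
}
```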
com.datarobot.prediction.dr<model_ID> package
The com.datarobot.prediction.dr<model_ID> package contains the classes that implement the model (blueprint), as well as some utility code.
To understand the model, start with the BP.java class, which manages data flow through the model. Raw data first enters the DP.java class, where feature conversion and transformation take place. The preprocessed data then goes into each of the V<number> classes, where the actual steps of model execution take place. All of these classes use the following utility classes:
- BaseDataStructure defines a unified container for data.
- DRDataInputStream reads binary parameters from the dr<model_ID> package.
- BaseVertex contains the actual implementations of machine learning algorithms and utility functions.
- DRModel defines the low-level implementation of a model API.

The classes RegressionPredictorImpl and ClassificationPredictorImpl are top-level APIs built on top of DRModel. It is highly recommended that you use these classes instead of using DRModel directly. More information about these interfaces can be found in the javadoc (linked from the Downloads tab) and in the section Backward-compatible Java API.
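The data flow through BP.java, DP.java, and the V<number> classes, with a top-level API layered on top, can be sketched as follows. All class names here (Preprocessor, Vertex, Blueprint, SimpleRegressionPredictor) are hypothetical, and the logic is deliberately simplified: preprocessing parses numeric strings and imputes missing values as 0.0, and two toy vertices stand in for real model steps.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DP.java: converts raw string inputs into a numeric feature vector.
class Preprocessor {
    private final List<String> featureNames;

    Preprocessor(List<String> featureNames) {
        this.featureNames = featureNames;
    }

    double[] transform(Map<String, String> raw) {
        double[] features = new double[featureNames.size()];
        for (int i = 0; i < features.length; i++) {
            String value = raw.get(featureNames.get(i));
            // Missing or empty values are imputed as 0.0 in this sketch.
            features[i] = (value == null || value.isEmpty()) ? 0.0 : Double.parseDouble(value);
        }
        return features;
    }
}

// Hypothetical stand-in for a V<number> class: one step of model execution.
interface Vertex {
    double[] apply(double[] input);
}

// Hypothetical stand-in for BP.java: passes data through preprocessing, then each vertex in order.
class Blueprint {
    private final Preprocessor dp;
    private final List<Vertex> vertices;

    Blueprint(Preprocessor dp, List<Vertex> vertices) {
        this.dp = dp;
        this.vertices = vertices;
    }

    double[] run(Map<String, String> raw) {
        double[] data = dp.transform(raw);
        for (Vertex vertex : vertices) {
            data = vertex.apply(data);
        }
        return data;
    }
}

// Hypothetical top-level wrapper in the spirit of RegressionPredictorImpl:
// callers score a row without touching the pipeline internals.
class SimpleRegressionPredictor {
    private final Blueprint blueprint;

    SimpleRegressionPredictor(Blueprint blueprint) {
        this.blueprint = blueprint;
    }

    double score(Map<String, String> row) {
        return blueprint.run(row)[0]; // the last vertex emits a single value
    }

    public static void main(String[] args) {
        Blueprint bp = new Blueprint(
            new Preprocessor(Arrays.asList("a", "b")),
            Arrays.<Vertex>asList(
                x -> new double[] {x[0] * 2, x[1] * 2},   // toy scaling step
                x -> new double[] {x[0] + x[1] + 0.5}));  // toy linear step with bias 0.5
        Map<String, String> row = new HashMap<>();
        row.put("a", "1.5");
        row.put("b", "2");
        System.out.println(new SimpleRegressionPredictor(bp).score(row)); // 7.5
    }
}
```

The wrapper gives callers a single score call and keeps the pipeline wiring private, which mirrors why the text recommends RegressionPredictorImpl and ClassificationPredictorImpl over using DRModel directly.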
com.datarobot.prediction.drmatrix package
The com.datarobot.prediction.drmatrix package contains implementations of common matrix operations on dense and sparse matrices.
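To show why a matrix package would treat dense and sparse matrices differently, the sketch below computes the same matrix-vector product two ways: over a full two-dimensional array, and over a compressed sparse row (CSR) representation that stores only the nonzero entries. The DenseMatrix and CsrMatrix classes are hypothetical and do not reflect the actual drmatrix API.

```java
import java.util.Arrays;

// Hypothetical dense matrix: every entry is stored, including zeros.
class DenseMatrix {
    final double[][] values;

    DenseMatrix(double[][] values) {
        this.values = values;
    }

    double[] multiply(double[] x) {
        double[] y = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < x.length; j++) {
                sum += values[i][j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }
}

// Hypothetical compressed sparse row (CSR) matrix: only nonzero entries are stored.
class CsrMatrix {
    final int rows;
    final int[] rowStart;  // nonzeros of row i are at indices rowStart[i] .. rowStart[i + 1] - 1
    final int[] colIndex;  // column of each stored nonzero
    final double[] data;   // value of each stored nonzero

    CsrMatrix(double[][] dense) {
        rows = dense.length;
        rowStart = new int[rows + 1];
        int nnz = 0;
        for (double[] row : dense) {
            for (double v : row) {
                if (v != 0.0) nnz++;
            }
        }
        colIndex = new int[nnz];
        data = new double[nnz];
        int k = 0;
        for (int i = 0; i < rows; i++) {
            rowStart[i] = k;
            for (int j = 0; j < dense[i].length; j++) {
                if (dense[i][j] != 0.0) {
                    colIndex[k] = j;
                    data[k] = dense[i][j];
                    k++;
                }
            }
        }
        rowStart[rows] = k;
    }

    double[] multiply(double[] x) {
        double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int k = rowStart[i]; k < rowStart[i + 1]; k++) {
                sum += data[k] * x[colIndex[k]];
            }
            y[i] = sum;
        }
        return y;
    }
}

class MatrixDemo {
    public static void main(String[] args) {
        double[][] dense = {{1, 0, 2}, {0, 0, 0}, {0, 3, 0}};
        double[] x = {1, 2, 3};
        System.out.println(Arrays.toString(new DenseMatrix(dense).multiply(x))); // [7.0, 0.0, 6.0]
        System.out.println(Arrays.toString(new CsrMatrix(dense).multiply(x)));   // [7.0, 0.0, 6.0]
    }
}
```

Both versions give the same result, but the CSR loop only touches stored nonzeros, which matters for mostly-zero inputs such as one-hot encoded categorical features.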
com.datarobot.prediction.engine and com.datarobot.prediction.io packages
The com.datarobot.prediction.engine and com.datarobot.prediction.io packages contain high-performance scoring logic that allows each Scoring Code JAR to be used as a command-line tool for scoring CSV files.
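At its core, a CSV scoring tool runs a loop: read the header, turn each line into a column-to-value map, score it, and write the line back with a prediction column. The sketch below shows that loop in its simplest form. CsvScorer and the prediction column name are hypothetical, the parser assumes no quoted fields or embedded commas, and the sketch does not reflect the high-performance logic or the command-line options of the engine and io packages.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of a CSV scoring loop; not the engine/io implementation.
// Assumes a header row and plain comma-separated fields (no quoting or embedded commas).
class CsvScorer {
    static int scoreCsv(Reader in, Writer out, ToDoubleFunction<Map<String, String>> model)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String headerLine = reader.readLine();
        if (headerLine == null) {
            return 0; // empty input
        }
        String[] header = headerLine.split(",", -1);
        out.write(headerLine + ",prediction\n");
        int scored = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",", -1);
            Map<String, String> row = new HashMap<>();
            for (int i = 0; i < header.length && i < fields.length; i++) {
                row.put(header[i], fields[i]);
            }
            // Echo the input line and append the model's prediction as a new column.
            out.write(line + "," + model.applyAsDouble(row) + "\n");
            scored++;
        }
        out.flush();
        return scored;
    }

    public static void main(String[] args) throws IOException {
        Writer out = new StringWriter();
        // Toy "model": the sum of columns x and y.
        scoreCsv(new StringReader("x,y\n1,2\n3,4\n"), out,
            row -> Double.parseDouble(row.get("x")) + Double.parseDouble(row.get("y")));
        System.out.print(out);
    }
}
```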
Differences between source and binary JARs
The following table describes the differences between the source and binary download options.
Contents | Binary .jar | Source .jar
---|---|---
Native .so and .jnilib files for BLAS and LAPACK libraries | Yes | No
com.github.fommil for BLAS and LAPACK libraries | Yes | No
dr<model_ID> (binary parameters for nodes of the model) | Yes | Yes
com.datarobot.prediction | Yes | No
com.datarobot.prediction.dr<model_ID> | Yes | Yes
com.datarobot.prediction.drmatrix | Yes | No
com.datarobot.prediction.engine | Yes | No
com.datarobot.prediction.io | Yes | No
DataRobot provides “source” .jar files to simplify model inspection. The “source” download contains only the code that directly implements the model: it is the same code as in the “binary” .jar, but stripped of all dependencies.
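To check which variant you downloaded, you can inspect the JAR entries with the standard java.util.jar API (the JDK's jar tf command prints the same list). The sketch below groups entries by directory and, based on the table above, treats the presence of com.datarobot.prediction.engine as a sign of a binary JAR. JarInspector is a hypothetical helper, and the heuristic assumes the table holds for your download.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Scoring Code) for inspecting a downloaded JAR.
class JarInspector {
    // Per the table, only the binary JAR contains this package.
    static final String ENGINE_PREFIX = "com/datarobot/prediction/engine/";

    // Groups file entries by the directory (package path) that contains them.
    static Map<String, Set<String>> groupByDirectory(String jarPath) throws IOException {
        Map<String, Set<String>> layout = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // report files only
                }
                String name = entry.getName();
                int slash = name.lastIndexOf('/');
                // Entries with no '/' sit in the root directory, e.g. native .so and .jnilib files.
                String dir = slash < 0 ? "(root)" : name.substring(0, slash);
                layout.computeIfAbsent(dir, k -> new TreeSet<>()).add(name.substring(slash + 1));
            }
        }
        return layout;
    }

    // Heuristic: a JAR with any entry under the engine package is treated as a binary JAR.
    static boolean isBinaryJar(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                if (entries.nextElement().getName().startsWith(ENGINE_PREFIX)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Usage: java JarInspector path/to/model.jar
    public static void main(String[] args) throws IOException {
        groupByDirectory(args[0]).forEach((dir, files) -> System.out.println(dir + " -> " + files));
        System.out.println(isBinaryJar(args[0]) ? "binary JAR" : "source JAR");
    }
}
```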