Scoring Code¶
Availability information
Contact your DataRobot representative for information on enabling the Scoring Code feature.
Scoring Code allows you to export DataRobot-generated models as JAR files that you can use outside of the platform. DataRobot automatically runs code generation for qualifying models and indicates code availability with a SCORING CODE indicator on the Leaderboard. You can export a model's Scoring Code from the Leaderboard or the model's deployment. The download includes a pre-compiled JAR file (with all dependencies included), as well as the source code JAR file. Once exported, you can view the model's source code to help understand each step DataRobot takes in producing your predictions.
How does DataRobot determine which models will have Scoring Code?
When the Scoring Code feature is enabled, DataRobot generates a Java alternative for each blueprint preprocessing step and compares its results on the validation set with the original results. If the difference between results is greater than 0.00001, DataRobot does not provide the option to download the Scoring Code. In this way, DataRobot ensures that the Scoring Code JAR model always produces the same predictions as the original model. If verification fails, check the Log tab for error details. For more information, see the Scoring Code considerations.
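The tolerance rule above can be pictured with a small sketch (an illustration of the comparison DataRobot describes, not its actual implementation):

```java
public class ParityCheck {
    // Maximum allowed per-row difference between the original model's
    // predictions and the generated Java code's predictions.
    static final double TOLERANCE = 0.00001;

    static boolean scoringCodeAvailable(double[] original, double[] javaGenerated) {
        if (original.length != javaGenerated.length) return false;
        for (int i = 0; i < original.length; i++) {
            if (Math.abs(original[i] - javaGenerated[i]) > TOLERANCE) {
                return false; // mismatch: Scoring Code download is not offered
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] original = {0.12345, 0.67890};
        double[] generated = {0.123451, 0.678905};
        System.out.println(scoringCodeAvailable(original, generated)); // prints "true"
    }
}
```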
Scoring Code JARs contain Java Scoring Code for a predictive model. The prediction calculation logic is identical to the DataRobot API—the code generation mechanism tests each model for accuracy as part of the generation process. The generated code is easily deployable in any environment and is not dependent on the DataRobot application.
Java requirement
The model JAR files require Java 8 or later.
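As a minimal sketch of embedded use, assuming the `com.datarobot.prediction` API that ships inside the model JAR (class and method names here follow the `datarobot-prediction` package and may differ by version; the feature names are placeholders):

```java
import java.util.HashMap;
import java.util.Map;

// These classes are bundled in the downloaded model JAR.
import com.datarobot.prediction.IRegressionPredictor;
import com.datarobot.prediction.Predictors;

public class ScoreRow {
    public static void main(String[] args) {
        // Load the model packaged on the classpath.
        IRegressionPredictor predictor = Predictors.getPredictor();

        // One row of input data, keyed by feature name
        // (placeholder features for illustration).
        Map<String, Object> row = new HashMap<>();
        row.put("feature_1", 42.0);
        row.put("feature_2", "category_a");

        double prediction = predictor.score(row);
        System.out.println("Prediction: " + prediction);
    }
}
```

Because the JAR is self-contained, this is the entire integration surface: add the JAR to the classpath and score rows as maps.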
The following sections describe how to work with Scoring Code:
Topic | Describes |
---|---|
Download Scoring Code from the Leaderboard | Downloading and configuring Scoring Code from the Leaderboard. |
Download Scoring Code from a deployment | Downloading and configuring Scoring Code from a deployment. |
Download time series Scoring Code | Downloading and configuring Scoring Code for a time series project. |
Scoring at the command line | Syntax for scoring with embedded CLI. |
Scoring Code usage examples | Examples showing how to use the Scoring Code JAR to score from the CLI and in a Java project. |
JAR structure | The contents of the Scoring Code JAR package. |
Generate Java models in an existing project | Retraining models that were created before the Scoring Code feature was enabled. |
Backward-compatible Java API | Using Scoring Code with models created on different versions of DataRobot. |
Scoring Code JAR integrations | Deploying DataRobot Scoring Code on an external platform. |
Android for Scoring Code | Using DataRobot Scoring Code on Android. |
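As a taste of the command-line scoring covered above, a downloaded model JAR can typically score a CSV file directly (a sketch: `model.jar` stands in for your downloaded file, and the available options vary by version — see Scoring at the command line for the full syntax):

```shell
# Score input.csv with the embedded CLI and write predictions to output.csv.
java -jar model.jar csv --input=input.csv --output=output.csv
```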
Why use Scoring Code?¶
Scoring Code provides the following benefits:
- Flexibility: Can be used anywhere that Java code can be executed.
- Speed: Provides low-latency scoring without the API call overhead. Java code is typically faster than scoring through the Python API.
- Integrations: Lets you integrate models into systems that can't necessarily communicate with the DataRobot API. The Scoring Code can be used either as a primary means of scoring for fully offline systems or as a backend for systems that are using the DataRobot API.
- Precision: Provides a complete match between predictions generated by DataRobot and by the JAR model.
- Hardware: Allows you to use additional hardware to score large amounts of data.
Feature considerations¶
Consider the following when working with Scoring Code:
- Using Scoring Code in production requires additional development effort to implement model management and model monitoring, which the DataRobot API provides out of the box.
- Exportable Java Scoring Code requires extra RAM during model building, so keep your training dataset under 8GB when using this feature. Projects larger than 8GB may fail due to memory issues; if you get an out-of-memory error, decrease the sample size and try again. The memory requirement does not apply during scoring, where the only limitation on the dataset is the RAM of the machine running the Scoring Code.
Model support¶
Consider the following model support considerations when planning to use Scoring Code:
- Scoring Code is available only for models composed entirely of supported built-in tasks. It is not available for custom models or models containing one or more custom tasks.
- Scoring Code is not supported in multilabel projects.
- Keras models do not support Scoring Code by default; an administrator can enable support by activating the Enable Scoring Code Support for Keras Models feature flag. Note that these models are not compatible with Scoring Code for Android and Snowflake.
Additional instances in which Scoring Code generation is not available include:
- Naive Bayes models
- Visual AI and Location AI models
- Text tokenization involving the MeCab tokenizer for Japanese text (accessed via Advanced Tuning)
Text tokenization
With the default text tokenization configuration (char-grams), Japanese text is supported.
Time series support¶
The following time series projects and models don't support Scoring Code:
- Time series binary classification projects
- Time series feature derivation projects resulting in datasets larger than 5GB
- Time series anomaly detection models
Anomaly detection models support
While time series anomaly detection models don't generally support Scoring Code, it is supported for IsolationForest and some XGBoost-based anomaly detection model blueprints. For a list of supported time series blueprints, see Time series blueprints with Scoring Code support.
Unsupported capabilities¶
The following capabilities are not supported for Scoring Code:
- Row-based / irregular data
- Nowcasting (single forecast point)
- Intramonth seasonality
- Time series blenders
- Autoexpansion
- Exponentially Weighted Moving Average (EWMA)
- Clustering
- Partial history / cold start
- Prediction Explanations
- Type conversions after uploading data
Supported capabilities¶
The following capabilities are supported for time series Scoring Code:
- Time series parameters for scoring at the command line
- Segmented modeling
- Prediction intervals
- Calendars (high resolution)
- Cross-series
- Zero inflated / naïve binary
- Nowcasting (historical range predictions)
- "Blind history" gaps
- Weighted features
Weighted features support
While weighted features are generally supported, they can result in Scoring Code becoming unavailable due to validation issues; for example, differences in rolling sum computation can cause consistency issues in projects with a weight feature and models trained on feature lists with `weighted std` or `weighted mean`.
Time series blueprints with Scoring Code support¶
The following blueprints typically support Scoring Code:
- AUTOARIMA with Fixed Error Terms
- ElasticNet Regressor (L2 / Gamma Deviance) using Linearly Decaying Weights with Forecast Distance Modeling
- ElasticNet Regressor (L2 / Gamma Deviance) with Forecast Distance Modeling
- ElasticNet Regressor (L2 / Poisson Deviance) using Linearly Decaying Weights with Forecast Distance Modeling
- ElasticNet Regressor (L2 / Poisson Deviance) with Forecast Distance Modeling
- Eureqa Generalized Additive Model (250 Generations)
- Eureqa Generalized Additive Model (250 Generations) (Gamma Loss)
- Eureqa Generalized Additive Model (250 Generations) (Poisson Loss)
- Eureqa Regressor (Quick Search: 250 Generations)
- eXtreme Gradient Boosted Trees Regressor
- eXtreme Gradient Boosted Trees Regressor (Gamma Loss)
- eXtreme Gradient Boosted Trees Regressor (Poisson Loss)
- eXtreme Gradient Boosted Trees Regressor with Early Stopping
- eXtreme Gradient Boosted Trees Regressor with Early Stopping (Fast Feature Binning)
- eXtreme Gradient Boosted Trees Regressor with Early Stopping (Gamma Loss)
- eXtreme Gradient Boosted Trees Regressor with Early Stopping (learning rate =0.06) (Fast Feature Binning)
- eXtreme Gradient Boosting on ElasticNet Predictions
- eXtreme Gradient Boosting on ElasticNet Predictions (Poisson Loss)
- Light Gradient Boosting on ElasticNet Predictions
- Light Gradient Boosting on ElasticNet Predictions (Gamma Loss)
- Light Gradient Boosting on ElasticNet Predictions (Poisson Loss)
- Performance Clustered Elastic Net Regressor with Forecast Distance Modeling
- Performance Clustered eXtreme Gradient Boosting on Elastic Net Predictions
- RandomForest Regressor
- Ridge Regressor using Linearly Decaying Weights with Forecast Distance Modeling
- Ridge Regressor with Forecast Distance Modeling
- Vector Autoregressive Model (VAR) with Fixed Error Terms
- IsolationForest Anomaly Detection with Calibration (time series)
- Anomaly Detection with Supervised Learning (XGB) and Calibration (time series)
While the blueprints listed above typically support Scoring Code, there are situations when Scoring Code is unavailable:
- Scoring Code might not be available for some models generated using Feature Discovery.
- Consistency issues can occur for non day-level calendars when the event is not in the dataset; therefore, Scoring Code is unavailable.
- Consistency issues can occur when inferring the forecast point in situations with a non-zero blind history; however, Scoring Code is still available in this scenario.
- Scoring Code might not be available for some models that use text tokenization involving the MeCab tokenizer for Japanese text (accessed via Advanced Tuning). Using the default configuration of char-grams during AutoPilot, Japanese text is supported.
- Differences in rolling sum computation can cause consistency issues in projects with a weight feature and models trained on feature lists with `weighted std` or `weighted mean`.
Prediction Explanations support¶
Consider the following when working with Prediction Explanations for Scoring Code:
- To download Prediction Explanations with Scoring Code, you must select Include Prediction Explanations during Leaderboard download or Deployment download. This option is not available for Legacy download.
- Scoring Code only supports XEMP-based Prediction Explanations. SHAP-based Prediction Explanations aren't supported.
- Scoring Code doesn't support Prediction Explanations for time series models.