UI prediction options

DataRobot provides several methods for making predictions from the UI, or at least for launching them from the UI. Each is briefly described below:

Use this method: Make Predictions
When you: Want in-app predictions
Notes: You can either:
  • Make predictions on a new dataset (up to 1GB, by default) without coding
  • Make out-of-sample predictions on your original dataset, including holdout and/or validation partitions of large files.

Use this method: Deploy
When you: Have a dedicated server
Notes: Provides a code sample for use with:
  • Real-time scoring with the Prediction API
  • Near real-time (score multiple rows at a time continuously) or batch cases (scheduled scoring of multiple rows of data) with Python batch scoring

Use this method: DataRobot Prime
When you: Want out-of-app predictions
Notes: Produces scoring code that is an approximation of a selected model, creating a simplified version that describes "business rules" for predictions.

Use this method: Downloads
When you: Want out-of-app, exact reproducibility of predictions
Notes: Either:
  • Export exact, validated Java scoring code for a model that is easily deployable for low-latency, offline predictions.
  • Create an isolated and stable environment for your prediction system with a standalone Prediction API.

Use this method: Transfer models
When you: Want to transfer a model to a Standalone Prediction Server for increased robustness
Notes: Export with the Downloads tab, import using Manage Predictions.

Use this method: Deploy to Hadoop
When you: Run on Hadoop
Notes: To score data that resides in an HDFS that is connected to DataRobot.

Alternatively, you can use the DataRobot API prediction functions if you want to use the same interface for modeling and predictions. Note that some of the tools used for deeper model investigation are only available through the DataRobot GUI.

Warning

When making predictions, DataRobot can represent the positive class in several ways: as the original positive class written in the dataset, as a choice the user specifies in the frontend, or as the positive class provided by the prediction set. DataRobot's internal rules for choosing among these are currently not obvious, which can lead to automation issues such as str("1.0") being returned as the positive class instead of int(1). A future release will fix this issue by standardizing the internal ruleset.
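
Until then, one defensive option in client code is to normalize class labels before comparing them, so that 1, 1.0, "1", and "1.0" are treated as the same class. The following is a minimal sketch, not part of DataRobot; the helper names normalize_label and is_positive are hypothetical, and it assumes that numerically equal labels should be considered identical.

```python
def normalize_label(value):
    """Map equivalent spellings such as 1, 1.0, "1" and "1.0" to one canonical string."""
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        # Non-numeric labels such as "yes" pass through unchanged.
        return text
    if number.is_integer():
        # Whole numbers collapse to their integer spelling: "1.0" becomes "1".
        return str(int(number))
    return repr(number)


def is_positive(returned_label, expected_positive):
    """Compare a returned label with the expected positive class after normalization."""
    return normalize_label(returned_label) == normalize_label(expected_positive)
```

For example, is_positive("1.0", 1) evaluates to True under this assumption, while is_positive("0", 1) evaluates to False. If your labels are strings where "1" and "1.0" are genuinely different classes, this normalization is not appropriate.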

Avoiding common mistakes

The section on dataset guidelines provides important information about DataRobot's dataset requirements. In addition, consider:

  1. Under-trained models. The most common prediction mistake is to use models in production without retraining them beyond the initial training set. Best practice suggests the following workflow:

    • Select the best model based on the validation set.
    • Retrain the best model, including the validation set.
    • Unlock holdout, and use the holdout to validate that the retrained model performs as well as you expect.
    • Note that this does not apply if you are using the model DataRobot selects as "Recommended for Deployment." DataRobot automates all three of these steps for the recommended model and trains it to 100% of the data.
  2. File encoding issues. Be certain that you properly format your data to avoid prediction errors. For example, unquoted newline characters and commas in CSV files often cause problems. JSON can be a better choice for data that contains large amounts of text because JSON is more standardized than CSV. CSV can be faster than JSON, but only when it is properly formatted.

  3. Insufficient cores. When making predictions, keep the number of threads or processes less than or equal to the number of prediction worker cores you have and make synchronous requests. That is, the number of concurrent predictions should generally not exceed the number of prediction worker cores on your dedicated prediction server(s). If you are not sure how many prediction cores you have, contact DataRobot Support.
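
Items 2 and 3 above can be sketched in client code as follows. This is an illustrative sketch under stated assumptions, not DataRobot's implementation: rows_to_csv quotes every field so that embedded commas and newlines survive, and score_batches caps the number of in-flight requests at the number of prediction worker cores. The score_batch argument is a hypothetical callable that sends one synchronous request and returns its result; prediction_cores is a value you supply for your own deployment.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def rows_to_csv(rows, fieldnames):
    """Serialize dict rows to CSV text, quoting every field."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def score_batches(batches, score_batch, prediction_cores):
    """Score a list of batches with at most `prediction_cores` concurrent requests.

    `score_batch` is a hypothetical callable that performs one synchronous
    request; results are returned in the same order as `batches`.
    """
    workers = max(1, min(prediction_cores, len(batches)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_batch, batches))
```

Each worker thread blocks until its request returns, so the pool size acts as the upper bound on concurrent predictions.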

Notes on prediction speed

  1. Model scoring speed. Scoring time differs by model and not all models are fast enough for "real-time" scoring. Before going to production with a model, verify that the model you select is fast enough for your needs. Use the Speed vs. Accuracy tab to display model scoring time.

  2. Understanding the model cache. A dedicated prediction server scores quickly because of its in-memory model cache. As a result, the first few requests using a new model may be slower because the model must first be retrieved.

  3. Computing predictions with Prediction Explanations. Computing predictions with Prediction Explanations requires a significantly higher number of operations than computing predictions only. Expect higher runtimes, although actual speed is model-dependent. Reducing the number of features used or avoiding blenders and text variables may increase speed.
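
One way to check these points before going to production is to time the first request separately from later ones, since the first request may include loading the model into the cache. The sketch below is an assumption-laden illustration: predict is a hypothetical callable that scores a payload synchronously (for example, a wrapper around your own request code), and the function name measure_latency is not a DataRobot API.

```python
import statistics
import time


def measure_latency(predict, payload, warmup=3, runs=10):
    """Return (first_call_seconds, median_warm_seconds) for a prediction callable.

    `predict` is a hypothetical function that scores `payload` synchronously.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # The first call may be slow because the model is not yet cached.
    start = time.perf_counter()
    predict(payload)
    first = time.perf_counter() - start

    # Extra warm-up calls are made but not timed.
    for _ in range(warmup):
        predict(payload)

    # Timed calls against a warm cache.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(payload)
        timings.append(time.perf_counter() - start)
    return first, statistics.median(timings)
```

Running the same measurement with and without Prediction Explanations enabled in your own predict wrapper gives a rough comparison of the added runtime for a given model.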


Updated October 26, 2021