Make predictions

After you create an experiment and train models, you can make predictions on new data, registered data, or training data to validate those models.

To make predictions with a model in a Workbench experiment:

  1. Select the model from the Models list and then click Model actions > Make predictions.

  2. On the Make Predictions page, upload a Prediction source, drag a file into the Prediction dataset box, or click Choose file and select one of the following:

    • Upload local file: Select a file from your local filesystem to upload that dataset for predictions.
    • Use model training data: Select a portion of the training data to use as a prediction dataset.
    • Data registry: Select a file previously uploaded to the data registry.
    • Wrangler (preview): If you have enabled the wrangler batch predictions preview feature, select a file wrangled in Workbench.

    If you selected Upload local file, select a dataset file in your local filesystem, and then click Open.

    When you upload a prediction dataset, it is automatically stored in the AI Catalog once the upload is complete. Be sure not to navigate away from the page during the upload, or the dataset will not be stored in the catalog. If the dataset is still processing after the upload, that means DataRobot is running EDA on the dataset before it becomes available for use.

    If you selected Use model training data, select one of the following training data options, depending on the project type:

    • AutoML: Validation, Holdout, or All data.
    • OTV/Time Series: All backtests or Holdout.

    In-sample prediction risk

    Depending on the option you select and the sample size the model was trained on, predicting on training data can generate in-sample predictions, meaning that the model has seen the target value during training and its predictions do not necessarily generalize well. If DataRobot determines that one or more training rows are used for predictions, the Overfitting risk warning appears. These predictions should not be used to evaluate the model's accuracy.

    If you selected Data registry, click a dataset in the Select a dataset panel, and then click Confirm.

    Wrangler data connection

    Wrangler recipes for batch prediction jobs only support data wrangled from a Snowflake data connection.

    If you selected Wrangler (preview), click a dataset wrangled from a Snowflake data connection in the Select a recipe panel, and then click Confirm.

    Time series data requirements

    Making predictions with time series models requires the dataset to be in a particular format. The format is based on your time series project settings. Ensure that the prediction dataset includes the correct historical rows, forecast rows, and any features known in advance. In addition, to ensure DataRobot can process your time series data, configure the dataset to meet the following requirements:

    • Sort prediction rows by their timestamps, with the earliest row first.
    • For multiseries, sort prediction rows by series ID and then by timestamp, with the earliest row first.

    There is no limit on the number of series DataRobot supports. The only limit is the job timeout, as mentioned in Limits. For dataset examples, see the requirements for the scoring dataset.
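
The sort order described above can be prepared before upload. The following is a minimal sketch using only the Python standard library; the column names (store, date, sales), the date format, and the helper name sort_prediction_rows are hypothetical and not part of DataRobot.

```python
import csv
import io
from datetime import datetime


def sort_prediction_rows(rows, timestamp_col, series_col=None, fmt="%Y-%m-%d"):
    """Return rows sorted earliest-first; for multiseries, by series ID first."""
    def sort_key(row):
        ts = datetime.strptime(row[timestamp_col], fmt)
        # For multiseries data, the series ID is the primary sort key.
        return (row[series_col], ts) if series_col else (ts,)
    return sorted(rows, key=sort_key)


# Hypothetical sample data; the column names are illustrative only.
raw = """store,date,sales
B,2014-06-02,12
A,2014-06-02,7
B,2014-06-01,10
A,2014-06-01,5
"""
rows = list(csv.DictReader(io.StringIO(raw)))
ordered = sort_prediction_rows(rows, timestamp_col="date", series_col="store")
print([(r["store"], r["date"]) for r in ordered])
```

For a single-series dataset, omit series_col so that rows are ordered by timestamp alone.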

    If you select the wrong dataset, you can remove your selection from the Prediction source setting by clicking the delete icon.

  3. Next, you can set the prediction options (for time series models, you can also set the time series options) and then compute predictions.

Set time series options

Time series options availability

If you selected Use model training data as the prediction source, you can't configure the time series options.

After you configure the Prediction source with a properly formatted time series prediction dataset, you can configure the time series-specific settings in the Time series options section. Under Forecast point, select a Selection method to define the date from which you want to begin making predictions:

  • Set automatically: DataRobot selects the latest date that includes a target value and then adds the feature derivation window (FDW) offset.

  • Set manually: Select a forecast point within the date range DataRobot detects from the provided prediction source (for example, "Select a date between 2012-07-05 and 2014-06-20").

In addition, you can click Show advanced options and enable Ignore missing values in known-in-advance columns to make predictions even if the provided source dataset is missing values in the known-in-advance columns; however, this may negatively impact the computed predictions.
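
To check a dataset locally before choosing a forecast point, the two selection methods can be approximated as follows. This is a rough sketch under stated assumptions: the function and column names are hypothetical, dates are assumed to be ISO formatted, and the FDW offset that DataRobot applies for Set automatically is not modeled here.

```python
from datetime import date


def latest_date_with_target(rows, date_col, target_col):
    """Return the latest date whose row has a non-empty target, or None."""
    dated = [
        date.fromisoformat(r[date_col])
        for r in rows
        if r.get(target_col) not in (None, "")
    ]
    return max(dated) if dated else None


def is_valid_manual_forecast_point(candidate, rows, date_col):
    """Return True if candidate lies within the date range found in rows."""
    dates = [date.fromisoformat(r[date_col]) for r in rows]
    return min(dates) <= candidate <= max(dates)


# Hypothetical prediction dataset: three historical rows, two forecast rows.
rows = [
    {"date": "2014-06-18", "sales": "10"},
    {"date": "2014-06-19", "sales": "12"},
    {"date": "2014-06-20", "sales": "11"},
    {"date": "2014-06-21", "sales": ""},
    {"date": "2014-06-22", "sales": ""},
]
latest = latest_date_with_target(rows, "date", "sales")
print(latest)
print(is_valid_manual_forecast_point(date(2014, 6, 19), rows, "date"))
```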

Set prediction options

After you configure the Prediction source, you can configure optional settings in the Prediction options section:

Include additional feature values in prediction results: Include input features (columns) in the prediction results file alongside the predictions, based on the selection option:
  • Add specified features: Filter for and include the selected features from the dataset.
  • Add all features: Include every feature from the dataset.
You can only append a feature (column) present in the original dataset, although the feature does not have to have been part of the feature list used to build the model. Derived features are not included.

Include Prediction Explanations: Adds columns for Prediction Explanations to your prediction output.
  • Number of explanations: Enter the maximum number of explanations you want to request from the deployed model. You can request 100 explanations per prediction request.
  • Low prediction threshold: Enable and define this threshold to provide prediction explanations for any values below the set threshold value.
  • High prediction threshold: Enable and define this threshold to provide prediction explanations for any values above the set threshold value.
  • Number of ngram explanations: Enable and define the maximum number of text ngram explanations to return per row of the dataset. The default (and recommended) setting is all (no limit).
If you can't enable Prediction Explanations, see "Why can't I enable Prediction Explanations?"

Classes: For multiclass models with Prediction Explanations enabled, control the method for selecting which classes are used in explanation computation. The Classes options include:
  • Predicted: Select classes based on prediction value. For each row in the prediction dataset, compute explanations for the number of classes set by the Number of classes value.
  • Actual: For predictions on the training dataset, compute explanations from classes that are known values. For each row, explain the class that is the "ground truth."
  • List of classes: Select one or more specific classes from a list of classes. For each row, explain only the classes selected in the List of Classes menu.

Include prediction intervals: For time series models, include only predictions falling within the specified interval, based on the residual errors measured during the model's backtesting.
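
The low and high prediction thresholds described above can be read as a simple filter on the predicted value. The sketch below illustrates that reading only. The function name is hypothetical, the strict comparisons follow the words "below" and "above", and treating rows as eligible when neither threshold is enabled is an assumption rather than documented behavior.

```python
def should_explain(prediction, low=None, high=None):
    """Decide whether a prediction qualifies for explanations.

    A threshold of None means that threshold is not enabled.
    Assumption: with no thresholds enabled, every row qualifies.
    """
    if low is None and high is None:
        return True
    below = low is not None and prediction < low
    above = high is not None and prediction > high
    return below or above


# Hypothetical predictions filtered with low=0.2 and high=0.8.
predictions = [0.1, 0.5, 0.9]
flags = [should_explain(p, low=0.2, high=0.8) for p in predictions]
print(flags)
```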

Why can't I enable Prediction Explanations?

If you can't Include Prediction Explanations, it is likely because:

  • The model's validation partition doesn't contain the required number of rows.

  • For a Combined Model, at least one segment champion validation partition doesn't contain the required number of rows. To enable Prediction Explanations, manually replace retrained champions before creating a model package or deployment.

Compute and download predictions

After you configure the Prediction options, click Compute and download predictions to start scoring the data, then view the scoring results under Download recent predictions.

From the Download recent predictions list, you can do the following:

  • While the prediction job is running, you can click the close icon to stop the job.

  • If the prediction job is successful, click the download icon to download a predictions file or the logs icon to view and optionally copy the run details.

    Note

    Predictions are available for download for 48 hours from the time of prediction computation.

  • If the prediction job failed, click the logs icon to view and optionally copy the run details.
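
If you track prediction jobs in your own scripts, the 48-hour download window noted above is easy to compute. This is a small sketch with hypothetical helper names; whether the exact 48-hour boundary is inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Download window stated in the note above.
DOWNLOAD_WINDOW = timedelta(hours=48)


def download_deadline(computed_at):
    """Return the time after which the predictions file is no longer available."""
    return computed_at + DOWNLOAD_WINDOW


def is_downloadable(computed_at, now):
    """Return True while now is within the window (boundary assumed inclusive)."""
    return now <= download_deadline(computed_at)


# Hypothetical computation time in UTC.
computed = datetime(2024, 8, 6, 9, 0, tzinfo=timezone.utc)
deadline = download_deadline(computed)
print(deadline.isoformat())
```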


Updated August 6, 2024