Make predictions

After you create an experiment and train models, you can make predictions on new data, registered data, or training data to validate those models.

To make predictions with a model in a Workbench experiment:

  1. Select the model from the Models list and then click Model actions > Make predictions.

  2. On the Make Predictions page, upload a Prediction source by:

    • Dragging a file into the Prediction source group box.

    • Clicking Choose file and selecting one of the following:

    Select a file from your local filesystem and then click Open.

    When you upload a prediction dataset, it is automatically stored in the AI Catalog once the upload is complete. Be sure not to navigate away from the page during the upload, or the dataset will not be stored in the catalog. If the dataset is still processing after the upload, that means DataRobot is running EDA on the dataset before it becomes available for use.

    To use model training data as the prediction source, select one of the following options, depending on the project type:

    Project type: AutoML
    Select one of the following training data options:
    • Validation
    • Holdout
    • All data

    Project type: OTV/Time Series
    Select one of the following training data options:
    • All backtests
    • Holdout

    In-sample prediction risk

    Depending on the option you select and the sample size the model was trained on, predicting on training data can generate in-sample predictions, meaning that the model has seen the target value during training and its predictions do not necessarily generalize well. If DataRobot determines that one or more training rows are used for predictions, the Overfitting risk warning appears. These predictions should not be used to evaluate the model's accuracy.

    Select a data registry file from the AI Catalog and then click Select a dataset.

    Time series data requirements

    Making predictions with time series models requires the dataset to be in a particular format. The format is based on your time series project settings. Ensure that the prediction dataset includes the correct historical rows, forecast rows, and any features known in advance. In addition, to ensure DataRobot can process your time series data, configure the dataset to meet the following requirements:

    • Sort prediction rows by their timestamps, with the earliest row first.
    • For multiseries, sort prediction rows by series ID and then by timestamp, with the earliest row first.

    There is no limit on the number of series DataRobot supports. The only limit is the job timeout, as mentioned in Limits. For dataset examples, see the requirements for the scoring dataset.

  3. Next, you can set the prediction options (for time series models, you can also set the time series options) and then compute predictions.

Note

If you select the wrong dataset, you can remove your selection from the Prediction source setting by clicking the delete icon.
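The time series sorting requirements described above can be checked before upload. The following is a minimal sketch using only the Python standard library, not a DataRobot API. The column names (store, date, sales), the date format, and the helper name sort_prediction_rows are hypothetical and should be adapted to your own dataset.

```python
import csv
import io
from datetime import datetime


def sort_prediction_rows(rows, timestamp_col, series_id_col=None, fmt="%Y-%m-%d"):
    """Return rows sorted earliest-first.

    For multiseries data, pass series_id_col to sort by series ID first
    and then by timestamp within each series.
    """
    def key(row):
        ts = datetime.strptime(row[timestamp_col], fmt)
        if series_id_col is None:
            return (ts,)
        return (row[series_id_col], ts)

    return sorted(rows, key=key)


# Hypothetical multiseries prediction data, deliberately out of order.
raw = """store,date,sales
B,2014-06-02,12
A,2014-06-02,7
B,2014-06-01,10
A,2014-06-01,5
"""

rows = list(csv.DictReader(io.StringIO(raw)))
sorted_rows = sort_prediction_rows(rows, timestamp_col="date", series_id_col="store")

for row in sorted_rows:
    print(row["store"], row["date"], row["sales"])
```

Parsing the timestamp before sorting avoids relying on string order, which only matches chronological order for some date formats.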

Set time series options

Time series options availability

If you selected Use model training data as the prediction source, you can't configure the time series options.

After you configure the Prediction source with a properly formatted time series prediction dataset, you can configure the time series-specific settings in the Time series options section. Under Forecast point, select a Selection method to define the date from which you want to begin making predictions:

  • Set automatically: DataRobot selects the latest date that includes a target value and then adds the FDW offset.

  • Set manually: Select a forecast point within the date range DataRobot detects from the provided prediction source (for example, "Select a date between 2012-07-05 and 2014-06-20").

In addition, you can click Show advanced options and enable Ignore missing values in known-in-advance columns to make predictions even if the provided source dataset is missing values in the known-in-advance columns; however, this may negatively impact the computed predictions.
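Before choosing a manual forecast point, it can help to inspect which dates the prediction dataset covers. The sketch below assumes a CSV with hypothetical date and sales columns in which forecast rows leave the target empty. It reports the date range and the latest date that has a target value. It does not reproduce DataRobot's automatic selection, and no FDW offset is applied.

```python
import csv
import io
from datetime import date


def summarize_dates(rows, timestamp_col, target_col):
    """Return (earliest date, latest date, latest date with a target value)."""
    dates = [date.fromisoformat(r[timestamp_col]) for r in rows]
    with_target = [
        date.fromisoformat(r[timestamp_col])
        for r in rows
        if (r[target_col] or "").strip() != ""
    ]
    latest_with_target = max(with_target) if with_target else None
    return min(dates), max(dates), latest_with_target


def is_within_range(candidate, earliest, latest):
    """Check that a manually chosen date falls inside the dataset's range."""
    return earliest <= candidate <= latest


# Hypothetical dataset: three historical rows, two forecast rows.
raw = """date,sales
2014-06-18,5
2014-06-19,7
2014-06-20,6
2014-06-21,
2014-06-22,
"""

rows = list(csv.DictReader(io.StringIO(raw)))
earliest, latest, last_target = summarize_dates(rows, "date", "sales")

print("Date range:", earliest, "to", latest)
print("Latest date with a target value:", last_target)
print("2014-06-20 in range:", is_within_range(date(2014, 6, 20), earliest, latest))
```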

Set prediction options

After you configure the Prediction source, you can configure optional settings in the Prediction options section:

Setting: Include additional feature values in prediction results
Description: Include input features (columns) in the prediction results file alongside the predictions, based on the selection option:
  • Add specified features: Filter for and include the selected features from the dataset.
  • Add all features: Include every feature from the dataset.

Setting: Include prediction intervals
Description: For time series models, include only predictions falling within the specified interval, based on the residual errors measured during the model's backtesting.

Note

You can only append a feature (column) present in the original dataset, although the feature does not have to have been part of the feature list used to build the model. Derived features are not included.
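Because only columns present in the original dataset can be appended, a quick check of the dataset header can catch unavailable names before you configure the setting. This is a minimal local sketch, not part of DataRobot. The column names and the derived-feature name "sales (7 day mean)" are hypothetical.

```python
import csv
import io


def missing_features(header, requested):
    """Return the requested feature names that are not columns in the dataset."""
    available = set(header)
    return [name for name in requested if name not in available]


# Hypothetical prediction dataset with four original columns.
raw = "date,store,sales,promo\n2014-06-01,A,5,0\n"
header = next(csv.reader(io.StringIO(raw)))

# "sales (7 day mean)" stands in for a derived feature, which is not
# a column of the original dataset and so cannot be appended.
missing = missing_features(header, ["store", "promo", "sales (7 day mean)"])
print(missing)
```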

Compute and download predictions

After you configure the Prediction options, click Compute and download predictions to start scoring the data, then view the scoring results under Download recent predictions.

From the Download recent predictions list, you can do the following:

  • While the prediction job is running, you can click the close icon to stop the job.

  • If the prediction job is successful, click the download icon to download a predictions file or the logs icon to view and optionally copy the run details.

    Note

    Predictions are available for download for 48 hours from the time of prediction computation.

  • If the prediction job failed, click the logs icon to view and optionally copy the run details.
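The 48-hour download window noted above is simple date arithmetic. The sketch below shows one way to track it locally. The computation time is a made-up example, and treating the exact deadline as already expired is an assumption rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone

# Download window stated in the note above.
DOWNLOAD_WINDOW = timedelta(hours=48)


def download_deadline(computed_at):
    """Return the moment the predictions stop being available."""
    return computed_at + DOWNLOAD_WINDOW


def is_downloadable(computed_at, now):
    """Assumption: the deadline itself counts as expired."""
    return now < download_deadline(computed_at)


# Hypothetical computation time, in UTC.
computed_at = datetime(2024, 2, 26, 9, 30, tzinfo=timezone.utc)
deadline = download_deadline(computed_at)

print("Download before:", deadline.isoformat())
```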


Updated February 26, 2024