Make predictions

Use a deployment's Predictions > Make predictions tab to efficiently score datasets with a deployed model by making batch predictions.

Note

To make predictions with a model before deployment, select the model from the Models list in an experiment and then click Model actions > Make predictions.

Batch predictions are a method of making predictions with large datasets: you pass input data and receive a prediction for each row. DataRobot writes these predictions to output files.

Select a prediction dataset

To make batch predictions with a deployed model, navigate to the deployment's Predictions > Make predictions tab and upload a prediction source:

  • Drag a file into the Prediction dataset box.

  • Click Choose file and select one of the following:

    Upload method | Description
    Upload local file | Select a file from your local file system to upload that dataset for predictions.
    Data registry | Select a file previously uploaded to the data registry.
    Wrangler (preview) | If you have enabled the wrangler batch predictions preview feature, select a file wrangled in Workbench.

    In your local filesystem, select a dataset file, and then click Open.

    When you upload a prediction dataset, it is automatically stored in the AI Catalog once the upload is complete. Do not navigate away from the page during the upload, or the dataset will not be stored in the catalog. If the dataset is still processing after the upload, DataRobot is running exploratory data analysis (EDA) on it before making it available for use.

    In the Select a dataset panel, click a dataset, and then click Confirm.

    Wrangler data connection

    Wrangler recipes for batch prediction jobs only support data wrangled from a Snowflake data connection.

    In the Select a recipe panel, click a dataset wrangled from a Snowflake data connection, and then click Confirm.

    Time series data requirements

    Making predictions with time series models requires the dataset to be in a particular format. The format is based on your time series project settings. Ensure that the prediction dataset includes the correct historical rows, forecast rows, and any features known in advance. In addition, to ensure DataRobot can process your time series data, configure the dataset to meet the following requirements:

    • Sort prediction rows by their timestamps, with the earliest row first.
    • For multiseries, sort prediction rows by series ID and then by timestamp, with the earliest row first.

    There is no limit on the number of series DataRobot supports. The only limit is the job timeout, as mentioned in Limits. For dataset examples, see the requirements for the scoring dataset.
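
The sort order described above can be prepared before upload. The following is a minimal sketch, assuming a CSV with a timestamp column in `YYYY-MM-DD` format and, for multiseries data, a series ID column. The column names `store`, `date`, and `sales` and the helper `sort_prediction_rows` are hypothetical and are not part of DataRobot.

```python
import csv
import io
from datetime import datetime


def sort_prediction_rows(rows, timestamp_col, series_id_col=None, fmt="%Y-%m-%d"):
    """Return rows sorted earliest-first; multiseries rows are ordered by series ID first."""
    def key(row):
        ts = datetime.strptime(row[timestamp_col], fmt)
        if series_id_col is None:
            return (ts,)
        return (row[series_id_col], ts)

    return sorted(rows, key=key)


# Hypothetical multiseries scoring data: two historical rows and two forecast rows.
raw = """store,date,sales
B,2024-01-02,
A,2024-01-02,
B,2024-01-01,12
A,2024-01-01,10
"""

rows = list(csv.DictReader(io.StringIO(raw)))
ordered = sort_prediction_rows(rows, timestamp_col="date", series_id_col="store")
for row in ordered:
    print(row["store"], row["date"])
```

For a single-series dataset, omit `series_id_col` so that rows are ordered by timestamp only.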

Make predictions with a deployment

This section explains how to use the Make predictions tab to make batch predictions for standard deployments and time series deployments.

No. | Field name | Description
1 | Prediction dataset | Select a prediction dataset by uploading a local file or importing a dataset from the Data Registry.
2 | Time series options | Specify and configure a time series prediction method.
3 | Prediction options | Configure the prediction options.
4 | Compute and download predictions | Score the data and download the predictions.
5 | Download recent predictions | View your recent batch predictions and download the results. These predictions are available for download for 48 hours.

Set time series options

To configure the Time series options, under Time series prediction method, define the Forecast point settings:

  • Set automatically: DataRobot sets the forecast point for you based on the scoring data, generally the latest timestamp that is a valid forecast point.

  • Set manually: Set a specific date range using the date selector, configuring the Start and End dates manually.

In addition, you can click Show advanced options and enable Ignore missing values in known-in-advance columns to make predictions even if the provided source dataset is missing values in the known-in-advance columns; however, this may negatively impact the computed predictions.
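
Before enabling Ignore missing values in known-in-advance columns, it can help to see which rows of the source dataset are affected. The following is a minimal sketch, assuming the rows are already loaded as dictionaries. The helper `rows_missing_known_in_advance` and the column name `promo` are hypothetical and are not part of DataRobot.

```python
def rows_missing_known_in_advance(rows, known_in_advance_cols):
    """Return (row_index, column) pairs where a known-in-advance value is absent or blank."""
    missing = []
    for index, row in enumerate(rows):
        for col in known_in_advance_cols:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                missing.append((index, col))
    return missing


# Hypothetical scoring rows: one historical row followed by two forecast rows.
rows = [
    {"date": "2024-01-01", "sales": "10", "promo": "1"},
    {"date": "2024-01-02", "sales": "", "promo": ""},   # promo value is missing
    {"date": "2024-01-03", "sales": "", "promo": "0"},
]

gaps = rows_missing_known_in_advance(rows, ["promo"])
print(gaps)  # [(1, 'promo')]
```

An empty result suggests the option is not needed for that dataset.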

Set prediction options

Once the file is uploaded, configure the Prediction options. Optionally, you can click Show advanced options to configure additional options.

No. | Element | Description
1 | Include additional feature values in prediction results | Writes input features to the prediction results file alongside predictions. To add specific features, enable the Include additional feature values in prediction results toggle, select Add specified features, type a feature name to filter the list, and then select the features to include. To include every feature from the dataset, select Add all features. You can only append a feature (column) present in the original dataset, although the feature does not need to have been part of the feature list used to build the model. Derived features are not included.
2 | Include Prediction Explanations | Adds columns for Prediction Explanations to your prediction output.
      • Number of explanations: Enter the maximum number of explanations you want to request from the deployed model. You can request up to 100 explanations per prediction request.
      • Low prediction threshold: Enable and define this threshold to provide Prediction Explanations for any values below the set threshold value.
      • High prediction threshold: Enable and define this threshold to provide Prediction Explanations for any values above the set threshold value.
      • Number of ngram explanations: Enable and define the maximum number of text ngram explanations to return per row in the dataset. The default (and recommended) setting is all (no limit).
    For multiclass models, use the Classes settings to control the method for selecting which classes are used in explanation computation:
      • Predicted: Select classes based on prediction value. For each row in the prediction dataset, compute explanations for the number of classes set by the Number of classes value.
      • List of classes: Select one or more specific classes from a list of classes. For each row, explain only the classes selected in the List of classes menu.
    If you can't enable Prediction Explanations, see Why can't I enable Prediction Explanations?
3 | Include prediction outlier warning | Includes warnings for outlier prediction values (only available for regression model deployments).
4 | Store predictions for data exploration | Tracks data drift, accuracy, fairness, and data exploration (if enabled for the deployment).
5 | Chunk size | Adjusts the chunk size selection strategy. By default, DataRobot automatically calculates the chunk size; only modify this setting if advised by your DataRobot representative. For more information, see What is chunk size?
6 | Concurrent prediction requests | Limits the number of concurrent prediction requests. By default, prediction jobs use all available prediction server cores. To reserve bandwidth for real-time predictions, set a cap for the maximum number of concurrent prediction requests.
7 | Include prediction status | Adds a column containing the status of the prediction.
8 | Use default prediction instance | Lets you change the prediction instance. Turn the toggle off to select a prediction instance.
9 | Column names remapping | Changes column names in the prediction job's output by mapping them to entries added in this field. Click + Add column name remapping and define the Input column name to replace with the specified Output column name in the prediction output. If you add an incorrect column name mapping, click the delete icon to remove it.
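
Two of the options above, the prediction thresholds and column name remapping, can be illustrated with a short sketch. This is a reading of the descriptions in the table, not DataRobot's implementation. It assumes that a row qualifies for explanations when its prediction is strictly below the low threshold or strictly above the high threshold, and that every row qualifies when neither threshold is enabled. The function names and the column names `row_id` and `target_PREDICTION` are hypothetical.

```python
def needs_explanations(prediction, low=None, high=None):
    """Return True if a row qualifies for explanations under the enabled thresholds.

    Assumptions: comparisons are strict, and with no threshold enabled
    every row qualifies.
    """
    if low is None and high is None:
        return True
    below = low is not None and prediction < low
    above = high is not None and prediction > high
    return below or above


def remap_columns(header, remapping):
    """Rename output columns listed in the remapping; leave all others unchanged."""
    return [remapping.get(name, name) for name in header]


predictions = [0.10, 0.50, 0.90]
flags = [needs_explanations(p, low=0.2, high=0.8) for p in predictions]
print(flags)  # [True, False, True]

header = ["row_id", "target_PREDICTION"]
print(remap_columns(header, {"target_PREDICTION": "score"}))  # ['row_id', 'score']
```

With both thresholds enabled, only predictions in the tails receive explanations, which keeps the output smaller when most rows fall in the middle of the range.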

Why can't I enable Prediction Explanations?

If you can't enable Include Prediction Explanations, it is likely because:

  • The model's validation partition doesn't contain the required number of rows.

  • For a Combined Model, at least one segment champion validation partition doesn't contain the required number of rows. To enable Prediction Explanations, manually replace retrained champions before creating a model package or deployment.

What is chunk size?

The batch prediction process chunks your data into smaller pieces and scores those pieces one by one, allowing DataRobot to score large batches. The Chunk size setting determines the strategy DataRobot uses to chunk your data. DataRobot recommends the default setting of Auto chunking, as it performs the best overall; however, other options are available:

  • Fixed: DataRobot identifies an initial, effective chunk size and continues to use it for the rest of the model scoring process.

  • Dynamic: DataRobot increases the chunk size while the model's scoring speed is acceptable and decreases the chunk size if the scoring speed falls.

  • Custom: A data scientist sets the chunk size, and DataRobot continues to use it for the rest of the model scoring process.
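
The difference between a fixed and a dynamic strategy can be sketched as follows. This is an illustration only and not DataRobot's algorithm: the doubling and halving rule, the size bounds, and the names `fixed_chunks`, `next_dynamic_size`, and `acceptable_rate` are all assumptions made for the example.

```python
def fixed_chunks(rows, chunk_size):
    """Yield consecutive slices of rows, each with at most chunk_size items."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def next_dynamic_size(current, rows_per_second, acceptable_rate,
                      min_size=1, max_size=100_000):
    """Grow the chunk while scoring speed is acceptable; shrink it when speed falls.

    Assumption: doubling and halving within [min_size, max_size].
    """
    if rows_per_second >= acceptable_rate:
        return min(current * 2, max_size)
    return max(current // 2, min_size)


rows = list(range(10))
sizes = [len(chunk) for chunk in fixed_chunks(rows, 4)]
print(sizes)  # [4, 4, 2]

print(next_dynamic_size(1000, rows_per_second=500, acceptable_rate=300))  # 2000
print(next_dynamic_size(1000, rows_per_second=100, acceptable_rate=300))  # 500
```

A fixed strategy is predictable, while a dynamic one adapts to the model's scoring speed at the cost of less uniform chunk sizes.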

Compute and download predictions

After you configure the prediction settings, click Compute and download predictions to score the data, and then wait for the prediction job to complete. You can perform the following actions on completed prediction jobs:

  • For time series predictions, view the Forecast visualization.
  • Download the predictions file.
  • Access logs to view and optionally copy the prediction job run details.

Predictions are available for download on the Predictions > Make predictions tab for the next 48 hours.

Cancel a batch prediction job

Click the stop icon while the job is running to cancel it. For canceled or failed jobs, you can click the logs icon to view the logs for the job.


Updated August 6, 2024