
Add external test data

Add external test data to assess model performance before deployment by comparing model accuracy. External test datasets let you evaluate trained models against data that was not used during training (an external holdout), while the models keep the dataset partitions they were originally trained on. You can then compare metric scores and visualizations to confirm that performance is consistent.

An external test dataset is one that:

  • Contains actuals (values for the target).
  • Is not part of the original dataset (models have not trained on any part of it).
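Both requirements can be checked before you upload a dataset. The following is a minimal Python sketch, not DataRobot functionality: it assumes both datasets are available as lists of row dictionaries, and the helper names `row_key` and `check_external_test_set` are hypothetical. It flags rows with no target value and rows that exactly match a training row; near-duplicates are not detected.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def row_key(row: Row, columns: Sequence[str]) -> Tuple[Any, ...]:
    """Hashable key built from a row's values in a fixed column order."""
    return tuple(row.get(col) for col in columns)


def check_external_test_set(
    training_rows: List[Row], external_rows: List[Row], target: str
) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems: List[str] = []

    # Requirement 1: the external set contains actuals for the target.
    no_actuals = sum(1 for row in external_rows if row.get(target) is None)
    if no_actuals:
        problems.append(f"{no_actuals} row(s) have no value for target '{target}'")

    # Requirement 2: no external row is an exact copy of a training row.
    # Only the training columns are compared, so extra external columns are ignored.
    if training_rows:
        columns = sorted(training_rows[0].keys())
        seen = {row_key(row, columns) for row in training_rows}
        overlap = sum(1 for row in external_rows if row_key(row, columns) in seen)
        if overlap:
            problems.append(f"{overlap} row(s) also appear in the training data")

    return problems


train = [{"x": 1, "y": 0}, {"x": 2, "y": 1}]
external = [{"x": 3, "y": 1}, {"x": 2, "y": 1}, {"x": 4, "y": None}]
print(check_external_test_set(train, external, target="y"))
```

An empty result means the candidate set passes both checks under these assumptions.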

To use this additional level of comparison, attach a dataset to an experiment and score individual models on demand. Results appear alongside your training-derived scores on the Leaderboard and in supported insights. You can upload any number of additional test datasets after project data has been partitioned and models have been trained.

Note

Support for external test sets is available for all experiment types except supervised time series. Unsupervised time series supports external test sets for anomaly detection but not clustering.

The following sections describe the workflow for adding test data, scoring models, and evaluating results.

Attach external test data

From the Leaderboard of the experiment you want to evaluate, add a dataset. For supervised learning, the external set must contain the target column and all columns present in the training dataset; it can also contain additional columns.

  1. Open the Actions menu and select Add external test data.

  2. In the window that opens, select a dataset and click Add.

Note

The workflow is slightly different for anomaly detection projects. For those, the external test dataset must contain the same columns as the training set, plus at least one column that identifies known anomalies. Select the known anomaly column as the actuals column.
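The column rules for supervised and anomaly detection experiments reduce to a set difference between required and available columns. This hypothetical helper (`missing_columns` is an illustrative name, not part of any DataRobot API) returns the required columns that the external set lacks; extra columns are ignored because they are allowed.

```python
from typing import Iterable, List, Optional


def missing_columns(
    training_columns: Iterable[str],
    external_columns: Iterable[str],
    target: Optional[str] = None,
    anomaly_column: Optional[str] = None,
) -> List[str]:
    """Return required columns absent from the external set, sorted."""
    required = set(training_columns)
    if target is not None:
        # Supervised: the target (actuals) must be present.
        required.add(target)
    if anomaly_column is not None:
        # Anomaly detection: a known-anomaly column supplies the actuals.
        required.add(anomaly_column)
    # Extra columns are allowed, so only check required minus external.
    return sorted(required - set(external_columns))


# Supervised: an extra "region" column is fine, a missing "income" column is not.
print(missing_columns({"age", "income", "churn"}, {"age", "churn", "region"}, target="churn"))
# Anomaly detection: the known-anomaly column is missing.
print(missing_columns({"cpu", "mem"}, {"cpu", "mem"}, anomaly_column="is_anomaly"))
```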

While the dataset is registering, the Add external test data option is disabled. Once complete, the Leaderboard updates to show an External test data column, and a dataset selector, Test dataset, appears above it (disabled when only one dataset is attached).

Note

In a binary classification project, when you click Run external test, DataRobot uses the current value of the Prediction Threshold to compute the predicted labels. In the downloaded predictions, the labels correspond to that threshold, even if you change the threshold between computing and downloading. The dataset listing shows the threshold that was used in the calculation.
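The threshold behavior amounts to capturing the threshold when the test runs and storing the labels together with it. A minimal sketch, assuming a probability at or above the threshold maps to the positive class (the exact tie rule is an assumption); `run_external_test` and `ExternalTestRun` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ExternalTestRun:
    """Labels stored together with the threshold used to produce them."""
    threshold: float
    labels: Tuple[str, ...]


def run_external_test(
    probabilities: Sequence[float], threshold: float, positive: str, negative: str
) -> ExternalTestRun:
    # Assumption: a probability at or above the threshold maps to the positive class.
    labels = tuple(positive if p >= threshold else negative for p in probabilities)
    return ExternalTestRun(threshold=threshold, labels=labels)


prediction_threshold = 0.5
run = run_external_test([0.2, 0.5, 0.9], prediction_threshold, "yes", "no")
prediction_threshold = 0.8  # changed after the run...
print(run.threshold, run.labels)  # ...the stored result still reflects 0.5
```

Freezing the result object is what keeps later threshold changes from altering labels that were already computed.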

You can attach additional datasets at any time. Once two or more datasets are attached, the dataset selector becomes interactive, and you can switch between datasets in both the full Leaderboard and the sidebar. Both views always reflect the same selection.

Score models

Attaching a dataset does not automatically score the existing models; you must start the evaluation for each model individually. To compute a score:

  1. If more than one external dataset has been added, choose the desired dataset from the Test dataset dropdown. The selector is disabled if only one dataset is attached.

  2. Click the Score button on each model for which you wish to compute scores using the external data. Once complete, the score is added to the model's summary.

Compare models

To easily compare Leaderboard scores:

  1. Use model sorting in the sidebar listing to show only the models that have been scored on external data. Changing the displayed metric updates the external data scores accordingly.

  2. Click the External test data column header to sort models by their external test scores, in ascending or descending order.
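Conceptually, these two steps filter out unscored models and order the rest by external score. In this hypothetical sketch, `LeaderboardEntry` and `sort_by_external_score` are illustrative names, not DataRobot APIs. Whether ascending or descending order puts the best model first depends on the metric: lower is better for LogLoss, higher is better for AUC.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeaderboardEntry:
    model: str
    # None until the model is scored on the selected external dataset.
    external_score: Optional[float]


def sort_by_external_score(
    entries: List[LeaderboardEntry], descending: bool = False
) -> List[LeaderboardEntry]:
    """Keep only externally scored models and order them by that score."""
    scored = [e for e in entries if e.external_score is not None]
    return sorted(scored, key=lambda e: e.external_score, reverse=descending)


board = [
    LeaderboardEntry("model_a", 0.31),
    LeaderboardEntry("model_b", None),
    LeaderboardEntry("model_c", 0.27),
]
print([e.model for e in sort_by_external_score(board)])  # ['model_c', 'model_a']
```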

Each model/dataset pair can only be evaluated once. There is no option to re-run or retry an evaluation.
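The scoring rules in this section (no automatic scoring when a dataset is attached, and a single evaluation per model/dataset pair) can be modeled as a registry keyed by model and dataset. This is a hypothetical sketch: `ExternalScoreRegistry` and the `evaluate` callback stand in for whatever computes the metric and are not DataRobot APIs. The sketch records only completed evaluations and does not model failures.

```python
from typing import Callable, Dict, Optional, Tuple


class ExternalScoreRegistry:
    """Hypothetical store of external test scores, keyed by (model, dataset)."""

    def __init__(self) -> None:
        # Attaching a dataset adds nothing here; entries appear only when a model is scored.
        self._scores: Dict[Tuple[str, str], float] = {}

    def score(
        self,
        model_id: str,
        dataset_id: str,
        evaluate: Callable[[str, str], float],
    ) -> float:
        """Score one model on one external dataset, at most once per pair."""
        key = (model_id, dataset_id)
        if key in self._scores:
            # Mirrors the rule that a model/dataset pair cannot be re-run.
            raise ValueError(f"{model_id!r} was already evaluated on {dataset_id!r}")
        self._scores[key] = evaluate(model_id, dataset_id)
        return self._scores[key]

    def get(self, model_id: str, dataset_id: str) -> Optional[float]:
        """Return the stored score, or None if the pair has not been scored."""
        return self._scores.get((model_id, dataset_id))


registry = ExternalScoreRegistry()
registry.score("model_a", "holdout_q3", lambda m, d: 0.42)  # stand-in metric
print(registry.get("model_a", "holdout_q3"))  # 0.42
print(registry.get("model_b", "holdout_q3"))  # None: model_b was never scored
```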

Compare insights with external test sets

External test data selection is available across multiple insight types, each of which uses the same data source selector to switch between training partitions and external datasets. To view external test results in model insights, use the Data selection dropdown to select an external test set as if it were a partition in the original project data.

This option is available when using the following insights:

Feature considerations

  • Insights are not computed if an external dataset has fewer than 10 rows; however, metric scores are computed and displayed on the Leaderboard.

  • The ROC Curve insight is disabled if the external dataset's actuals contain only a single class.
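For a binary classification experiment, these two considerations determine which outputs are available for an external dataset. The sketch below encodes them; `available_outputs` and `MIN_ROWS_FOR_INSIGHTS` are hypothetical names, and it assumes the ROC Curve, being an insight, is also subject to the 10-row minimum.

```python
from typing import Dict, Hashable, Sequence

MIN_ROWS_FOR_INSIGHTS = 10  # the documented 10-row minimum for insights


def available_outputs(actuals: Sequence[Hashable]) -> Dict[str, bool]:
    """Which outputs an external dataset supports, per the considerations above."""
    n_rows = len(actuals)
    insights = n_rows >= MIN_ROWS_FOR_INSIGHTS
    return {
        # Metric scores are computed even below the row minimum
        # (an empty set has nothing to score).
        "leaderboard_score": n_rows > 0,
        "insights": insights,
        # The ROC Curve also needs actuals from more than one class.
        "roc_curve": insights and len(set(actuals)) > 1,
    }


print(available_outputs([0, 1, 1, 0, 1, 0]))  # too few rows for insights
print(available_outputs([1] * 12))            # enough rows, but a single class
```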