# Add external test data

> Add external test data - To better evaluate model performance, upload any number of additional test
> datasets after project data has been partitioned and models have been trained.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-06-12T13:31:55.479423+00:00` (UTC).

## Primary page

- [Add external test data](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/manage-experiments/experiment-ext-data.html.md): Full documentation for this topic (Markdown sidecar).

## Sections on this page

- [Attach external test data](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/manage-experiments/experiment-ext-data.html.md#attach-external-test-data): In-page section heading.
- [Score models](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/manage-experiments/experiment-ext-data.html.md#score-models): In-page section heading.
- [Compare models](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/manage-experiments/experiment-ext-data.html.md#compare-models): In-page section heading.
- [Compare insights with external test sets](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/manage-experiments/experiment-ext-data.html.md#compare-insights-with-external-test-sets): In-page section heading.

## Related documentation

- [NextGen UI documentation](https://docs.datarobot.com/en/docs/workbench/index.html.md): Linked from this page.
- [Workbench](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/index.html.md): Linked from this page.
- [Predictive experiments](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/index.html.md): Linked from this page.
- [Manage experiments](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/manage-experiments/index.html.md): Linked from this page.
- [actuals](https://docs.datarobot.com/en/docs/reference/glossary/index.html.md#actuals): Linked from this page.
- [model sorting](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/manage-experiments/leaderboard.html.md#model-sorting): Linked from this page.
- [Accuracy Over Time](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/experiment-insights/aot.html.md): Linked from this page.
- [Confusion Matrix](https://docs.datarobot.com/en/docs/classic-ui/modeling/analyze-models/evaluate/multiclass.html.md): Linked from this page.
- [Individual Prediction Explanations](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/experiment-insights/shap-predex.html.md): Linked from this page.
- [Metric Scores](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/experiment-insights/metric-scores.html.md): Linked from this page.
- [Lift Chart](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/experiment-insights/lift-chart.html.md): Linked from this page.
- [Residuals](https://docs.datarobot.com/en/docs/classic-ui/modeling/analyze-models/evaluate/residuals-classic.html.md): Linked from this page.
- [ROC Curve](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/experiment-insights/roc-curve.html.md): Linked from this page.
- [Stability](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/experiments/experiment-insights/stability.html.md): Linked from this page.

## Documentation content

Add external test data to help assess model performance prior to deployment by comparing model accuracy. External test datasets let you evaluate trained models against datasets that were not used during training (i.e., external holdout) using the original model's dataset partitions. Then, compare metric scores and visualizations to ensure consistent performance.

An external test dataset is one that:

- Contains actuals (values for the target).
- Is not part of the original dataset (models have not trained on any part of it).

To leverage this added level of comparison, attach a dataset to an experiment and score individual models on demand. Results appear alongside your training-derived scores throughout the UI. You can upload any number of additional test datasets after project data has been partitioned and models have been trained.

> [!NOTE] Note
> Support for external test sets is available for all experiment types except supervised time series. Unsupervised time series supports external test sets for anomaly detection but not clustering.

The following sections describe the workflow for adding test data, scoring models, and evaluating results.

## Attach external test data

From the Leaderboard of the experiment you want to evaluate, add a dataset. For supervised learning, the external set must contain the target column and all columns present in the training dataset (although additional columns can be added).

1. Open the Actions menu and select Add external test data.
2. In the window that opens, select a dataset and click Add.
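
The supervised-learning column requirement described above (the target column plus every training column, with extra columns allowed) can be checked locally before upload. The following is a minimal sketch, not part of DataRobot: the function name `check_external_test_columns` and the example column names are hypothetical and chosen only for illustration.

```python
def check_external_test_columns(training_columns, external_columns, target):
    """Return a list of problems; an empty list means the columns look compatible.

    Hypothetical helper: it only compares column names, not types or values.
    """
    external = set(external_columns)
    problems = []

    # The external set must contain actuals, i.e. the target column.
    if target not in external:
        problems.append(f"missing target column: {target}")

    # Every training column must be present; additional columns are allowed.
    missing = [c for c in training_columns if c not in external and c != target]
    if missing:
        problems.append("missing training columns: " + ", ".join(missing))

    return problems


training_cols = ["age", "income", "churned"]
external_cols = ["age", "income", "churned", "region"]  # extra column is fine
print(check_external_test_columns(training_cols, external_cols, target="churned"))
```

An empty list means the column names satisfy the stated requirement. The check does not verify that the rows are absent from the training data.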

> [!NOTE] Note
> The workflow is slightly different for anomaly detection projects. For those, the prediction dataset must contain the same columns as those in the training set with at least one column for known anomalies. Select the known anomaly column as the actuals value.

While the dataset is registering, the Add external test data option is disabled. Once complete, the Leaderboard updates to show an External test data column, and a dataset selector, Test dataset, appears above it (disabled when only one dataset is attached).

> [!NOTE] Note
> In a binary classification project, when you click Run external test, the current value of the Prediction Threshold is used for computation of the predicted labels. In the downloaded predictions, the labels correspond to that threshold, even if you updated the threshold between computing and downloading. DataRobot displays the threshold that was used in the calculation in the dataset listing.
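
To illustrate why downloaded labels reflect the threshold in effect at computation time, here is a minimal sketch of thresholded labeling. It is not DataRobot's implementation: the function `label_predictions`, the `run` record, and the choice of `>=` for the comparison are all assumptions made for this example.

```python
def label_predictions(positive_probabilities, threshold):
    """Convert positive-class probabilities to 0/1 labels.

    Assumption: a probability equal to the threshold is labeled positive.
    """
    return [1 if p >= threshold else 0 for p in positive_probabilities]


probabilities = [0.2, 0.55, 0.9]

# Labels are computed once, and the threshold used is stored with them.
run = {"threshold": 0.5, "labels": label_predictions(probabilities, 0.5)}

# Changing the threshold later does not alter the stored labels.
current_threshold = 0.7
print(run["threshold"], run["labels"])
print(current_threshold, label_predictions(probabilities, current_threshold))
```

Storing the threshold next to the labels mirrors the behavior described in the note, where the dataset listing shows the threshold used in the calculation.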

You can attach additional datasets at any time. Once two or more datasets are attached, the dataset selector becomes interactive. You can switch between datasets in both the full Leaderboard and the sidebar—these views always reflect the same selection.

## Score models

Attaching a dataset does not automatically score the existing models—you must start evaluation for each model individually. To compute a score:

1. If more than one external dataset has been added, choose the desired dataset from the Test dataset dropdown. The selector is disabled if only one dataset is attached.
2. Click the Score button on each model for which you wish to compute scores using the external data. Once complete, the score is added to the model's summary.

## Compare models

To easily compare Leaderboard scores:

1. Use model sorting in the sidebar listing to show only those models scored on external data. Changing the displayed metric updates scores for the external data partition accordingly.
2. Click the External test data header to sort by scores, ascending or descending.

Each model/dataset pair can only be evaluated once. There is no option to re-run or retry an evaluation.
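
The filter-then-sort comparison described in this section can be sketched outside the UI as follows. This is an illustration only: the `leaderboard` records, the `external_score` key, and the function `rank_by_external_score` are hypothetical and do not correspond to a DataRobot API.

```python
def rank_by_external_score(models, descending=False):
    """Keep only models scored on the external set, then sort by that score."""
    scored = [m for m in models if m.get("external_score") is not None]
    return sorted(scored, key=lambda m: m["external_score"], reverse=descending)


# Hypothetical Leaderboard entries; None means the model has not been scored yet.
leaderboard = [
    {"name": "Model A", "external_score": 0.31},
    {"name": "Model B", "external_score": None},
    {"name": "Model C", "external_score": 0.27},
]

for model in rank_by_external_score(leaderboard):
    print(model["name"], model["external_score"])
```

Whether ascending or descending order puts the best model first depends on the metric: lower is better for error metrics such as LogLoss or RMSE, while higher is better for metrics such as AUC.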

## Compare insights with external test sets

External test data selection is available across multiple insight types. Each uses the same data source selector to switch between training partitions and external datasets. To view the results of the external test data as part of model insights, use the Data selection dropdown to select an external test set as if it were a partition in the original project data.

This option is available when using the following insights:

- Accuracy Over Time (OTV only)
- Confusion Matrix
- Individual Prediction Explanations
- Metric Scores
- Lift Chart
- Residuals
- ROC Curve
- Stability (OTV only)

## Feature considerations

- Insights are not computed if an external dataset has fewer than 10 rows; however, metric scores are computed and displayed on the Leaderboard.
- The ROC Curve insight is disabled if the external dataset contains only single-class actuals.
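
The two considerations above can be expressed as a quick pre-check on the actuals column of an external dataset. This is a sketch under stated assumptions rather than DataRobot behavior: the function `external_set_availability` and its result keys are hypothetical, and it assumes the ROC Curve is also subject to the 10-row minimum because it is an insight.

```python
MIN_ROWS_FOR_INSIGHTS = 10  # row minimum taken from the first consideration


def external_set_availability(actuals):
    """Report which results to expect for a list of actual target values."""
    enough_rows = len(actuals) >= MIN_ROWS_FOR_INSIGHTS
    has_multiple_classes = len(set(actuals)) > 1
    return {
        # Metric scores are computed even for very small datasets.
        "metric_scores": True,
        # Insights require at least 10 rows.
        "insights": enough_rows,
        # Assumption: ROC Curve needs enough rows and more than one class.
        "roc_curve": enough_rows and has_multiple_classes,
    }


print(external_set_availability([0, 1, 0, 1, 0, 1]))   # too few rows
print(external_set_availability([1] * 12))             # single-class actuals
print(external_set_availability([0, 1] * 6))           # 12 rows, two classes
```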
