Update project datasets

When you add a dataset to your project, whether as the base dataset, through a lookup, or through an append, you select a specific version of that dataset. If newer versions become available in the library, your project doesn’t automatically use them, because the work you’ve done in the project and its results may depend on the dataset versions you initially selected.

This often works well. At other times, you may want to update the project datasets to newer versions.

There are two methods for updating a project’s datasets:

  • Refresh a project dataset to the newest version of an existing dataset.
  • Replace a project dataset with another dataset.

A dataset refresh updates the project data to use the most current version of a dataset.

For example, if you start your project with version 1 of a dataset and, over time, newer versions are imported into the library (through either manual import or automation), you can refresh the dataset in your project to use the newest version.
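
The pinning behavior described above can be pictured with a small model. This is only an illustrative Python sketch, not a DataRobot API: the `Library` and `Project` classes, their method names, and the use of integer version numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Hypothetical library: maps a dataset name to its latest version number."""
    latest: dict[str, int] = field(default_factory=dict)

    def import_version(self, name: str) -> int:
        # Each import (manual or automated) creates the next version.
        self.latest[name] = self.latest.get(name, 0) + 1
        return self.latest[name]


@dataclass
class Project:
    """Hypothetical project: pins each dataset to the version chosen when added."""
    pinned: dict[str, int] = field(default_factory=dict)

    def add_dataset(self, library: Library, name: str) -> None:
        self.pinned[name] = library.latest[name]

    def refresh(self, library: Library, names: list[str]) -> None:
        # A refresh moves only the selected datasets to their newest version.
        for name in names:
            self.pinned[name] = library.latest[name]


library = Library()
library.import_version("sales")      # version 1 (hypothetical dataset name)
project = Project()
project.add_dataset(library, "sales")

library.import_version("sales")      # version 2 arrives in the library
pinned_before_refresh = project.pinned["sales"]  # unchanged: no automatic update

project.refresh(library, ["sales"])
pinned_after_refresh = project.pinned["sales"]   # now the newest version
```

In this model the project keeps using version 1 until `refresh` is called explicitly, which mirrors the point that a newer version in the library does not change the project on its own.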

Refresh the datasets

To refresh a dataset to the latest version:

  1. In the project, click Steps in the Tools bar.

  2. At the bottom of the Steps tool, click Refresh Datasets.

    The Refresh datasets pane appears. All datasets that can be refreshed are selected by default.

  3. Select the datasets to refresh; you can select All or select individual datasets.

  4. Click Save.

    The project data is updated to the most current versions of the selected datasets.

When can a dataset be refreshed?

A dataset can be refreshed when a newer version of it is available in the library. When that is the case, the project provides visual cues:

  • If the Refresh Datasets button is green, a newer version of one or more datasets used by your project is detected. If the button is gray, there are no newer versions of your project’s datasets.

  • If the Use Latest button located on the Refresh datasets pane is green, a newer version of the dataset is available.

  • The file details link opens a Version Information pane that shows the number of new rows and columns in the dataset's latest version. If your project is in Interactive Mode and the dataset contains more rows than the interactive portion, the pane also shows an Interactive column that lists the number of rows you can bring into the project. Use this number to see whether the interactive portion has increased or decreased, and then decide whether to refresh the dataset.

Note

All Data Prep projects have a maximum project row limit that is set by the Data Prep System Administrator. If you are close to that limit and your Administrator cannot increase it, you can choose which datasets to update to their latest versions, so that you can keep bringing newer data into your project without exceeding the project row limit.

If you deselect a dataset, the Use Latest button turns dark gray. This indicates that there is a newer version of the dataset and that you have chosen not to update the dataset.

When there are no new versions for the dataset, the Use Latest button is light gray.
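
The button colors described above amount to a small decision rule. The following Python sketch restates that rule for illustration only; the function names, the `UseLatestState` enum, and the integer version numbers are hypothetical and are not part of the product.

```python
from enum import Enum


class UseLatestState(Enum):
    GREEN = "green"            # newer version available and selected for refresh
    DARK_GRAY = "dark gray"    # newer version available, but dataset deselected
    LIGHT_GRAY = "light gray"  # no newer version of the dataset


def use_latest_state(pinned_version: int, latest_version: int, selected: bool) -> UseLatestState:
    """Color of the Use Latest button for one dataset (assumed integer versions)."""
    if latest_version <= pinned_version:
        return UseLatestState.LIGHT_GRAY
    return UseLatestState.GREEN if selected else UseLatestState.DARK_GRAY


def refresh_button_is_green(datasets: list[tuple[int, int]]) -> bool:
    """True if any (pinned, latest) pair in the project has a newer version."""
    return any(latest > pinned for pinned, latest in datasets)
```

For example, `use_latest_state(1, 3, selected=False)` corresponds to the dark gray case: a newer version exists, but you have chosen not to update that dataset.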

Replace a dataset

Unlike refreshing, replacing a dataset lets you decide which dataset, or which specific version of a dataset, to use in your project. For example, if you started a project with version 1 of a dataset and five additional versions were imported, replacing lets you pick the exact version to use, which may not be the latest one. Replacing also lets you switch to a different dataset entirely.

To replace a dataset used in a project:

  1. In the project, click Steps in the Tools bar.

  2. In the Steps tool, click the step with the dataset you want to update and click Edit at the top.

    The project returns to the state it was in when the selected step was created.

  3. Above the Data preview pane, click the name of the dataset you want to update.

  4. On the Select Datasets page, select the dataset you want to use.

    • To select a previous version of a dataset, on the dataset, click All Versions. On the version you want to use, click Select.
    • To select a different dataset in your library, on the dataset, click Select.

  5. Click Save.

    The project data is updated.
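
To contrast replacing with refreshing, the sketch below models a replace as swapping the dataset reference that a step uses. It is a hypothetical Python illustration, not a DataRobot API: `DatasetRef`, `replace_dataset`, the step labels, and the dataset names are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical reference to one version of one library dataset."""
    name: str
    version: int


def replace_dataset(
    step_inputs: dict[str, DatasetRef], step: str, new_ref: DatasetRef
) -> dict[str, DatasetRef]:
    """Return a copy of the step-to-dataset mapping with one step's dataset swapped."""
    if step not in step_inputs:
        raise KeyError(f"no step named {step!r}")
    updated = dict(step_inputs)
    updated[step] = new_ref
    return updated


inputs = {"base": DatasetRef("sales", 1), "lookup": DatasetRef("regions", 2)}

# Pick an exact version of the same dataset; it need not be the latest one.
inputs_v4 = replace_dataset(inputs, "base", DatasetRef("sales", 4))

# Or point the step at a different dataset entirely.
inputs_other = replace_dataset(inputs, "lookup", DatasetRef("territories", 1))
```

Unlike the refresh case, the caller names the target dataset and version explicitly, so nothing here depends on which version is newest in the library.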


Updated October 28, 2021