

Asset states

A data asset in DataRobot is in one of the following states:

  • Snapshot: DataRobot has imported and stored a copy of the dataset in the AI Catalog.
  • Dynamic: DataRobot has a "live" connection to the dataset and only pulls from the database when a copy of the dataset is needed.

When you register a data asset in DataRobot, a badge is added to the entry to indicate the state and type of dataset. See the table below for a description of each badge:

| State | Badge | Description | Supported ingest methods |
|---|---|---|---|
| Snapshot, Dynamic | SPARK | A dataset built from a Spark query. | Spark SQL |
| Snapshot | SNAPSHOT | A dataset that has a snapshot. | URL, Database |
| Snapshot | STATIC | A static file with a snapshot. Datasets uploaded using data stages also display the STATIC badge; however, the FROM field displays stage://{stageId}/{filename}. | Local file |
| Dynamic | DYNAMIC | A dataset that has no snapshot. | URL, Database |
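
As a reading aid, the badge table above can be expressed as a small lookup. This is a minimal sketch, not part of any DataRobot API: the function name `expected_badge` and the ingest-method strings are hypothetical labels chosen for illustration.

```python
from typing import Literal

# Hypothetical labels for the ingest methods listed in the table.
IngestMethod = Literal["spark_sql", "url", "database", "local_file"]


def expected_badge(ingest_method: IngestMethod, has_snapshot: bool) -> str:
    """Return the badge the table associates with a dataset (illustrative only)."""
    if ingest_method == "spark_sql":
        # SPARK is shown for both snapshot and dynamic Spark datasets.
        return "SPARK"
    if ingest_method == "local_file":
        if not has_snapshot:
            raise ValueError("the table lists local files only in the Snapshot state")
        return "STATIC"
    if ingest_method in ("url", "database"):
        return "SNAPSHOT" if has_snapshot else "DYNAMIC"
    raise ValueError(f"unknown ingest method: {ingest_method!r}")
```

For example, `expected_badge("database", has_snapshot=False)` returns `"DYNAMIC"`, matching the last row of the table.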

Snapshot

A snapshot captures the data at a specific point in time. When you import or create a snapshot, DataRobot pulls from the data asset and registers a copy of it in the catalog.

Pros:

  • You can profile and model on the snapshot dataset.
  • You can access a version history of the dataset.

Cons:

  • Data freshness: if the source is updated often, the snapshot can quickly become stale because the snapshotted data is disconnected from the underlying source data.
  • Data governance: a snapshot stores a copy of the data in DataRobot, and you can share that copy with users who do not have access to the original source data, bypassing the organization's controls over access to it. Additionally, the security mechanisms DataRobot uses to protect the data (e.g., encryption) may differ from those the organization applies to the original source.

Schedule snapshots

To keep snapshot datasets up to date, you can schedule them to refresh automatically at regular intervals. Each refresh is automatically versioned, which preserves dataset lineage and supports governance in DataRobot.
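
The idea of a refresh that adds a version rather than overwriting the previous copy can be sketched as a toy model. The classes below are hypothetical and do not reflect how DataRobot stores snapshots. `fetch` stands in for whatever reads the source system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List

Rows = List[dict]


@dataclass(frozen=True)
class SnapshotVersion:
    number: int
    taken_at: datetime
    rows: Rows


@dataclass
class SnapshotDataset:
    fetch: Callable[[], Rows]  # hypothetical: reads the source system
    versions: List[SnapshotVersion] = field(default_factory=list)

    def refresh(self) -> SnapshotVersion:
        """Pull the source and store the copy as a new version."""
        version = SnapshotVersion(
            number=len(self.versions) + 1,
            taken_at=datetime.now(timezone.utc),
            rows=[dict(row) for row in self.fetch()],  # copy, detached from the source
        )
        self.versions.append(version)
        return version

    @property
    def latest(self) -> SnapshotVersion:
        return self.versions[-1]
```

Calling `refresh()` on a schedule leaves earlier versions intact, so the history of what the dataset contained at each refresh remains available.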

Dynamic

Dynamic means there is a "live" connection to the source data, so when DataRobot adds the database table/view, it does not create a materialized data entry. When a copy of the data is needed—for example, to create a project or make predictions—DataRobot uses the most recent version of the data. Note that the dataset retains the DYNAMIC badge in the catalog.

Pros:

  • When you create a project or make predictions, DataRobot is using the most up-to-date data.
  • You can perform tasks like automatic retraining.

Cons:

  • Profiling and versioning are not supported.
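
The difference in freshness between a stored copy and a live connection can be illustrated with a toy model. `DynamicDataset` and its `materialize` method are hypothetical names, and an in-memory list stands in for a database table.

```python
from typing import Callable, List

Rows = List[dict]


class DynamicDataset:
    """Holds only a way to reach the source, never a stored copy."""

    def __init__(self, fetch: Callable[[], Rows]) -> None:
        self._fetch = fetch  # hypothetical stand-in for a database connection

    def materialize(self) -> Rows:
        # Called only when a copy is needed, e.g. to create a project.
        return [dict(row) for row in self._fetch()]


table = [{"id": 1, "amount": 10}]
dynamic = DynamicDataset(fetch=lambda: table)

stored_copy = dynamic.materialize()    # what a snapshot would keep
table.append({"id": 2, "amount": 25})  # the source changes afterwards
fresh_copy = dynamic.materialize()     # a dynamic dataset sees the new row
```

Here `stored_copy` still has one row while `fresh_copy` has two, which is the staleness trade-off described for snapshots.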

Create snapshots from dynamic datasets

If needed, you can manually create a snapshot from a dynamic dataset.

What happens if I create a snapshot from a dynamic dataset?

In the AI Catalog, the dataset will be marked as SNAPSHOT; as with all SNAPSHOT datasets, you can still create new snapshots from it. Note that for such a dataset, only the snapshots are used to create projects.
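
The state change described above can be sketched as follows. `CatalogEntry` is a hypothetical model, not a DataRobot class, and it assumes that projects read the most recent snapshot, which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rows = List[dict]


@dataclass
class CatalogEntry:
    fetch: Callable[[], Rows]  # hypothetical live connection to the source
    snapshots: List[Rows] = field(default_factory=list)

    @property
    def badge(self) -> str:
        # The entry shows SNAPSHOT as soon as one snapshot exists.
        return "SNAPSHOT" if self.snapshots else "DYNAMIC"

    def create_snapshot(self) -> None:
        self.snapshots.append([dict(row) for row in self.fetch()])

    def data_for_project(self) -> Rows:
        # Assumption: once snapshots exist, projects use the latest one
        # instead of querying the source.
        if self.snapshots:
            return self.snapshots[-1]
        return [dict(row) for row in self.fetch()]
```

In this model, a row added to the source after `create_snapshot()` does not appear in `data_for_project()` until another snapshot is taken.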


Updated May 17, 2024