Data assets within DataRobot can be one of the following:
- Snapshot: DataRobot has imported and stored a copy of the dataset in the AI Catalog.
- Dynamic: DataRobot has a "live" connection to the dataset and only pulls from the database when a copy of the dataset is needed.
When you register a data asset in DataRobot, a badge is added to the entry to indicate the state and type of dataset. See the table below for a description of each badge:
| State (badge) | Description | Supported ingest methods |
|---|---|---|
| SPARK | A dataset built from a Spark query. | Spark SQL |
| SNAPSHOT | A dataset that has a snapshot. | URL, Database |
| STATIC | A static file with a snapshot. Datasets uploaded using data stages also display the STATIC badge. | File upload, data stage |
| DYNAMIC | A dataset that has no snapshot. | URL, Database |
A snapshot captures the data at a specific point in time. When you import or create a snapshot, DataRobot pulls from the data asset and registers a copy of it in the catalog.
Snapshot benefits:
- You can profile and model on the snapshot dataset.
- You can access a version history of the dataset.

Snapshot considerations:
- Data freshness: if the source data is updated often, the snapshot can quickly become stale, because the snapshotted data is disconnected from the underlying source data.
- Data governance: when you create a snapshot, you store a copy of the data in DataRobot and can share access to that copy with users who do not have access to the original source data. This can bypass an organization's strict controls over access to that data. In addition, the security mechanisms DataRobot uses to protect the data (for example, encryption) may differ from those the organization applies to the data in the original source.
To keep snapshot datasets up to date, you can have them refreshed automatically on a periodic schedule. Snapshots are also versioned automatically, which preserves dataset lineage and strengthens DataRobot's overall governance capabilities.
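To make the snapshot behavior concrete, here is a minimal sketch in plain Python. It is a hypothetical toy model, not the DataRobot API: the class `SnapshotDataset` and its `refresh` and `data` methods are illustrative names only, and an in-memory list stands in for the source table. It shows how a snapshot is a point-in-time copy that goes stale when the source changes, and how each refresh adds a new version to the history.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class SnapshotDataset:
    """Hypothetical toy model of a snapshot data asset (not the DataRobot API)."""

    source: List[int]  # stands in for the live table in the source database
    versions: List[Tuple[datetime, List[int]]] = field(default_factory=list)

    def refresh(self) -> int:
        """Copy the current source rows into a new version; return its number."""
        self.versions.append((datetime.now(timezone.utc), list(self.source)))
        return len(self.versions)  # version numbers start at 1

    def data(self, version: Optional[int] = None) -> List[int]:
        """Return the latest version, or an earlier one from the version history."""
        if not self.versions:
            raise ValueError("no snapshot has been taken yet")
        if version is None:
            version = len(self.versions)
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        return list(self.versions[version - 1][1])


source_table = [1, 2, 3]
snap = SnapshotDataset(source_table)
snap.refresh()          # version 1 captures [1, 2, 3]
source_table.append(4)  # the source changes after the snapshot is taken
print(snap.data())      # [1, 2, 3]: stale until the next refresh
snap.refresh()          # version 2 captures [1, 2, 3, 4]
print(snap.data(), snap.data(version=1))  # [1, 2, 3, 4] [1, 2, 3]
```

Because `refresh` copies the rows with `list(...)`, later changes to the source never alter stored versions, which is what makes the version history usable for lineage.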
Dynamic means there is a "live" connection to the source data, so when DataRobot adds the database table or view, it does not create a materialized data entry. When a copy of the data is needed, for example to create a project or make predictions, DataRobot uses the most recent version of the data. The dataset retains the DYNAMIC badge in the catalog.
Dynamic benefits:
- When you create a project or make predictions, DataRobot uses the most up-to-date data.
- You can perform tasks such as automatic retraining.

Dynamic considerations:
- Profiling and versioning are not supported.
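For contrast with snapshots, the following sketch models a dynamic asset. Again this is a hypothetical toy model, not the DataRobot API: `DynamicDataset` and `materialize` are illustrative names, and a Python callable stands in for the live database connection. Registration stores only a way to reach the source, and every request for a copy pulls the current data, so there is no version history to consult.

```python
from typing import Callable, List


class DynamicDataset:
    """Hypothetical toy model of a dynamic data asset (not the DataRobot API).

    Registration stores only a way to reach the source; no copy is made.
    """

    def __init__(self, query: Callable[[], List[int]]) -> None:
        self._query = query  # stands in for the "live" database connection

    def materialize(self) -> List[int]:
        """Pull a fresh copy, e.g. to create a project or make predictions."""
        return list(self._query())


source_table = [1, 2, 3]
dynamic = DynamicDataset(lambda: source_table)
first = dynamic.materialize()   # [1, 2, 3]
source_table.append(4)          # the source changes after registration
second = dynamic.materialize()  # [1, 2, 3, 4]: always the current data
print(first, second)            # [1, 2, 3] [1, 2, 3, 4]
```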
Create snapshots from dynamic datasets
If needed, you can manually create a snapshot from a dynamic dataset.
What happens if I create a snapshot from a dynamic dataset?
In the AI Catalog, the dataset is marked as SNAPSHOT. As with all SNAPSHOT datasets, you can still create new snapshots from it. Note that for such a dataset, only its snapshots, not the live source data, are used to create projects.
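The sketch below models what happens when you snapshot a dynamic dataset. It is a hypothetical toy model, not the DataRobot API: `CatalogEntry`, `create_snapshot`, and `data_for_project` are illustrative names. It also assumes that the most recent snapshot is the one used for new projects; the text above states only that snapshots, rather than the live source, are used.

```python
from typing import Callable, List


class CatalogEntry:
    """Hypothetical toy model (not the DataRobot API) of a dynamic entry that is
    marked SNAPSHOT once a snapshot has been created from it."""

    def __init__(self, query: Callable[[], List[int]]) -> None:
        self._query = query  # stands in for the live source connection
        self._snapshots: List[List[int]] = []

    @property
    def badge(self) -> str:
        return "SNAPSHOT" if self._snapshots else "DYNAMIC"

    def create_snapshot(self) -> None:
        """Manually capture the current source data as a new snapshot."""
        self._snapshots.append(list(self._query()))

    def data_for_project(self) -> List[int]:
        """Data used to create a project: live data only while no snapshot exists."""
        if self._snapshots:
            # Assumption: the most recent snapshot is used; the live source is not.
            return list(self._snapshots[-1])
        return list(self._query())


source_table = [1, 2, 3]
entry = CatalogEntry(lambda: source_table)
print(entry.badge)                            # DYNAMIC
entry.create_snapshot()
source_table.append(4)
print(entry.badge, entry.data_for_project())  # SNAPSHOT [1, 2, 3]
```

After the first snapshot, changes in the source reach new projects only when another snapshot is created.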