Explore registry data¶
Once data registration is complete, you can select the dataset to view various insights and interact with the dataset.
There are two types of data stored in the Data Registry:
- Materialized data: These datasets are marked with the Static, Snapshot, or Spark badge. As part of the registration process upon import, DataRobot runs EDA1 on the dataset, making additional insights available.
- Unmaterialized data: These datasets are marked with the Dynamic badge. This is data that was added using a data connection and is still stored in the data source. If you did not choose to run EDA1 on a sample upon import, fewer insights are available.
Metadata info¶
On the Info tab, you can view a high-level summary of the dataset, add identifying information, and view impact analysis.

No. | Element | Description |
---|---|---|
1 | Descriptive information | Update the name and description, or add tags to use for search. |
2 | Dataset information | Displays the number of rows and features, along with other details, on the right. |
3 | Run detection | Run personal data detection to identify personal data in the dataset and, if any is detected, remove it. |
4 | SQL Query | Displays the SQL query used to create the dataset. |
5 | Renew snapshot | Add a scheduled snapshot. |
6 | Impact analysis | View how other DataRobot entities are related to—or dependent on—the current asset. |
Personal data detection¶
In some regulated and specific use cases, the use of personal data as a feature in a model is forbidden. DataRobot automates the detection of specific types of personal data to provide a layer of protection against the inadvertent inclusion of this information in a dataset and prevent its usage at modeling and prediction time.
After a dataset is ingested through the Data Registry, you have the option to check each feature for the presence of personal data. When you run the check, DataRobot compares every cell in the dataset against patterns that it has developed for identifying this type of information. If personal data is found, a warning message is displayed, informing you of the type of personal data detected for each feature and providing sample values to help you make an informed decision on how to move forward. Additionally, DataRobot creates a new feature list, equivalent to Informative Features but with all features containing any personal data removed. The new list is named Informative Features - Personal Data Removed.
Warning
There is no guarantee that this tool has identified all instances of personal data. It is intended to supplement your own personal data detection controls.
DataRobot currently supports detection of the following fields:
- Email address
- IPv4 address
- US telephone number
- Social security number
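To illustrate the kind of cell-by-cell pattern check described above, here is a minimal Python sketch. The regular expressions, the function names `detect_personal_data` and `remove_flagged`, and the three-sample cap are all assumptions made for illustration; they are not DataRobot's actual patterns or API, and real detection rules are likely more thorough.

```python
import re

# Illustrative patterns only (assumptions, not DataRobot's actual rules).
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "email": re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"),
    "ipv4": re.compile(rf"^(?:{_OCTET}\.){{3}}{_OCTET}$"),
    "us_phone": re.compile(
        r"^(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}$"
    ),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def detect_personal_data(columns, max_samples=3):
    """Check every cell against each pattern.

    `columns` maps a feature name to its list of values. Returns
    {feature: {personal_data_type: [sample values]}} for flagged features.
    """
    findings = {}
    for name, values in columns.items():
        for value in values:
            text = str(value).strip()
            for kind, pattern in PATTERNS.items():
                if pattern.match(text):
                    samples = findings.setdefault(name, {}).setdefault(kind, [])
                    if len(samples) < max_samples:
                        samples.append(text)
    return findings


def remove_flagged(feature_names, findings):
    """Build a feature list that excludes every flagged feature."""
    return [name for name in feature_names if name not in findings]


# Hypothetical sample data for demonstration.
sample = {
    "contact": ["a@example.com", "n/a"],
    "host": ["192.168.0.1"],
    "age": [34, 51],
}
flagged = detect_personal_data(sample)
clean_list = remove_flagged(list(sample), flagged)
```

The second helper mirrors the idea behind the Informative Features - Personal Data Removed list: any feature with at least one match is dropped from the derived list, while the original data is left untouched.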
To run personal data detection on a dataset in the Data Registry, go to the Info tab and click Run Detection.

- If no personal data is detected in the dataset, a success message displays.
- If DataRobot detects personal data in the dataset, a warning message displays. Click Details to view more information about the personal data detected; click Dismiss to remove the warning and prevent it from being shown again. Warnings are also highlighted by column on the Profile tab.
Impact analysis¶
Impact analysis shows how other entities in the application are related to—or dependent on—the current asset. This is useful for a number of reasons, allowing you to:
- View how popular an item is based on the number of projects in which it is used.
- Understand which other entities might be affected if you were to make changes or deletions.
- Understand how the entity is used.
To view impact analysis, scroll to the bottom of the Info tab. Click a tile for summary details, then click the associated button (Open Use Case in the example below) for specific details.

If you do not have permission to access an asset, you can view an entry that represents the asset but the entry does not disclose any additional information.
All of the following associations are reported (with frequency values) as applicable:
- Projects
- Prediction datasets
- Feature Discovery configurations
- Time series calendars
- Spark SQL queries
- External model packages
- Deployment retraining
This functionality is also available from the Version History tab for individual dataset versions.
Profile¶
The Profile tab allows you to preview dataset column names and row data. It can be useful for finding or verifying column names.

Info tab vs. Profile tab
The Info tab displays the data's total row count, feature count, and size. The Profile tab displays only a preview of the data based on a 1MB raw sample, while the feature types and details are based on a 500MB sample. As a result, the row count observed on the Profile tab may not match the count displayed on the Info tab.
Note that the preview is a random sample of up to 1MB of the data and may be ordered differently from the original data. To see the complete, original data, use the Download Dataset option.
To view details for a particular feature, scroll to it in the display and click.

Feature lists¶
You can create new feature lists and feature transformations for any dataset in the Data Registry. To work with these tools, select the dataset in the Data Registry, then select Feature Lists in the left panel.
Note
To create feature lists, you must have Owner or Editor access to the dataset.
The Feature Lists tab also provides access to a tool for creating variable type feature transformations. While DataRobot bases variable type assignments on the values seen during EDA, there are times when you may need to change the type. Refer to the feature transformations documentation for complete details.

No. | Element | Description |
---|---|---|
1 | Feature list dropdown | View a list of DataRobot-generated or custom feature lists. |
2 | Rename / Delete | Rename or Delete the custom feature list selected in the feature list dropdown. You cannot make any changes to DataRobot default feature lists. |
3 | Search | Search for a specific feature. |
4 | + Create new feature list from selection | Create a new feature list from the features that are currently selected. |
To create a feature list:
- Use the checkboxes to the left of feature names to select a set of features.
- Click the Create new feature list from selection link, which becomes active after you select the first feature.
- In the resulting dialog, provide a name for the new list and click Submit. The new list becomes available in the dropdown.
Version history¶
The Version history tab lists all versions of a selected asset.

No. | Element | Description |
---|---|---|
1 | + Schedule dataset refresh | Add a scheduled snapshot. |
2 | Dataset version information | Displays the number of rows and features, along with other details for the individual dataset version, on the right. |
3 | Snapshot status | The snapshot status of the dataset version—green if successful, red if failed, gray if the original version did not have a snapshot. |
4 | Actions menu | Allows you to download or delete the dataset version. |
Renew snapshot¶
Availability information
For Self-Managed AI Platform installations, the Model Management Service must also be installed.
To keep a dataset in sync with its data source, DataRobot provides an optional automated, scheduled refresh mechanism. Through the Data Registry, users with dataset access above the consumer level can schedule snapshots at daily, weekly, monthly, or annual intervals. You can refresh any data asset type (JDBC, Spark, and URL) except for files.
Schedule refresh tasks¶
You can schedule multiple refresh tasks; limits are applied to datasets and to users independently.
To schedule snapshots for a dataset:
- From the Data Registry, select the asset for which you want to schedule one or more refresh tasks.
- Click the Schedule refresh link to expand the scheduler.
- If the asset source is JDBC, a login dialog opens. Select the account credentials associated with the asset. DataRobot uses these credentials each time it runs the scheduled task. Once credentials are accepted (or if they were not required), the scheduler opens.
- Complete the fields to set your task:

No. | Element | Description |
---|---|---|
1 | Name | Enter a name for the refresh job (or leave the default). |
2 | Calendar picker | Sets the basis for the interval setting. |
3 | Interval | Based on the calendar setting, the interval dropdown sets the frequency to daily, weekly, monthly, or annually. The time on the selected day is always set to the timestamp when the job was scheduled. |
4 | Summary | Provides a summary of the selected scheduled task, including the interval and whether it is active or paused, supplied by DataRobot and updated with any changes to the job. |

- Click Save to schedule a refresh for the asset. DataRobot reports the last execution status under the scheduled job name.
Use the calendar picker¶
Use the calendar picker to select a date that will serve as the basis of the day-of-week, monthly date, or day of year for the refresh.

Refreshes will start on or after (depending on the time set) the specific date. For example, if January 27 is the date selected, refreshes will begin:
- Daily at timestamp, starting either that day (January 27) or the next day.
- Weekly on the selected day of the week (every Monday at timestamp, if January 27 falls on a Monday).
- Monthly on that date of month (the 27th of each month at timestamp).
- Annually on that date (every January 27 at timestamp).
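The interval rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not DataRobot's scheduler: the function name `refresh_times`, the year 2025 (chosen because January 27 falls on a Monday that year), and the clamping of the day for shorter months are all hypothetical choices.

```python
import calendar
from datetime import datetime, timedelta


def _add_months(anchor, months):
    """Shift anchor by whole months, clamping the day to the month's length.

    Clamping (for example, the 31st becoming the 30th) is an assumption.
    """
    index = anchor.month - 1 + months
    year, month = anchor.year + index // 12, index % 12 + 1
    day = min(anchor.day, calendar.monthrange(year, month)[1])
    return anchor.replace(year=year, month=month, day=day)


def refresh_times(anchor, interval, now, count=3):
    """Return the first `count` refresh times that fall at or after `now`.

    `anchor` is the date and time chosen in the calendar picker.
    """
    steps = {
        "daily": lambda n: anchor + timedelta(days=n),
        "weekly": lambda n: anchor + timedelta(weeks=n),
        "monthly": lambda n: _add_months(anchor, n),
        "annually": lambda n: _add_months(anchor, 12 * n),
    }
    occurrence = steps[interval]
    runs, n = [], 0
    while len(runs) < count:
        candidate = occurrence(n)
        if candidate >= now:  # skip run times that have already passed
            runs.append(candidate)
        n += 1
    return runs


# January 27, 2025 at 09:00, scheduled at noon on the same day,
# so the first daily run lands on the following day.
anchor = datetime(2025, 1, 27, 9, 0)
daily = refresh_times(anchor, "daily", now=datetime(2025, 1, 27, 12, 0))
```

Passing a `now` earlier than the anchor time makes the first daily run fall on the selected date itself, which corresponds to the "on or after" behavior described above.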

Click in the time picker. Use the arrows to change the time, setting the timestamp to the local time at which you want the snapshot to refresh. Click on the date to return to the full calendar view:

Work with scheduled tasks¶
Once scheduled, you can modify the task in a variety of ways. Use the Actions menu associated with the task to access the options.

Option | Description |
---|---|
Pause job | Pauses the scheduled task indefinitely. When paused, the "Scheduled" label changes to "Paused" and the menu item changes to "Resume job". Use this action to re-enable the scheduled task. Paused jobs do not count against the task limits. |
Edit | Retrieves the scheduler interface, allowing you to change any aspect of the task configuration. |
Manage credentials | Opens the credentials selection modal, allowing you to change the credentials associated with the dataset. |
Delete | Deletes the scheduled task. |
Refresh limit settings¶
The following table lists the defaults and maximums for refresh-related activities.
Availability information
The default listed in the table is for the managed AI Platform. For Self-Managed AI Platform installations, consider the maximum setting as the default.
Parameter | Description | Default | Maximum |
---|---|---|---|
Enabled dataset refresh jobs for a user | The total number of refresh jobs a user can have across all Data Registry datasets. | 100 | 100 |
Enabled dataset refresh jobs for a dataset | The total number of refresh jobs that can exist for a specific dataset for all users. | 5 | 100 |
Stored snapshots until a dataset refresh job is automatically disabled | The total number of stored snapshots that can exist for a specific dataset until the dataset refresh job is automatically disabled. | 100 | 1000 |
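As a rough illustration of how these limits interact, the following Python sketch models the managed AI Platform defaults from the table. The class and function names are hypothetical, and treating the snapshot limit as "disabled once the count reaches the limit" is an assumption; the exact threshold behavior is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshLimits:
    """Defaults taken from the table above (managed AI Platform)."""
    max_jobs_per_user: int = 100
    max_jobs_per_dataset: int = 5
    max_stored_snapshots: int = 100


def can_enable_job(user_enabled_jobs, dataset_enabled_jobs, limits=RefreshLimits()):
    """Check both limits independently.

    Counts should include only enabled jobs, since paused jobs
    do not count against the task limits.
    """
    return (
        user_enabled_jobs < limits.max_jobs_per_user
        and dataset_enabled_jobs < limits.max_jobs_per_dataset
    )


def is_auto_disabled(stored_snapshots, limits=RefreshLimits()):
    """Assumption: the job is disabled once the snapshot count reaches the limit."""
    return stored_snapshots >= limits.max_stored_snapshots


# A user with 12 enabled jobs adding a job to a dataset that already has 5.
allowed = can_enable_job(user_enabled_jobs=12, dataset_enabled_jobs=5)
```

Because the user and dataset limits are applied independently, a job can be rejected by the per-dataset limit even when the user is far below the per-user limit, as in the example.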
Comments¶
The Comments tab allows you to add comments to—even host a discussion around—any asset in the Data Registry that you have access to. With comments, you can:
- Tag other users in a comment; DataRobot will then send them an email notification.
- Edit or delete any comment you have added (you cannot edit or delete other users' comments).
