The AI Catalog must be enabled in order to schedule snapshot refreshes. For Self-Managed AI Platform installations, the Model Management Service must also be installed.
To keep a dataset in sync with its data source, DataRobot provides an optional automated, scheduled refresh mechanism. Through the AI Catalog, users with dataset access above the consumer level can schedule snapshots at daily, weekly, monthly, and annual intervals. You can refresh any data asset type (HDFS, JDBC, Spark, and URL) except for files.
Do not enable scheduled dataset refreshes unless the stored credentials capability is also enabled. The exception is a data source that does not require credentials, such as a URL or possibly HDFS.
Schedule refresh tasks
You can schedule multiple refresh tasks; limits are applied to datasets and to users independently.
To schedule snapshots for a dataset:
From the main catalog listing, select the asset for which you want to schedule one or more refresh tasks.
Click the Schedule refresh link to expand the scheduler.
If the asset source is JDBC or HDFS, a login dialog appears. Select the account credentials associated with the asset. DataRobot uses these credentials each time it runs the scheduled task. Once credentials are accepted (or if they were not required), the scheduler opens.
Complete the fields to set your task:
| Field | Description |
|---|---|
| Name (1) | Enter a name for the refresh job (or leave the default). |
| Calendar picker (2) | Sets the basis for the interval setting. |
| Interval (3) | Based on the calendar setting, the interval dropdown sets the frequency to daily, weekly, monthly, or annually. The time on the selected day is always set to the timestamp when the job was scheduled. |
| Summary (4) | Provides a summary of the selected scheduled task, including the interval and whether it is active or paused, supplied by DataRobot and updated with any changes to the job. |
Click Save to schedule a refresh for the asset. DataRobot reports the last execution status under the scheduled job name.
Use the calendar picker
Use the calendar picker to select the date that serves as the basis for the day of week, day of month, or day of year of the refresh.
Refreshes start on or after the selected date, depending on the time set. For example, if June 21 is the date selected, refreshes will begin:
- daily at timestamp, either that day or the next day (June 22)
- weekly on the set day (every Sunday at timestamp)
- monthly on that date of month (the 21st of each month at timestamp)
- annually on that date (every June 21 at timestamp)
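The interval rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not DataRobot's implementation: the function name `occurrence` is hypothetical, the year 2020 is chosen only because June 21 falls on a Sunday that year, and clamping to the last day of a shorter month is an assumption that this page does not confirm.

```python
import calendar
from datetime import datetime, timedelta


def occurrence(basis: datetime, interval: str, n: int) -> datetime:
    """Return the n-th refresh after the basis timestamp (n=0 is the basis itself).

    Hypothetical helper: models the daily/weekly/monthly/annual rules only.
    """
    if interval == "daily":
        return basis + timedelta(days=n)
    if interval == "weekly":
        return basis + timedelta(weeks=n)
    if interval == "monthly":
        months = basis.month - 1 + n
        year = basis.year + months // 12
        month = months % 12 + 1
    elif interval == "annually":
        year, month = basis.year + n, basis.month
    else:
        raise ValueError(f"unknown interval: {interval!r}")
    # Assumption: clamp to the last valid day (e.g. the 31st in a 30-day month).
    day = min(basis.day, calendar.monthrange(year, month)[1])
    return basis.replace(year=year, month=month, day=day)


# June 21, 2020 is a Sunday; 09:30 stands in for the chosen timestamp.
basis = datetime(2020, 6, 21, 9, 30)
for interval in ("daily", "weekly", "monthly", "annually"):
    print(interval, occurrence(basis, interval, 1).isoformat())
```

Each occurrence is computed from the basis date rather than from the previous run, so a monthly task anchored on a late day of the month does not drift after passing through a shorter month.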
Click in the time picker and use the arrows to set the timestamp to the local time at which you want the snapshot to refresh. Click the date to return to the full calendar view.
Work with scheduled tasks
Once scheduled, you can modify the task in a variety of ways. Use the menu associated with the task to access the options.
Pause job: Pauses the scheduled task indefinitely. When paused, the "Scheduled" label changes to "Paused" and the menu item changes to "Resume job". Use Resume job to re-enable the scheduled task. Paused jobs do not count against the task limits.
Edit: Retrieves the scheduler interface, allowing you to change any aspect of the task configuration.
Manage credentials: Opens the credentials selection modal, allowing you to change the credentials associated with the dataset.
Delete: Deletes the scheduled task.
Refresh limit settings
The following table lists the defaults and maximums for refresh-related activities.
For Self-Managed AI Platform installations, consider the maximum setting as the default.
| Activity | Default | Maximum |
|---|---|---|
| Enabled dataset refresh jobs for a user | 10 | 100 |
| Enabled dataset refresh jobs for a dataset | 5 | 100 |
| Stored snapshots until a dataset refresh job is automatically disabled | 25 | 1000 |
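As a rough illustration of how the per-user and per-dataset limits interact, the sketch below checks whether one more refresh job could be enabled. It is not a DataRobot API: `RefreshJob`, `DEFAULT_LIMITS`, and `can_enable` are hypothetical names, and the logic only models what this page states, namely that the limits apply to users and datasets independently, that paused jobs do not count, and that the defaults are 10 and 5.

```python
from dataclasses import dataclass

# Default values taken from the table above; the key names are hypothetical.
DEFAULT_LIMITS = {"per_user": 10, "per_dataset": 5}


@dataclass
class RefreshJob:
    user: str
    dataset: str
    enabled: bool = True  # a paused job is modeled as enabled=False


def can_enable(jobs, user, dataset, limits=DEFAULT_LIMITS):
    """Return True if one more enabled job fits under both limits."""
    user_count = sum(1 for j in jobs if j.enabled and j.user == user)
    dataset_count = sum(1 for j in jobs if j.enabled and j.dataset == dataset)
    # The two limits are checked independently; both must have room.
    return user_count < limits["per_user"] and dataset_count < limits["per_dataset"]


# Five enabled jobs on one dataset, each owned by a different user.
jobs = [RefreshJob(user=f"user{i}", dataset="sales") for i in range(5)]
print(can_enable(jobs, "user9", "sales"))  # dataset limit reached

jobs[0].enabled = False  # pausing a job frees a slot
print(can_enable(jobs, "user9", "sales"))
```

For a Self-Managed AI Platform installation, the same check could be run with the maximum values passed as `limits`, since the text says to treat the maximum as the default there.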