Set up automatic retraining (Continuous AI)¶
Automatic retraining for deployments is off by default. Contact your DataRobot representative or administrator for information on enabling the feature for DataRobot MLOps.
To maintain model performance after deployment without extensive manual work, DataRobot provides an automatic retraining capability for deployments. Upon providing a retraining dataset registered in the AI Catalog, you can define up to five retraining policies on each deployment, each consisting of a trigger, a modeling strategy, modeling settings, and a replacement action. When triggered, retraining will produce a new model based on these settings and notify you to consider promoting it.
Set up retraining for a deployment¶
To modify retraining settings for a deployment:
Click Deployments and select a deployment.
Navigate to the Settings > Challengers and Retraining tab.
|Element||Description|
|Replay Challengers Schedule||Enable Automatically replay challengers to set a recurring schedule for retraining on stored predictions.|
|Retraining user||For resource monitoring, retraining policies must be run as a user account. Select a retraining user who has Owner access for the deployment.|
|Prediction environment||Set the default prediction environment for scoring challenger models.|
|Retraining data||Specify a retraining dataset for all retraining profiles. Drag or browse for a local file or select a dataset from the AI Catalog.|
|Manage Retraining Policies||Click + Add Retraining Policy to specify a policy for retraining. Specify a retraining trigger, a model selection strategy, modeling settings, and a replacement action.|
Editing retraining settings requires Owner permissions for the deployment. Those with User permissions can view the retraining settings for the deployment.
Complete the settings by following the procedures below.
Select a retraining user¶
When executed, scheduled retraining policies use the permissions and resources of an identified user. Manually triggered policies use the resources of the user who triggers them. The retraining user needs the following:
- For the retraining data, permission to use data and create snapshots.
- Owner permissions for the deployment.
Modeling workers are required to train the models requested by the retraining policy. Workers are drawn from the retraining user's pool, and each retraining policy requests 50% of the retraining user's total number of workers. For example, if the user has a maximum of four modeling workers and retraining policy A is triggered, it runs with two workers. If retraining policy B is triggered, it also runs with two workers. If policies A and B are running and policy C is triggered, it shares workers with the other two policies running.
Note that interactive user modeling requests do not take priority over retraining runs. If a user's workers are applied to retraining and the user initiates a new modeling run (manual or Autopilot), that run shares workers with the retraining runs. For this reason, DataRobot recommends creating a user with a capped number of workers and designating this user for retraining jobs.
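The worker arithmetic above can be illustrated with a small sketch. It is not DataRobot code: the function name `allocate_workers` is hypothetical, and the even split applied when policies request more workers than the user has is an assumption, since the text only says that the policies share workers.

```python
def allocate_workers(max_workers: int, running_policies: list[str]) -> dict[str, float]:
    """Return the worker share each running retraining policy receives."""
    if not running_policies:
        return {}
    # Each policy requests 50% of the retraining user's total workers.
    request = max_workers * 0.5
    total_demand = request * len(running_policies)
    if total_demand <= max_workers:
        share = request
    else:
        # Assumption: an oversubscribed pool is split evenly between policies.
        share = max_workers / len(running_policies)
    return {name: share for name in running_policies}


# A user with a maximum of four modeling workers, as in the example above.
print(allocate_workers(4, ["A"]))
print(allocate_workers(4, ["A", "B"]))
print(allocate_workers(4, ["A", "B", "C"]))
```

With four workers, policies A and B each run with two workers. Once policy C starts, the three policies together cannot exceed the user's four workers, which is why a capped, dedicated retraining user keeps this contention away from interactive modeling.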
Choose a prediction environment¶
Challenger analysis requires replaying predictions that were initially made with the champion model against the challenger models. DataRobot uses a defined schedule and prediction environment for replaying predictions. When a new challenger is added as a result of retraining, it uses the assigned prediction environment to generate predictions from the replayed requests. You can later change the prediction environment that a given challenger uses from the Challengers tab.
While they are acting as challengers, models can only be deployed to DataRobot prediction environments. However, the champion model can use a different prediction environment from the challengers—either a DataRobot environment (for example, one marked for "Production" usage to avoid resource contention) or a remote environment (for example, AWS, OpenShift, or GCP). If a model is promoted from challenger to champion, it will likely use the prediction environment of the former champion.
Provide retraining data¶
All retraining policies on a deployment refer to the same AI Catalog dataset. You can register the dataset by navigating to the Settings > Data tab of the deployment and adding it to the Learning section. Alternatively, you can add training data directly from the Challengers and Retraining tab.
When a retraining policy triggers, DataRobot uses the latest version of the dataset (for uploaded AI Catalog items) or creates and uses a new snapshot from the underlying data source (for catalog items using data connections or URLs). For example, if the catalog item uses a Spark SQL query, when the retraining policy triggers, it executes that query and uses the resulting rows as an input to the modeling settings (including partitioning).
For AI Catalog items with underlying data connections, if the catalog item already has the maximum number of snapshots (100), the retraining policy will delete the oldest snapshot before taking a new one.
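The snapshot rotation described above can be sketched as follows. This is an illustration only: `add_snapshot` and the snapshot identifiers are hypothetical, and snapshots are modeled as a plain list ordered from oldest to newest rather than as AI Catalog objects.

```python
MAX_SNAPSHOTS = 100  # snapshot limit stated in the text above


def add_snapshot(snapshots: list[str], new_snapshot: str,
                 limit: int = MAX_SNAPSHOTS) -> list[str]:
    """Return the snapshot list after a retraining policy takes a new snapshot."""
    kept = list(snapshots)  # ordered oldest first
    if len(kept) >= limit:
        # The catalog item is full, so the oldest snapshot is deleted first.
        kept.pop(0)
    kept.append(new_snapshot)
    return kept


existing = [f"snap-{i}" for i in range(100)]
updated = add_snapshot(existing, "snap-100")
print(len(updated), updated[0], updated[-1])
```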
Set up retraining policies¶
On the Settings > Challengers and Retraining tab for a deployment, click + Add Retraining Policy. The Add Retraining Policy page displays.
Set a retraining trigger.
Configure how DataRobot selects a model from the new Autopilot project.
Set up a replacement strategy by selecting a model action.
Set up a modeling strategy by selecting settings for the new Autopilot project.
Click Save policy above the policy settings.
Retraining policies can be triggered manually or in response to three types of conditions:
Automatic schedule: Pick a time for the retraining policy to trigger automatically. Choose from increments ranging from every three months to every day.
Drift status: Initiates retraining when the deployment's data drift status declines to the level(s) you select.
Accuracy status: Triggers when the deployment's accuracy status changes from a better status to the levels you select (green to yellow, yellow to red, etc.).
Note that data drift and accuracy triggers are based on the notifications for the metrics configured in the Settings > Monitoring tab.
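As a rough illustration of the status-based triggers, the check below fires only when a status moves from a better level to one of the selected levels. The function name and the severity mapping are assumptions made for this sketch. The green, yellow, and red levels come from the description above.

```python
# Higher numbers mean a worse status.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def status_trigger_fires(previous: str, current: str, selected_levels: set[str]) -> bool:
    """Return True when the status worsened into one of the selected levels."""
    worsened = SEVERITY[current] > SEVERITY[previous]
    return worsened and current in selected_levels


print(status_trigger_fires("green", "yellow", {"yellow", "red"}))
print(status_trigger_fires("yellow", "green", {"yellow", "red"}))
print(status_trigger_fires("green", "yellow", {"red"}))
```

The first call fires because the status declined to a selected level. The second does not because the status improved, and the third does not because yellow was not selected.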
Once initiated, a retraining policy cannot be triggered again until it completes. For example, if a retraining policy is set to run every hour but takes more than an hour to complete, it will complete the first run rather than start over or queue with the second scheduled trigger. Only one trigger condition can be chosen for each retraining policy.
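The rule that a running policy ignores further triggers can be modeled with a simple guard. The class below is a hypothetical sketch, not part of any DataRobot client: a trigger that arrives during a run is dropped rather than queued.

```python
class PolicyRunGuard:
    """Tracks whether a retraining policy is running and ignores repeat triggers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False
        self.completed_runs = 0

    def trigger(self) -> bool:
        """Start a run if none is active. Return False if the trigger is ignored."""
        if self.running:
            return False  # neither restarted nor queued
        self.running = True
        return True

    def complete(self) -> None:
        """Mark the active run as finished so the policy can trigger again."""
        if self.running:
            self.running = False
            self.completed_runs += 1


policy = PolicyRunGuard("scheduled-retraining")
print(policy.trigger())  # first scheduled trigger starts a run
print(policy.trigger())  # second trigger arrives while the run is still active
policy.complete()
print(policy.trigger())  # the policy can be triggered again after completion
```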
Choose a modeling strategy for the retraining policy. The strategy controls how DataRobot builds the new model on the updated data.
Use same blueprint as champion at time of retraining: Fits the same blueprint as the champion model at the time of triggering on the new data snapshot. Select one of the following options:
Use current hyperparameters: Use the same hyperparameters and blueprint as the champion model. Uses the champion's hyperparameter search and strategy for each task in the blueprint. Note that if you select this option, the champion model's feature list is used for retraining. The Informative Features list cannot be used.
Automatically tune hyperparameters: Use the same blueprint but optimize the hyperparameters for retraining.
Use best Autopilot model (recommended): Run Autopilot on the new data snapshot and use the resulting recommended model. Choose from DataRobot's three modeling modes: Quick, Autopilot, and Comprehensive.
If selected, you can also toggle additional Autopilot options:
- Only include blueprints that support Scoring Code
- Create blenders from top-performing Models
- Run Autopilot on a feature list with target leakage removed
- Only include models that support SHAP values
Apply one of three actions for each policy.
The model action determines what happens to the model produced by a successful retraining policy run. In all scenarios, deployment owners are notified of the new model's creation and the new model is added as a model package to the Model Registry.
Add new model as a challenger model: This action, which is the default, adds the new model as a challenger if there is space in the deployment's five challenger model slots. It replaces any model that was previously added by this policy. If no slots are available and no challenger was previously added by this policy, the model is only saved to the Model Registry, and the retraining policy run fails because the model could not be added as a challenger.
Initiate model replacement with new model: Suitable for high-frequency (e.g., daily) replacement scenarios, this option automatically requests a model replacement as soon as the new model is created. This replacement is subject to defined approval policies and their applicability to the given deployment, based on its owners and importance level. Depending on that approval policy, reviewers may need to approve the replacement manually before it occurs.
Save model: In this case, no action is taken with the model other than adding it to the Model Registry.
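The challenger slot rules for the first action can be sketched as below. The `Challenger` type and `add_challenger` function are hypothetical names used only for illustration. The sketch returns the updated challenger list together with a flag indicating whether the policy run would succeed.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CHALLENGERS = 5  # challenger slots per deployment, as stated above


@dataclass(frozen=True)
class Challenger:
    model_id: str
    added_by_policy: Optional[str] = None  # None for manually added challengers


def add_challenger(challengers: list[Challenger], policy_id: str,
                   new_model_id: str) -> tuple[list[Challenger], bool]:
    """Apply the 'add new model as a challenger' action for one policy run."""
    new = Challenger(new_model_id, policy_id)
    # A model previously added by this policy is replaced, even when slots are full.
    for index, existing in enumerate(challengers):
        if existing.added_by_policy == policy_id:
            updated = list(challengers)
            updated[index] = new
            return updated, True
    # Otherwise the new model needs a free slot.
    if len(challengers) < MAX_CHALLENGERS:
        return [*challengers, new], True
    # No free slot: the model stays in the Model Registry only and the run fails.
    return list(challengers), False


full = [Challenger(f"manual-{i}") for i in range(5)]
_, succeeded = add_challenger(full, "policy-a", "model-1")
print(succeeded)
```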
The modeling strategy for retraining defines how DataRobot should set up the new Autopilot project. Define the features, optimization metric, partitioning strategies, sampling strategies, weights, and other advanced settings that instruct DataRobot on how to build models for a given problem.
You can either reuse the same features as the champion model uses (when the trigger initiates) or allow DataRobot to identify the informative features from the new data.
By default, DataRobot reuses the same settings as the champion model (at the time of the trigger initiating). Alternatively, you can define new partitioning settings, choosing from a subset of options available in the project Start screen.
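To summarize how the pieces of a policy fit together, the following sketch models a policy as a trigger, a model selection strategy, and a model action, with a limit of five policies per deployment. All class and field names are hypothetical and do not correspond to a DataRobot API. The sketch only restates the constraints described in this section.

```python
from dataclasses import dataclass, field
from enum import Enum

MAX_POLICIES_PER_DEPLOYMENT = 5  # limit stated in the introduction


class Trigger(Enum):
    SCHEDULE = "automatic schedule"
    DRIFT_STATUS = "drift status"
    ACCURACY_STATUS = "accuracy status"


class ModelSelection(Enum):
    SAME_BLUEPRINT_CURRENT_HYPERPARAMETERS = "same blueprint, current hyperparameters"
    SAME_BLUEPRINT_TUNED_HYPERPARAMETERS = "same blueprint, tuned hyperparameters"
    BEST_AUTOPILOT_MODEL = "best Autopilot model"


class ModelAction(Enum):
    ADD_AS_CHALLENGER = "add new model as a challenger"
    INITIATE_REPLACEMENT = "initiate model replacement"
    SAVE_MODEL = "save model"


@dataclass
class PolicySketch:
    name: str
    trigger: Trigger  # a single field, because each policy has only one trigger condition
    model_selection: ModelSelection
    model_action: ModelAction = ModelAction.ADD_AS_CHALLENGER  # default action per the text


@dataclass
class DeploymentPolicies:
    policies: list[PolicySketch] = field(default_factory=list)

    def add(self, policy: PolicySketch) -> None:
        """Add a policy, enforcing the per-deployment limit."""
        if len(self.policies) >= MAX_POLICIES_PER_DEPLOYMENT:
            raise ValueError("a deployment supports at most five retraining policies")
        self.policies.append(policy)


deployment = DeploymentPolicies()
deployment.add(PolicySketch("weekly-autopilot", Trigger.SCHEDULE,
                            ModelSelection.BEST_AUTOPILOT_MODEL))
print(deployment.policies[0].model_action.value)
```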
Manage retraining policies¶
After creating a retraining policy, you can start it manually, cancel it, or update it, as explained in the table below.
|Retraining policy row||Click on a retraining policy row to expand it. Once expanded, view or edit the retraining settings.|
|Run||Click the run button to start a policy manually. Alternatively, edit the policy by clicking the policy row and scheduling a run using the retraining trigger.|
|Remove||Click the remove button to delete a policy. Click Remove in the confirmation window.|
|Cancel||Click the cancel button to cancel a policy that is in progress or scheduled to run. You can't cancel a policy if it has finished successfully, reached the "Creating challenger" or "Replacing model" step, failed, or has already been canceled.|
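The cancellation rule in the table can be expressed as a small check. The state strings below are assumptions chosen for this sketch, apart from "Creating challenger" and "Replacing model", which are quoted from the table.

```python
# States in which a policy run can no longer be canceled (names are illustrative).
NON_CANCELABLE_STATES = {
    "finished",
    "creating challenger",
    "replacing model",
    "failed",
    "canceled",
}


def can_cancel(run_state: str) -> bool:
    """Return True if a policy run in this state can still be canceled."""
    return run_state.lower() not in NON_CANCELABLE_STATES


print(can_cancel("Scheduled"))
print(can_cancel("Creating challenger"))
```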
You can view all previous runs of a retraining policy, successful or failed. Each run includes a start time, end time, duration, and, if the run succeeded, links to the resulting project and model package. While only the DataRobot-recommended model for each project is added automatically to the deployment, you may want to explore the project's Leaderboard to find or build alternative models.
Policies cannot be deleted or interrupted while they are running. If the retraining user and organization have sufficient workers, multiple policies on the same deployment can be running at once.
The Challengers and Retraining tab allows for simple performance comparison, meaning retraining strategies can be evaluated empirically and customized for different use cases. You may benefit from initial experimentation, using various time frames for the "same-blueprint" and Autopilot strategies. For example, consider running "same-blueprint" retraining strategies using both a nightly and a weekly pattern and comparing the results.
Typical strategies for implementing automatic retraining policies in a deployment include:
- High-frequency automatic schedule: Frequently (e.g., daily) retrain the currently deployed blueprint on the newest data to stabilize the deployed model selection.
- Low-frequency automatic schedule: Periodically (e.g., weekly, monthly) run Autopilot to explore alternative modeling techniques and potentially optimize performance. You can restrict this process to only Scoring Code-supported models if that is how you deploy. See the Include only blueprints with Scoring Code support advanced option for more information.
- Drift status trigger: Monitor data drift and trigger Autopilot to prepare an alternative model when the champion model has shown data drift due to changing situations.
- Accuracy status trigger: Monitor accuracy drift and trigger Autopilot to search for a better-performing model after the champion model has shown accuracy decay. This strategy is most effective for use cases with fast access to actuals.
Only binary, multiclass, and regression target types support retraining. The Challengers and Retraining tab doesn't appear when a deployment's champion has a multilabel target type.
Unsupported models and projects¶
Retraining is not supported for the following DataRobot models and project types. The Challengers and Retraining tab doesn't appear when a deployment's champion uses any of the listed functionality:
- Combined/segmented models
- Feature Discovery models
- Unsupervised learning projects (including anomaly detection and clustering)
- Unstructured custom inference models
Partially supported models¶
The following model types partially support retraining. For each partially supported model, only the supported (✔) options are available in retraining policies on the Challengers and Retraining tab:
Only some retraining policy options are model-dependent. If the support matrix below doesn't include a model type, all options of a retraining policy are available for configuration.
|Model type||Same blueprint as champion||Champion model's feature list||Project options from champion model||Custom project options|
Retraining for time series¶
Time series deployments support retraining, but there are limitations when configuring policies due to the time series feature derivation process. This process generates features such as lags and moving averages and creates a new modeling dataset.
Time series model selection¶
Same blueprint as champion: The retraining policy uses the same engineered features as the champion model's blueprint. The search for newly derived features does not occur because it could potentially generate features that are not captured in the champion's blueprint.
Autopilot: When using Autopilot instead of the same blueprint, the time series feature derivation process does occur. However, Comprehensive Autopilot mode is not supported. Additionally, time series Autopilot does not support the options to only include Scoring Code blueprints and models with SHAP value support.
Time series modeling strategy¶
Same blueprint as champion: When creating a "same-blueprint" retraining policy for a time series deployment, you must use the champion model's feature list and advanced modeling options. The only option that you can override is the calendar used because, for example, a new holiday or event may be included in an updated calendar that you want to account for during retraining.
Autopilot: When creating an Autopilot retraining policy for a time series deployment, you must use the informative features modeling strategy. This strategy allows Autopilot to derive a new set of feature lists based on the informative features generated by new or different data. You cannot use the model's original feature list because time series Autopilot uses a feature extraction and reduction process by default. You can, however, override additional modeling options from the champion's project:
|Treat as exponential trend||Apply a log-transformation to the target feature.|
|Exponentially weighted moving average (EWMA)||Set a smoothing factor for EWMA.|
|Apply differencing||Set DataRobot to apply differencing to make the target stationary prior to modeling.|
|Add calendar||Upload, add from the catalog, or generate an event file that specifies dates or events that require additional attention.|
For time-aware retraining, if you choose to reuse options from the champion model or override the champion model's project options, consider the following:
- If the champion's project used the holdout start date and end date, the retraining project does not use these settings but instead uses holdout duration, the difference between these two dates.
- If the champion project used the holdout duration with either the holdout start date or end date, the holdout start/end date is dropped, and holdout duration is used in the retraining project. A new holdout start date is computed (the end of the retraining dataset minus the holdout duration).
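The holdout handling in the two cases above reduces to simple date arithmetic, sketched here with Python's standard `datetime` module. The function name and the example dates are made up for illustration.

```python
from datetime import date, timedelta


def retraining_holdout(champion_holdout_start: date, champion_holdout_end: date,
                       retraining_data_end: date) -> tuple[timedelta, date]:
    """Derive the holdout duration and the new holdout start for a retraining project."""
    # The champion's start and end dates are reduced to a duration.
    holdout_duration = champion_holdout_end - champion_holdout_start
    # The new start is the end of the retraining dataset minus that duration.
    new_holdout_start = retraining_data_end - holdout_duration
    return holdout_duration, new_holdout_start


duration, new_start = retraining_holdout(
    champion_holdout_start=date(2023, 1, 1),
    champion_holdout_end=date(2023, 1, 31),
    retraining_data_end=date(2023, 6, 30),
)
print(duration.days, new_start)  # 30 2023-05-31
```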
Your customizations to backtests are not retained; however, the number of backtests is retained. At retraining time, the training start and end dates will likely differ from the champion's start and end dates. The data used for retraining might have shifted so that it no longer contains all of the data from a specific backtest on the champion model.