Automatic retraining (Continuous AI)¶
Automatic retraining for deployments is off by default. Contact your DataRobot representative or administrator for information on enabling the feature for DataRobot MLOps.
Feature flag: Enable MMM deployment retraining
To maintain model performance after deployment without extensive manual work, DataRobot provides an automatic retraining capability for deployments. After you provide a retraining dataset registered in the AI Catalog, you can define up to five retraining policies on each deployment, each consisting of a trigger, a modeling strategy, modeling settings, and a replacement action. When triggered, retraining produces a new model based on these settings and notifies you to consider promoting it.
Deployment-level retraining settings¶
To modify retraining settings for a deployment, navigate to the Challengers and Retraining tab. Editing retraining settings requires Owner permissions for the deployment. Those with User permissions can view the retraining settings for the deployment.
Select a retraining user¶
When executed, scheduled retraining policies use the permissions and resources of an identified user (manually triggered policies use the resources of the user who triggers them). The user needs the following:
- For the retraining data, permission to use data and create snapshots.
- Owner permissions for the deployment.
Modeling workers are required to train the models requested by the retraining policy. Workers are drawn from the retraining user’s pool, and each retraining policy requests 50% of the retraining user’s total number of workers. For example, if the user has a maximum of four modeling workers, and retraining policy A is triggered, it runs with two workers. If retraining policy B is triggered, it also runs with two workers. If policies A and B are running and policy C is triggered, it shares workers with the other two policies running.
Note that interactive user modeling requests do not take priority over retraining runs. If a user’s workers are applied to retraining, and the user initiates a new modeling run (manual or Autopilot), it shares workers with the retraining runs. For this reason, DataRobot recommends creating a user with a capped number of workers and designating this user for retraining jobs.
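The worker arithmetic in the example above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, rounding down with a floor of one worker is an assumption, and how DataRobot actually divides workers among oversubscribed policies is not specified here.

```python
import math


def workers_requested(max_workers: int, share: float = 0.5) -> int:
    """Workers one retraining policy requests from the retraining user's pool."""
    # Assumption: fractional results round down, with a minimum of one worker.
    return max(1, math.floor(max_workers * share))


def is_oversubscribed(max_workers: int, running_policies: int) -> bool:
    """True when the running policies together request more workers than exist."""
    return running_policies * workers_requested(max_workers) > max_workers


print(workers_requested(4))      # workers requested by policy A alone
print(is_oversubscribed(4, 2))   # policies A and B together
print(is_oversubscribed(4, 3))   # policies A, B, and C together
```

With a four-worker pool, two policies fit exactly, while a third forces the runs to share workers, which is why a dedicated retraining user with a capped worker count keeps interactive modeling predictable.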
Choose a prediction environment¶
Challenger analysis requires replaying predictions that were initially made with the champion model against the challenger models. DataRobot uses a defined schedule and prediction environment for replaying predictions. When a new challenger is added as a result of retraining, it uses the assigned prediction environment to generate predictions from the replayed requests. It is possible to later change the prediction environment any given challenger is using from the Challengers tab.
While they are acting as challengers, models can only be deployed to DataRobot prediction environments. However, the champion model can use a different prediction environment from the challengers: either a DataRobot environment (for example, one marked for “Production” usage, to avoid resource contention) or a remote environment (for example, AWS, OpenShift, or GCP). If a model is promoted from challenger to champion, it will likely use the prediction environment of the former champion.
Provide retraining data¶
All retraining policies on a deployment refer to the same AI Catalog dataset. You can register the dataset by navigating to the Settings > Data tab of the deployment and adding it to the Learning section. Alternatively, you can add training data directly from the Challengers and Retraining tab.
When a retraining policy triggers, DataRobot uses the latest version of the dataset (for uploaded AI Catalog items) or creates and uses a new snapshot from the underlying data source (for catalog items using data connections or URLs). For example, if the catalog item uses a Spark SQL query, when the retraining policy triggers it executes that query, then uses the resulting rows as an input to the modeling settings (including partitioning).
For AI Catalog items with underlying data connections, if the catalog item already has the maximum number of snapshots (100), the retraining policy will delete the oldest snapshot before taking a new one.
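The snapshot rotation described above behaves like a bounded queue. The following is a minimal sketch of that behavior, not DataRobot's implementation; the function name and snapshot IDs are hypothetical, and only the limit of 100 comes from the text.

```python
from collections import deque

MAX_SNAPSHOTS = 100  # limit stated in the text


def take_snapshot(snapshots: deque, new_snapshot_id: str) -> list:
    """Add a snapshot, first deleting the oldest ones if the limit is reached.

    Returns the IDs of any snapshots that were deleted.
    """
    deleted = []
    while len(snapshots) >= MAX_SNAPSHOTS:
        deleted.append(snapshots.popleft())  # oldest snapshot sits at the left
    snapshots.append(new_snapshot_id)
    return deleted


# A catalog item that already holds the maximum number of snapshots.
snapshots = deque(f"snap-{i}" for i in range(100))
removed = take_snapshot(snapshots, "snap-100")
print(removed, len(snapshots))
```

If older snapshots are needed for auditing or reproducibility, this rotation is worth keeping in mind before setting a frequent retraining schedule.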
Retraining policies can be triggered manually or in response to three types of conditions:
Automatic schedule: Pick a time for the retraining policy to trigger automatically. Choose from increments ranging from every three months to every day.
Drift status: Initiates retraining when the deployment's data drift status declines to the level(s) you select.
Accuracy status: Triggers when the deployment's accuracy status changes from a better status to the levels you select (green to yellow, yellow to red, etc.).
Note that data drift and accuracy triggers are based on the notifications for the metrics configured in the Settings > Monitoring tab.
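The status-based triggers can be pictured as a check on a status transition. The sketch below is a hypothetical model of that rule: the function name and severity ordering are illustrative, and it assumes that only a decline into a selected level triggers retraining (a status that stays at the same level does not).

```python
# Higher numbers mean a worse status.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def should_trigger(previous: str, current: str, selected_levels: set) -> bool:
    """Trigger only when the status declines into one of the selected levels."""
    declined = SEVERITY[current] > SEVERITY[previous]
    return declined and current in selected_levels


print(should_trigger("green", "yellow", {"yellow", "red"}))  # decline into a selected level
print(should_trigger("green", "yellow", {"red"}))            # decline, but level not selected
print(should_trigger("red", "yellow", {"yellow", "red"}))    # improvement, not a decline
```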
Once initiated, a retraining policy cannot be triggered again until it completes. For example, if a retraining policy is set to run every hour, but takes more than an hour to complete, it will complete the first run, rather than start over or queue with the second scheduled trigger. Only one trigger condition can be chosen for each retraining policy.
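The "no re-trigger while running" rule can be sketched as a simple guard. The class below is hypothetical and exists only to illustrate the behavior: a trigger that arrives during a run is dropped rather than queued.

```python
from dataclasses import dataclass


@dataclass
class PolicyRunGuard:
    """Tracks whether a retraining policy is currently running."""

    running: bool = False
    skipped_triggers: int = 0

    def trigger(self) -> bool:
        """Start a run unless one is already in progress."""
        if self.running:
            self.skipped_triggers += 1  # dropped, not queued
            return False
        self.running = True
        return True

    def complete(self) -> None:
        """Mark the current run as finished so the next trigger can start."""
        self.running = False


policy = PolicyRunGuard()
print(policy.trigger())   # first scheduled trigger starts a run
print(policy.trigger())   # second trigger arrives mid-run and is skipped
policy.complete()
print(policy.trigger())   # the next trigger after completion starts a new run
```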
Choose a modeling strategy for the retraining policy. The strategy controls how DataRobot builds the new model on the updated data.
Same blueprint as a champion: Fits the same blueprint as the champion model at the time of triggering on the new data snapshot. The champion's hyperparameter search space and strategy are used for each task in the blueprint.
Autopilot (recommended): Runs Autopilot on the new data snapshot and uses the resulting recommended model. Choose from DataRobot's three modeling modes: Quick, Autopilot, and Comprehensive.
If selected, you can also toggle additional Autopilot options:
- Only include blueprints that support Scoring Code
- Create blenders from top-performing Models
- Run Autopilot on a feature list with target leakage removed
- Only include models that support SHAP values
The modeling strategy for retraining defines how DataRobot should set up the new Autopilot project. Define the features, optimization metric, partitioning strategies, sampling strategies, weights, and other advanced settings that instruct DataRobot on how to build models for a given problem.
You can either reuse the same features as the champion model uses (when the trigger initiates) or allow DataRobot to identify the informative features from the new data.
By default, DataRobot reuses the same settings as the champion model (at time of the trigger initiating). Alternatively, you can define new partitioning settings, choosing from a subset of options available in the project Start screen.
Apply one of three actions for each policy. These determine what happens to the model produced by a successful retraining policy run. In all scenarios, deployment owners are notified of the new model’s creation and the new model is added as a model package to the Model Registry.
Add new model as a challenger model: If there is space in the deployment's five challenger model slots, this action, which is the default, adds the new model as a challenger model. It replaces any model that was previously added by this policy. If no slots are available, and no challenger was previously added by this policy, the model is only saved to the Model Registry. Additionally, the retraining policy run fails because the model could not be added as a challenger.
Initiate model replacement with new model: Suitable for high-frequency (e.g., daily) replacement scenarios, this option automatically requests a model replacement as soon as the new model is created. This replacement is subject to defined approval policies and their applicability to the given deployment, based on its owners and importance level. Depending on that approval policy, reviewers may need to manually approve the replacement before it occurs.
Save model: In this case, no action is taken with the model other than adding it to the Model Registry.
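The slot logic for the "Add new model as a challenger model" action can be sketched as follows. This is an assumption-laden illustration, not the product's code: the function name is hypothetical, and challengers are modeled as a plain dictionary mapping each model ID to the policy that added it (or None for manually added challengers).

```python
MAX_CHALLENGERS = 5  # challenger slots per deployment, as stated in the text


def add_as_challenger(challengers: dict, model_id: str, policy_id: str) -> bool:
    """Try to add a retrained model as a challenger.

    Returns False when no slot is available, in which case the model is only
    saved to the Model Registry and the policy run is reported as failed.
    """
    # A new model replaces any challenger this same policy added earlier.
    for old_model in [m for m, p in challengers.items() if p == policy_id]:
        del challengers[old_model]
    if len(challengers) >= MAX_CHALLENGERS:
        return False
    challengers[model_id] = policy_id
    return True


slots = {"m1": None, "m2": None, "m3": None, "m4": None, "m5": "policy-a"}
print(add_as_challenger(slots, "m6", "policy-a"))  # replaces m5, which policy-a added
print(add_as_challenger(slots, "m7", "policy-b"))  # all five slots taken by other models
```

Leaving one slot free per challenger-producing policy avoids failed runs of the second kind.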
You can view all previous runs of a retraining policy, successful or failed. Each run includes a start time, end time, duration, and, if the run succeeded, links to the resulting project and model package. While only the DataRobot-recommended model for each project is added automatically to the deployment, you may want to explore the project’s Leaderboard to find or build alternative models.
Note that policies cannot be deleted or interrupted while they are running. If the retraining user and organization have sufficient workers, multiple policies on the same deployment can be running at once.
Retraining for time series¶
Time series deployments support retraining, but there are limitations when configuring policies due to Feature Extraction and Reduction (FEAR): a feature generation process unique to time series projects that extracts new features and then reduces the set of extracted features. FEAR generates features such as lags and moving averages for the data.
Time series model selection¶
Same blueprint as a champion: The retraining policy uses the same engineered features as the champion model's blueprint. The search for new FEAR features does not occur because it could potentially generate new features that are not captured in the champion's blueprint.
Autopilot: When using Autopilot instead of the same blueprint, FEAR does occur and new features can be generated. However, Comprehensive Autopilot mode is not supported. Additionally, time series Autopilot does not support the options to only include Scoring Code blueprints and models with SHAP value support.
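The time series restrictions on Autopilot retraining can be expressed as a small validation check. The function below is a hypothetical sketch for reasoning about a policy before you configure it; the parameter names are illustrative and do not correspond to a DataRobot API.

```python
def time_series_autopilot_problems(
    mode: str,
    scoring_code_only: bool = False,
    shap_only: bool = False,
) -> list:
    """Return the reasons an Autopilot retraining setup is invalid for time series."""
    problems = []
    if mode == "Comprehensive":
        problems.append("Comprehensive Autopilot mode is not supported")
    if scoring_code_only:
        problems.append("Scoring Code-only blueprints option is not supported")
    if shap_only:
        problems.append("SHAP-only models option is not supported")
    return problems


print(time_series_autopilot_problems("Quick"))
print(time_series_autopilot_problems("Comprehensive", scoring_code_only=True))
```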
Time series modeling strategy¶
Same blueprint as a champion: The policy uses the same feature list and advanced modeling options as the champion model. The only option that can be overridden is the calendar used because, for example, a new holiday or event may be included in an updated calendar that you want to account for during retraining.
Autopilot: For Autopilot, you must use the Informative Features list that is generated as a result of FEAR; you cannot use the same feature list as the model because Autopilot accounts for this process. You can, however, override additional modeling options from the champion's project:
|Option|Description|
|---|---|
|Treat as exponential trend|Apply a log-transformation to the target feature.|
|Exponentially weighted moving average|Set a smoothing factor for EWMA.|
|Apply differencing|Set DataRobot to apply differencing to make the target stationary prior to modeling.|
|Add calendar|Upload, add from the catalog, or generate an event file that specifies dates or events that require additional attention.|
Automatic retraining is only available to customers licensed for AutoML and MLOps. To try MLOps, contact your DataRobot representative.
During the public preview period, several restrictions apply to automatic retraining:
- Automatic retraining is not supported for unsupervised models, including anomaly detection models.
- Automatic retraining is not supported for deployed models that make use of Feature Discovery.
- The “same blueprint” strategy does not use the same (“frozen”) hyperparameters as the champion model. Rather, it uses the same blueprint and hyperparameter search space and strategy that the champion does.
- Retraining cannot occur more than three times over 24 hours per deployment.
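The limit of three retraining runs per 24 hours can be modeled as a sliding-window check. This sketch assumes a rolling 24-hour window; the text does not say whether the window is rolling or tied to calendar days, and the function name is hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta

MAX_RUNS = 3                  # per deployment, as stated in the text
WINDOW = timedelta(hours=24)


def may_retrain(recent_runs: deque, now: datetime) -> bool:
    """Record and allow a run only if fewer than MAX_RUNS started within WINDOW."""
    # Drop run timestamps that have aged out of the window.
    while recent_runs and now - recent_runs[0] >= WINDOW:
        recent_runs.popleft()
    if len(recent_runs) >= MAX_RUNS:
        return False
    recent_runs.append(now)
    return True


start = datetime(2024, 1, 1)
runs = deque()
for hours in (0, 1, 2, 3, 24):
    print(hours, may_retrain(runs, start + timedelta(hours=hours)))
```

Under this assumption, the fourth attempt inside the window is rejected, and a new run becomes possible once the oldest run is 24 hours old.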
Typical patterns for implementing retraining policies on deployments may include:
- High-frequency (e.g., weekly) using the same blueprint. This makes use of the freshest data while keeping the modeling choices stable.
- Low-frequency (e.g., weekly, monthly) based on a new Autopilot run. This explores alternative modeling techniques to optimize performance. You can restrict the run to models that support Scoring Code if that is how you deploy. Alternatively, you may prefer to have a maximum-accuracy model available to determine any performance disparity due to deployment considerations.
- At least one strategy triggered on drift. This quickly prepares an alternative model to evaluate in response to changing situations.
- For use cases with fast access to actuals, an Autopilot strategy triggered on accuracy. This searches for an alternative model that may provide superior performance after the champion model has shown decay.
You may benefit from initial experimentation with different time frames against the same blueprint and Autopilot strategies. For example, consider running a same-blueprint strategy both every night and every week. The challenger framework allows a simple comparison of performance, meaning retraining strategies can be evaluated empirically and customized for different use cases and at different times.