Set up automatic retraining (Continuous AI)

Availability information

Automatic retraining for deployments is off by default. Contact your DataRobot representative or administrator for information on enabling the feature for DataRobot MLOps.

To maintain model performance after deployment without extensive manual work, DataRobot provides an automatic retraining capability for deployments. Upon providing a retraining dataset registered in the AI Catalog, you can define up to five retraining policies on each deployment, each consisting of a trigger, a modeling strategy, modeling settings, and a replacement action. When triggered, retraining will produce a new model based on these settings and notify you to consider promoting it.

Set up retraining for a deployment

To modify retraining settings for a deployment:

  1. Click Deployments and select a deployment.

  2. Navigate to the Settings > Challengers and Retraining tab.

    • Replay Challengers Schedule: Enable the Automatically replay challengers toggle to set a recurring schedule for replaying stored predictions against challenger models.
    • Retraining user: For resource monitoring, retraining policies must run as a user account. Select a retraining user who has Owner access to the deployment.
    • Prediction environment: Set the default prediction environment for scoring challenger models.
    • Retraining data: Specify a retraining dataset shared by all retraining policies. Drag or browse for a local file, or select a dataset from the AI Catalog.
    • Manage Retraining Policies: Click + Add Retraining Policy to define a policy, specifying a retraining trigger, a model selection strategy, modeling settings, and a replacement action.

    Note

    Editing retraining settings requires Owner permissions for the deployment. Those with User permissions can view the retraining settings for the deployment.

  3. Complete the settings by following the procedures below.

Select a retraining user

When executed, scheduled retraining policies use the permissions and resources of a designated user (manually triggered policies use the resources of the user who triggers them). The user needs the following:

  • For the retraining data, permission to use data and create snapshots.
  • Owner permissions for the deployment.

Modeling workers are required to train the models requested by a retraining policy. Workers are drawn from the retraining user’s pool, and each retraining policy requests 50% of the retraining user’s total number of workers. For example, if the user has a maximum of four modeling workers and retraining policy A is triggered, it runs with two workers. If retraining policy B is then triggered, it also runs with two workers. If policies A and B are running and policy C is triggered, it shares workers with the two policies already running.

Note that interactive user modeling requests do not take priority over retraining runs. If a user’s workers are applied to retraining and the user initiates a new modeling run (manual or Autopilot), it shares workers with the retraining runs. For this reason, DataRobot recommends creating a user with a capped number of workers and designating this user for retraining jobs.
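
The following is a minimal sketch of the allocation arithmetic described above. The function is hypothetical, and the exact split for three or more concurrent policies is an assumption; DataRobot performs this scheduling internally.

```python
def workers_for_policy(total_workers: int, active_policies: int) -> int:
    """Each retraining policy requests 50% of the retraining user's workers.

    When the pool is oversubscribed, running policies share it, so no policy
    can hold more than an equal share of the total (assumed behavior).
    """
    requested = max(1, total_workers // 2)                     # 50% of the pool
    fair_share = max(1, total_workers // max(1, active_policies))
    return min(requested, fair_share)

# With a pool of 4 workers: policies A and B each get 2 workers;
# once a third policy starts, all three share the same 4 workers.
print(workers_for_policy(4, 1))  # 2
print(workers_for_policy(4, 2))  # 2
print(workers_for_policy(4, 3))  # 1 (roughly an equal share)
```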

Choose a prediction environment

Challenger analysis requires replaying predictions that were initially made with the champion model against the challenger models. DataRobot uses a defined schedule and prediction environment for replaying predictions. When a new challenger is added as a result of retraining, it uses the assigned prediction environment to generate predictions from the replayed requests. You can later change the prediction environment for any given challenger from the Challengers tab.

While they are acting as challengers, models can only be deployed to DataRobot prediction environments. However, the champion model can use a different prediction environment from the challengers—either a DataRobot environment (for example, one marked for “Production” usage, to avoid resource contention) or a remote environment (for example, AWS, OpenShift, or GCP). If a model is promoted from challenger to champion, it will likely use the prediction environment of the former champion.

Provide retraining data

All retraining policies on a deployment refer to the same AI Catalog dataset. You can register the dataset by navigating to the Settings > Data tab of the deployment and adding it to the Learning section. Alternatively, you can add training data directly from the Challengers and Retraining tab.

When a retraining policy triggers, DataRobot uses the latest version of the dataset (for uploaded AI Catalog items) or creates and uses a new snapshot from the underlying data source (for catalog items using data connections or URLs). For example, if the catalog item uses a Spark SQL query, when the retraining policy triggers it executes that query, then uses the resulting rows as an input to the modeling settings (including partitioning).

For AI Catalog items with underlying data connections, if the catalog item already has the maximum number of snapshots (100), the retraining policy will delete the oldest snapshot before taking a new one.
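
The sketch below (plain Python, not the DataRobot client API) illustrates how a triggered policy resolves its input data according to the behavior described above:

```python
MAX_SNAPSHOTS = 100  # AI Catalog snapshot limit per catalog item

def resolve_retraining_data(catalog_item: dict):
    """Illustrative: pick the dataset version a triggered policy would use."""
    if catalog_item["source_type"] == "uploaded":
        # Uploaded AI Catalog items: use the latest existing version.
        return catalog_item["versions"][-1]
    # Items backed by a data connection or URL: take a fresh snapshot,
    # evicting the oldest one first if the item is already at the limit.
    if len(catalog_item["versions"]) >= MAX_SNAPSHOTS:
        catalog_item["versions"].pop(0)  # delete the oldest snapshot
    new_snapshot = {"rows": "result of re-executing the underlying query"}
    catalog_item["versions"].append(new_snapshot)
    return new_snapshot
```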

Set up retraining policies

  1. On the Settings > Challengers and Retraining tab for a deployment, click + Add Retraining Policy. The Add Retraining Policy page displays.

  2. Set a retraining trigger.

  3. Configure how DataRobot selects a model from the new Autopilot project.

  4. Set up a replacement strategy by selecting a model action.

  5. Set up a modeling strategy by selecting settings for the new Autopilot project.

  6. Click Save policy above the policy settings.
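
Retraining policies can also be managed programmatically. The sketch below creates a policy through the public API; the endpoint path and payload field names are assumptions based on the concepts on this page, so consult the DataRobot API reference for your version before relying on them.

```python
import requests

API = "https://app.datarobot.com/api/v2"          # adjust for self-managed installs
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}
DEPLOYMENT_ID = "YOUR_DEPLOYMENT_ID"

policy = {
    "name": "Weekly same-blueprint refresh",
    # Trigger: a schedule, a drift-status change, or an accuracy-status change.
    "trigger": {"type": "schedule", "schedule": {"dayOfWeek": ["Sunday"], "hour": [1]}},
    # Model selection: reuse the champion's blueprint or run Autopilot.
    "modelSelectionStrategy": "same_blueprint",
    # Replacement action: challenger (default), initiate replacement, or save only.
    "action": "create_challenger",
}

# Field names above are illustrative; check the API docs for the real schema.
resp = requests.post(
    f"{API}/deployments/{DEPLOYMENT_ID}/retrainingPolicies/",
    headers=HEADERS,
    json=policy,
)
resp.raise_for_status()
print(resp.json())
```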

Triggers

Retraining policies can be triggered manually or in response to three types of conditions:

  • Automatic schedule: Pick a time for the retraining policy to trigger automatically. Choose from increments ranging from every three months to every day.

  • Drift status: Initiates retraining when the deployment's data drift status declines to the level(s) you select.

  • Accuracy status: Triggers when the deployment's accuracy status changes from a better status to the levels you select (green to yellow, yellow to red, etc.).

Note that data drift and accuracy triggers are based on the notifications for the metrics configured in the Settings > Monitoring tab.

Once initiated, a retraining policy cannot be triggered again until it completes. For example, if a retraining policy is scheduled to run every hour but takes more than an hour to complete, it finishes the first run rather than restarting or queuing a second run when the next scheduled trigger fires. Only one trigger condition can be chosen for each retraining policy.
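
A minimal sketch of this one-run-at-a-time rule, using a hypothetical trigger handler (DataRobot enforces this internally):

```python
def on_trigger(policy: dict) -> str:
    """Illustrative: a trigger that fires mid-run is skipped, not queued."""
    if policy.get("running"):
        return "skipped: previous run still in progress"
    policy["running"] = True
    return "started"
```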

Model selection

Choose a modeling strategy for the retraining policy. The strategy controls how DataRobot builds the new model on the updated data.

  • Use same blueprint as champion at time of retraining: Fits the champion model’s blueprint, as of the time of the trigger, on the new data snapshot. Select one of the following options:

    • Use current hyperparameters: Uses the champion model’s blueprint and the hyperparameters of each task in the blueprint, without a new hyperparameter search. Note that if you select this option, the champion model’s feature list is used for retraining; the Informative Features list cannot be used.

    • Automatically tune hyperparameters: Use the same blueprint but optimize the hyperparameters for retraining.

  • Use best Autopilot model (recommended): Runs Autopilot on the new data snapshot and uses the resulting recommended model. Choose from DataRobot’s three modeling modes: Quick, Autopilot, and Comprehensive.

If you select this strategy, you can also toggle additional Autopilot options, such as including only Scoring Code blueprints or models with SHAP value support.

Model action

Apply one of three actions for each policy.

The model action determines what happens to the model produced by a successful retraining policy run. In all scenarios, deployment owners are notified of the new model’s creation and the new model is added as a model package to the Model Registry.

  • Add new model as a challenger model: If there is space in the deployment's five challenger model slots, this action (the default) adds the new model as a challenger, replacing any model that was previously added by this policy. If no slot is available and no challenger was previously added by this policy, the model is only saved to the Model Registry, and the retraining policy run fails because the model could not be added as a challenger.

  • Initiate model replacement with new model: Suitable for high-frequency (e.g., daily) replacement scenarios, this option automatically requests a model replacement as soon as the new model is created. This replacement is subject to defined approval policies and their applicability to the given deployment, based on its owners and importance level. Depending on that approval policy, reviewers may need to manually approve the replacement before it occurs.

  • Save model: In this case, no action is taken with the model other than adding it to the Model Registry.
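
If you use the Save model action and prefer to control promotion yourself, you can request the replacement later with the DataRobot Python client. A sketch, assuming the client’s Deployment.replace_model method (available in recent client versions); the IDs are placeholders, and the replacement remains subject to the approval policies described above.

```python
import datarobot as dr
from datarobot.enums import MODEL_REPLACEMENT_REASON

# Connect to DataRobot (endpoint and token are placeholders).
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Promote a retrained model that was saved to the Model Registry.
deployment = dr.Deployment.get("YOUR_DEPLOYMENT_ID")
deployment.replace_model("NEW_MODEL_ID", MODEL_REPLACEMENT_REASON.SCHEDULED_REFRESH)
```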

Modeling strategy

The modeling strategy for retraining defines how DataRobot sets up the new Autopilot project. Define the features, optimization metric, partitioning strategies, sampling strategies, weights, and other advanced settings that instruct DataRobot on how to build models for the problem.

You can either reuse the same features as the champion model (as of the time the trigger fires) or allow DataRobot to identify the informative features from the new data.

By default, DataRobot reuses the same settings as the champion model (as of the time the trigger fires). Alternatively, you can define new partitioning settings, choosing from a subset of the options available on the project Start screen.

Manage retraining policies

After creating a retraining policy, you can start it manually, cancel it, or update it, as explained in the table below.

  • Retraining policy row: Click a retraining policy row to expand it and view or edit the retraining settings.
  • Run: Click the run button to start a policy manually. Alternatively, open the policy by clicking its row and schedule a run using the retraining trigger.
  • Remove: Click the remove button to delete a policy, then click Remove in the confirmation window.
  • Cancel: Click the cancel button to cancel a policy run that is in progress or scheduled. You cannot cancel a run that has finished successfully, failed, or already been cancelled.

Retraining history

You can view all previous runs of a retraining policy, successful or failed. Each run includes a start time, end time, duration, and, if the run succeeded, links to the resulting project and model package. While only the DataRobot-recommended model for each project is added automatically to the deployment, you may want to explore the project’s Leaderboard to find or build alternative models.

Note that policies cannot be deleted while they are running (use the cancel button to stop an in-progress run). If the retraining user and organization have sufficient workers, multiple policies on the same deployment can run at once.

Retraining for time series

Time series deployments support retraining, but there are limitations when configuring policies due to Feature Extraction and Reduction (FEAR), a feature generation process unique to time series projects that extracts new features and then reduces the extracted set. FEAR generates features such as lags and moving averages from the data.

Time series model selection

Same blueprint as champion: The retraining policy uses the same engineered features as the champion model's blueprint. The search for new FEAR features does not occur because it could potentially generate new features that are not captured in the champion's blueprint.

Autopilot: When using Autopilot instead of the same blueprint, FEAR does occur and new features can be generated. However, Comprehensive Autopilot mode is not supported. Additionally, time series Autopilot does not support the options to only include Scoring Code blueprints and models with SHAP value support.

Time series modeling strategy

Same blueprint as champion: The policy uses the same feature list and advanced modeling options as the champion model. The only option that can be overridden is the calendar used because, for example, a new holiday or event may be included in an updated calendar that you want to account for during retraining.

Autopilot: For Autopilot, you must use the Informative Features list generated as a result of FEAR; you cannot reuse the champion model’s feature list because Autopilot repeats the feature extraction process. You can, however, override additional modeling options from the champion's project:

  • Treat as exponential trend: Apply a log transformation to the target feature.
  • Exponentially weighted moving average: Set a smoothing factor for EWMA.
  • Apply differencing: Apply differencing to make the target stationary prior to modeling.
  • Add calendar: Upload, add from the catalog, or generate an event file that specifies dates or events that require additional attention.
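
For reference, similar options appear in the DataRobot Python client’s time series partitioning setup. A sketch, assuming recent client versions; the column name and calendar ID are placeholders, and exact parameter availability can vary by version.

```python
import datarobot as dr
from datarobot.enums import DIFFERENCING_METHOD, TREAT_AS_EXPONENTIAL

spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column="date",                  # placeholder column name
    use_time_series=True,
    treat_as_exponential=TREAT_AS_EXPONENTIAL.ALWAYS,  # log-transform the target
    differencing_method=DIFFERENCING_METHOD.SIMPLE,    # make the target stationary
    calendar_id="YOUR_CALENDAR_ID",                    # updated event calendar
)
```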

Retraining considerations

Availability information

Automatic retraining is only available to customers licensed for AutoML and MLOps. To try MLOps, contact your DataRobot representative.

  • Automatic retraining is not supported for unsupervised models.
  • Automatic retraining is not supported for deployed models that make use of Feature Discovery.
  • Retraining cannot run more than three times per deployment in any 24-hour period.

Typical patterns for implementing retraining policies on deployments include:

  • High-frequency (e.g., daily) retraining using the same blueprint. This makes use of the freshest data while keeping the modeling choices stable.
  • Low-frequency (e.g., weekly, monthly) retraining based on a new Autopilot run. This explores alternative modeling techniques to optimize performance. You can restrict Autopilot to Scoring Code-supported models if that is how you deploy. Alternatively, you may prefer to have a maximum-accuracy model available to measure any performance disparity due to deployment considerations.
  • At least one strategy triggered on drift. This quickly prepares an alternative model to evaluate in response to changing situations.
  • For use cases with fast access to actuals, an Autopilot strategy triggered on accuracy. This searches for an alternative model that may provide superior performance after the champion model has shown decay.

You may benefit from initial experimentation with different time frames for both same-blueprint and Autopilot strategies. For example, consider running a same-blueprint strategy both every night and every week. The challenger framework allows a simple comparison of performance, so retraining strategies can be evaluated empirically and customized for different use cases over time.

Time series retraining considerations

  • If you choose to reuse options from the champion model or override the champion model's project options, consider the following:

    • If the champion's project used a holdout start date and holdout end date, the retraining project does not use these settings; instead it uses a holdout duration equal to the difference between the two dates.

    • If the champion project used a holdout duration together with either a holdout start date or a holdout end date, the start/end date is dropped and the holdout duration is used in the retraining project. A new holdout start date is computed as the end of the retraining dataset minus the holdout duration (see the worked example after this list).

  • Customizations to backtests are not retained. At retraining time, the training start and end date will likely be different from the champion's—the data used for retraining might have shifted so that it no longer contains all of the data from a specific backtest on the champion model. Note that the number of backtests is retained.
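
A worked example of the holdout conversion described above, using placeholder dates:

```python
from datetime import datetime

# Champion project configured holdout with explicit start and end dates.
champion_holdout_start = datetime(2021, 10, 1)
champion_holdout_end = datetime(2021, 11, 1)
holdout_duration = champion_holdout_end - champion_holdout_start  # 31 days

# The retraining project keeps only the duration and recomputes the start
# from the end of the new retraining dataset.
retraining_data_end = datetime(2022, 1, 1)
new_holdout_start = retraining_data_end - holdout_duration
print(new_holdout_start)  # 2021-12-01 00:00:00
```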


Updated January 13, 2022