Use anomaly detection with time series

Anomaly detection is a method for detecting abnormalities in data, often used in cases where there are thousands of normal transactions and only a small percentage of outliers (for example, network analysis or cybersecurity). In this tutorial, you'll learn how to interpret models trained to use anomaly detection (an application of unsupervised learning) to detect time series anomalies. With unsupervised learning, you do not specify a target. Instead, DataRobot applies anomaly detection, also referred to as outlier and novelty detection, to find abnormalities in your dataset.

Takeaways

This tutorial explains:

  • How to select the best anomaly detection model.
  • How to interpret the selected model.
  • Ways to deploy anomaly detection models.

Create an anomaly detection model

Using the anomaly detection workflow, select No target and then Anomalies to build an anomaly detection model.

Configure the project settings—feature derivation windows, backtests, calendars, and any other customizations. Set a modeling mode and click Start.
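
If you prefer to script this setup, the same workflow is available through DataRobot's Python client. Below is a minimal sketch; the file name, datetime column (`timestamp`), and credentials are placeholders:

```python
import datarobot as dr

# Connect to DataRobot (endpoint and token are placeholders).
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Create a project from a local file (file name is a placeholder).
project = dr.Project.create("sensor_readings.csv", project_name="TS anomaly detection")

# Time-aware partitioning; "timestamp" is an assumed datetime column name.
spec = dr.DatetimePartitioningSpecification(datetime_partition_column="timestamp")

# No target: unsupervised mode runs anomaly detection blueprints.
project.analyze_and_model(
    unsupervised_mode=True,
    partitioning_method=spec,
    mode=dr.enums.AUTOPILOT_MODE.QUICK,
)
project.wait_for_autopilot()
```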

Select the best anomaly detection model

Once Autopilot completes, examine the Leaderboard. Without a target, traditional data science metrics cannot be calculated to estimate model performance, so DataRobot instead ranks models using the Synthetic AUC metric.
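
If you work through the Python client, one way to inspect this metric is to read each model's `metrics` dictionary. A sketch (the exact metric key may differ by project; check `model.metrics` to confirm):

```python
# List Leaderboard models and print the synthetic metric for each.
for model in project.get_models():
    # "Synthetic AUC" as a key is an assumption; inspect model.metrics
    # on your project for the exact name.
    score = model.metrics.get("Synthetic AUC", {}).get("validation")
    print(model.model_type, score)
```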

Upload an external test dataset

Synthetic AUC is a good basis for model selection if you don't have an external test dataset available. If you do have one, however, it is better to use it, because the anomalies that Synthetic AUC identifies may differ from the actual anomalies in your dataset.

To use an external dataset, select a model on the Leaderboard and go to Predict > Make Predictions.

  1. Upload an external dataset.

  2. Once uploaded, click Forecast settings and then select Forecast Range Predictions. From there, enter the name of the known anomalies column (to generate scores) and click Compute predictions.

  3. Once scores are computed, return to the Leaderboard and use the menu to change the display so it shows the external test column.

  4. In the External test column, click Run to compute scores for the other blueprints. The Leaderboard reorders results to show values sorted by the actual (not synthetic) AUC.

Once the external tests are scored, click on any model to explore the visualizations.
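
These steps can also be scripted. A hedged sketch with the Python client, assuming `model` is the Leaderboard model you selected; the date range and the `known_anomalies` column name are placeholders:

```python
import datetime

# Upload the external dataset for forecast range predictions.
# actual_value_column passes the known anomalies column so that
# scores can be computed against the actuals.
dataset = project.upload_dataset(
    "external_test.csv",
    predictions_start_date=datetime.datetime(2022, 1, 1),
    predictions_end_date=datetime.datetime(2022, 3, 1),
    actual_value_column="known_anomalies",  # assumed column name
)

# Score the selected model against the uploaded dataset.
pred_job = model.request_predictions(dataset.id)
predictions = pred_job.get_result_when_complete()
print(predictions.head())
```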

Explore visualizations

While there are many visualizations to investigate, for anomaly-specific models the most important to consider are the Anomaly Over Time and Anomaly Assessment tabs, described below.

The following tabs are always useful for understanding your data and are described in detail in the full documentation:

  • ROC Curve tools help you understand how well the prediction distribution captures model separation.
  • Feature Impact displays the relative impact of each feature—both original and derived—on the model.
  • Feature Effects show how changes to the value of each feature change model predictions in relation to the anomaly score.
  • Prediction Explanations help you understand why a model assigned a value to a specific observation.
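
Most of these insights can also be retrieved programmatically. For example, a minimal sketch that pulls Feature Impact for the selected model through the Python client:

```python
# Compute Feature Impact (or fetch it if already computed).
feature_impact = model.get_or_request_feature_impact()

# Print the ten most impactful features.
for fi in feature_impact[:10]:
    print(fi["featureName"], round(fi["impactNormalized"], 3))
```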

Anomaly Over Time tab

The Evaluate > Anomaly Over Time chart helps you understand when anomalies occur across the timeline of your data. You can change the backtest being displayed to evaluate anomaly scores across specific validation periods. You can also use the chart from the Model Comparison tab, which is a good method for identifying two complementary models to blend, increasing the likelihood of capturing more potential issues.
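
If Model Comparison surfaces two complementary models, you can create the blend from the Leaderboard or via the client. A sketch, assuming `model_a` and `model_b` are the chosen models:

```python
# Blend two complementary anomaly detectors by averaging their scores.
blend_job = project.blend(
    [model_a.id, model_b.id],
    blender_method=dr.enums.BLENDER_METHOD.AVERAGE,
)
blended_model = blend_job.get_result_when_complete()
print(blended_model.model_type)
```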

Anomaly Assessment tab

The Anomaly Assessment chart plots data for the selected backtest and provides SHAP explanations for up to 500 anomalous points. It helps to identify which features are contributing to the anomaly score (via the SHAP values) and is useful for explaining high scores.
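
Recent versions of the Python client can also initialize this insight per backtest. The sketch below assumes the `initialize_anomaly_assessment` method on datetime models, which may not exist in older client versions:

```python
# Fetch the time-aware variant of the model.
datetime_model = dr.DatetimeModel.get(project.id, model.id)

# Compute SHAP-based anomaly assessment for backtest 0 on validation data.
# initialize_anomaly_assessment is an assumption; verify it exists in
# your installed client version before relying on it.
record = datetime_model.initialize_anomaly_assessment(backtest=0, source="validation")
print(record)
```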

Make predictions

There are three mechanisms for making predictions with the selected anomaly detection model.

  1. Make Predictions tab
  2. Deploy tab
  3. Portable Prediction Server

Make Predictions tab

The Make Predictions tab is typically used for:

  • Testing from a simple Leaderboard interface.
  • Small (less than 1GB) prediction datasets.
  • Ad-hoc projects that don’t require frequent predictions.

The Make Predictions tab works slightly differently for time series projects than for non-time series projects. For time series projects, it requires that the prediction dataset meets specific criteria and applies forecast settings.

Deploy tab

Alternatively, you can create a deployment (a REST endpoint that manages prediction requests via the API). This method connects the model to a dedicated prediction server and creates a deployment object. Use the Deploy tab to create a deployment with the model (a scripted sketch follows these steps):

  1. Click Prepare for deployment if the model has not already been prepared.
  2. Add deployment information and click Create deployment.

  3. View your deployment in the deployment inventory.
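
Scripted, the equivalent is roughly the sketch below; the label and description are arbitrary, and the lookup assumes at least one dedicated prediction server is available to your account:

```python
# Pick an available dedicated prediction server.
prediction_server = dr.PredictionServer.list()[0]

# Create a deployment from the Leaderboard model.
deployment = dr.Deployment.create_from_learning_model(
    model_id=model.id,
    label="TS anomaly detector",  # arbitrary label
    description="Unsupervised time series anomaly detection",
    default_prediction_server_id=prediction_server.id,
)
print(deployment.id)
```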

Portable Prediction Server (PPS)

You can deploy a model via Docker with DataRobot's Portable Prediction Server (PPS). The PPS is a DataRobot execution environment for DataRobot model packages (.mlpkg files), distributed as a self-contained Docker image. Using this method moves the model closer to production data and lets you integrate it into existing pipelines and applications.
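
Once the PPS container is running with a model package mounted, it exposes a REST scoring route. A hedged sketch of scoring a CSV against a locally running single-model PPS; port 8080 and the /predictions route follow the single-model PPS convention, but verify them against your image's documentation:

```python
import requests

# Score a CSV against a locally running Portable Prediction Server.
# Port 8080 and /predictions are assumed single-model defaults; verify
# against your PPS image's documentation.
with open("scoring_data.csv", "rb") as f:
    response = requests.post(
        "http://localhost:8080/predictions",
        data=f,
        headers={"Content-Type": "text/csv"},
    )
print(response.json())
```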

Learn more

  • DataRobot University's Time Series Anomaly Detection Lab (requires a DataRobot University subscription), where you build and evaluate a time-aware unsupervised ML model to detect anomalies in a predictive maintenance dataset.

  • Towards Data Science, Anomaly Detection for Dummies, for an introduction to anomaly detection.
