Time series modeling

Availability information

Contact your DataRobot representative for information on enabling automated time series (AutoTS) modeling.

Note

See the documented file requirements for information on file size and series limit considerations.

The following sections provide information on using the time series feature of DataRobot's forecasting (modeling to predict future values) and nowcasting (modeling to determine current values) functionality. What follows is a brief overview of time series modeling and then a detailed workflow.

See the sections on comparison with conventional supervised learning, common patterns of time series data, and how DataRobot creates the modeling dataset.

See this article for a more technical discussion of the general framework for developing time series models, including generating features and preprocessing the data as well as automating the process to apply advanced machine learning algorithms to almost any time series problem.

Time series overview

When working with time series data, ask yourself: How long do I want to look into the past and how far into the future do I want to predict? Once you determine those answers, you can configure DataRobot so that your time-sensitive data uses advanced DataRobot modeling techniques to create forecasts from your data.

DataRobot automatically creates and selects time series features in the modeling data. You can constrain the features (for example, minimum and maximum lags, etc.) by configuring the time series framework on the Start screen.

DataRobot then performs all analysis and modeling on the modeling dataset. Because time shifts, lags, and features have already been applied, DataRobot can use general machine learning algorithms to build models with the new modeling dataset.

In general, the time series model building process is as follows:

  1. Upload your raw data; DataRobot runs EDA1.
  2. Set window parameters, such as the feature derivation window and forecast window.
  3. DataRobot applies that framework to the dataset and creates a new modeling dataset with time series features.

Time steps

First, though, be certain that your data is the correct type to employ forecasting or nowcasting. DataRobot categorizes data based on the time step—the typical time difference between rows—as one of three types:

| Time step | Description | Example |
| --- | --- | --- |
| Regular | Regularly spaced events | Monday through Sunday |
| Semi-regular | Data that is mostly regularly spaced | Every business day but not weekends |
| Irregular | No consistent time step | Random birthdays |

Assuming a regular or semi-regular time step, DataRobot's time series functionality works by encoding time-sensitive components as features, transforming your original input dataset into a modeling dataset that can use conventional machine learning techniques. (Note that a time step is different from a time interval, which is described below.) For each original row of your data, the modeling dataset includes both of the following (see the sketch after this list):

  • new rows representing examples of predicting different distances into the future
  • for each input feature, new columns of lagged features and rolling statistics for predicting that new distance.
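
The following is a minimal sketch of this kind of expansion using pandas. It is illustrative only, not DataRobot's actual derivation logic: the "Sales" column, the window sizes, and the forecast distances are assumptions.

```python
import pandas as pd

# Illustrative sketch of turning an input dataset into a modeling dataset.
df = pd.read_csv("sales.csv", parse_dates=["Date"]).sort_values("Date")

# New columns: lagged copies and rolling statistics of an input feature.
df["Sales (1st lag)"] = df["Sales"].shift(1)
df["Sales (7th lag)"] = df["Sales"].shift(7)
df["Sales (7 row mean)"] = df["Sales"].shift(1).rolling(7).mean()
df["Sales (7 row max)"] = df["Sales"].shift(1).rolling(7).max()

# New rows: one per (forecast point, forecast distance), here distances 1-3.
frames = []
for fd in (1, 2, 3):
    example = df.copy()
    example["Forecast Distance"] = fd
    example["target"] = df["Sales"].shift(-fd)   # the value to predict fd steps ahead
    frames.append(example)
modeling_df = pd.concat(frames, ignore_index=True).dropna(subset=["target"])
```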

Project types

DataRobot’s time series modeling supports both regression and binary classification projects. Each type has a full selection of models available from Autopilot or the Repository, specific to the project type. Both types have generally the same workflow and options, with the following differences found in binary classification projects:

  • The Treat as exponential trend? and Apply differencing? advanced options are disabled, as is the Exposure setting.
  • Simple and seasonal differencing are not applied.
  • Only classification metrics are supported.
  • No differencing is performed, so feature lists using a differenced target are not created. By default, Autopilot runs on Baseline only (average baseline) and Time Series Informative Features. Note that "average baseline" refers to the average of the target in the feature derivation window.
  • Classification blueprints do not use naive predictions as offset in modeling.

Detailed workflow

Time series forecast modeling is based on the following framework; see below for a description of the framework elements. See the section on nowcasting to better understand that framework.

The following provides detailed steps for enabling time series modeling:

  1. After uploading a time series-friendly dataset, select a target and click on Set up time-aware modeling:

  2. From the dropdown, select the primary date/time feature. The dropdown lists all date/time features that DataRobot detected during EDA1.

  3. After selecting a feature (note that DataRobot detects the time unit), DataRobot computes and then loads a histogram of the time feature plotted against the target feature (feature-over-time). Note that if your dataset qualifies for multiseries modeling, this histogram represents the average of the time feature values across all series plotted against the target feature. Review the histogram:

    This example plots sales, per week, over time. In this two years of data, you can see seasonal spikes and that the business is growing overall.

  4. Select forecasting or nowcasting as the time-series approach you would like to apply:

  5. To enable multiseries modeling:

    Then, return here and continue to the next step to complete the time series configuration.

  6. Configure the time series model, i.e., set the windows of time DataRobot will use to derive features and the window basis.

    Note

    If using nowcasting, these window settings differ.

  7. Set the training window format, either Duration or Row Count, to specify how Autopilot chooses training periods when building models. Before setting this value, see the details of row count vs. duration and how they apply to different folds. Note that, for irregular datasets, the setting defaults to Row Count. While you can change this setting, it is highly recommended that you leave it, as changing to Duration may result in unexpected training windows or model errors.

  8. Consider whether to set "known in advance" (KA) features or upload an event calendar in the advanced options. Here you can identify the features to be treated as KA variables, setting them to be used unlagged when making predictions. Also, you can specify a calendar listing the events for DataRobot to use when automatically deriving time series features (setting features as unlagged when making predictions).

  9. From the Data tab, view the Feature Lineage chart for any feature to understand the process that created it:

  10. Then, explore what a feature looks like over time to view its trends and determine whether there are gaps in your data (which is a data flaw you need to know about). To access the histogram, expand a numeric feature and click the Over time option:

    In this example, you can see a strong weekly pattern as well as a seasonal pattern. You can also change the resolution to see how the data aggregates at different intervals. The binned data (blue bars at the bottom of the plot) represents the number of rows per bin. Visualization of data density can provide information about potential missing values.

  11. If desired, set Advanced options > Time Series.

  12. Click Start. DataRobot then takes the framework you configured and engineers new features to create the time series modeling data.

  13. Display the Data page to watch the new features as they are created. By default DataRobot displays the Derived Modeling Data panel; to see your original data, click Original Time Series Data. Click View more info for more specific feature generation details, including access to the derivation log.

Consider the Leaderboard

Once modeling begins, DataRobot displays models on the Leaderboard as they complete. Because time series modeling uses date/time partitioning, you can run backtests, change window sampling, change training periods, etc. from the Leaderboard (described here).

Some notes on time series models:

  • DataRobot builds both the standard algorithms and special time series blueprints to run specific models for time series. As always, you can run any time series models that DataRobot did not run from the Repository.

  • DataRobot generates both traditional time series models (e.g., the ARIMA family) and advanced time series models (e.g., XGBoost).

  • For models with the suffix "with Forecast Distance Modeling," DataRobot builds a different model for each distance in the future, each having a unique blueprint to make that prediction.

  • The "Baseline prediction using most recent value" model (also known as "naive predictions") uses the most recent value or seasonal differences as the prediction; this model can be used as a baseline for judging performance.

Make Predictions tab

There are two methods for making predictions with time series models:

  1. For prediction datasets that are less than 1GB, use the Make Predictions tab from the Leaderboard. This is the method described below.

  2. For prediction datasets between 1GB and 5GB, consider deploying the model and using the batch predictions capabilities available from Deployments > Predictions.

Note

Be aware that using a forecasting range with time series predictions can result in a significant increase over the original dataset size. Use the batch predictions capabilities to avoid out-of-memory errors.

The Leaderboard Make Predictions tab works slightly differently than with traditional modeling. The following describes, briefly, using Make Predictions with time series; see the full Make Predictions tab details for more information.

Note

ARIMA model blueprints must be provided with full history when making batch predictions.

The Make Predictions tab provides summaries to help determine how much recent data—either time unit or rows, depending on how you configured your feature derivation and forecast point windows—is required in the prediction dataset and to review the forecast rows and KA settings. Note that the list of features displayed as KA only includes those KA features that are part of the feature list used to build the current model. The Forecast Settings tab provides an overview of the prediction dataset for help in changing settings as well as access to the auto-generated prediction file template.

In this example, the prediction dataset needs at least 28 days of historical data and can predict (return) up to 7 rows. (Although the model was configured for 21 days before the forecast point, seven days are added to the required history because the model uses seven-day differencing.)

The following provides an overview to making predictions with time series modeling:

  1. Once you have selected a model to use for predictions, if you haven't already done so, you are prompted to unlock holdout and retrain the model. It's a good idea to complete this step so that the model uses the most recent data, but it is not required.

  2. Prepare and upload your prediction dataset. Either upload a prediction-ready dataset with the required forecast rows for predictions or let DataRobot build you a prediction file template.

  3. Optionally, change the forecast point—the date to begin making predictions from—from the DataRobot default.

  4. Compute predictions.

Create a prediction-ready dataset

If you choose to manually create a prediction dataset, use the provided summary to determine the number of historical rows needed. Optionally, open Forecast Settings to change the forecast point, making sure that the historical row requirements from your new forecast point are met in the prediction dataset. If needed, click See an example dataset for a visual representation of the format required for the CSV file.

The following example shows that you would leave the target and non-KA values in rows 7 through 9 (the "Forecast rows") blank; DataRobot fills in those rows with the prediction values when you compute predictions.

When your prediction dataset is in the appropriate format, click Import data from to select and upload it into DataRobot. Then, compute predictions.
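
For a rough sense of what a manually assembled prediction dataset looks like, the following hedged pandas sketch appends blank forecast rows to recent history. The column names, the 28 days of required history, and the +1 ... +7 forecast window are assumptions carried over from the examples above, not values DataRobot requires for every project.

```python
import pandas as pd

history = pd.read_csv("recent_history.csv", parse_dates=["Date"])
forecast_point = history["Date"].max()

# Keep at least the required history before the forecast point.
history = history[history["Date"] > forecast_point - pd.Timedelta(days=28)]

# Forecast rows: target and non-KA features left blank, KA features filled in.
forecast_rows = pd.DataFrame({
    "Date": [forecast_point + pd.Timedelta(days=d) for d in range(1, 8)],
    "Sales": [None] * 7,               # target left blank; DataRobot returns predictions here
    "Holiday": [0, 0, 0, 0, 0, 1, 0],  # hypothetical known-in-advance feature
})
prediction_df = pd.concat([history, forecast_rows], ignore_index=True)
prediction_df.to_csv("prediction_dataset.csv", index=False)
```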

Note

While KA features can have missing values in the prediction data inside of the forecast window, that configuration may affect prediction accuracy. DataRobot surfaces a warning and also an information message beneath the affected dataset. Also, if you have missing history when picking a forecast point that is later than the default, DataRobot will still allow you to compute predictions.

Prediction file template

If your forecast point setting requires additional forecast rows be added to the original prediction dataset, DataRobot automatically generates a template file that appends the needed rows. Use the auto-generated prediction template as-is or download it and make modifications. To create the template, click Import data from to select and upload the intended dataset. DataRobot generates the template if, after the default forecast point, it finds no rows with an empty target value (that is, no rows that can serve as forecast rows).

For example, let's say your forecast window is +5 ... +6 and the default forecast point is t0. Points t5 and t6 are missing, but earlier points are present. In this case, DataRobot generates the extended file because it found no forecast rows that satisfy t5 or t6 after the default forecast point.

For DataRobot to generate a template, the following conditions must be met (a sketch of the first check follows this list):

  • There are no supported forecast rows (empty target rows that fall within the forecast window).
  • The generated template file size is less than the upload file limit.
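
A hedged pandas sketch of the first condition: are there any empty-target rows that fall within the forecast window after the default forecast point? The column names and the +5 ... +6 window are assumptions taken from the example above.

```python
import pandas as pd

df = pd.read_csv("prediction_data.csv", parse_dates=["Date"])
forecast_point = df.loc[df["y"].notna(), "Date"].max()   # latest row with a target value
fw_start, fw_end = 5, 6                                  # forecast window, in days

in_window = df["Date"].between(forecast_point + pd.Timedelta(days=fw_start),
                               forecast_point + pd.Timedelta(days=fw_end))
has_forecast_rows = df.loc[in_window, "y"].isna().any()
print("Template needed:", not has_forecast_rows)
```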

Use the template as-is

Use the template as-is if you do not need to modify the forecast rows or add any KA features. DataRobot will set the forecast point and add the full number of rows required to satisfy the project's forecast window configuration.

Use the default auto-expansion if you are using the most recent data as your forecast point, have no gaps, and want the full number of rows. In this case, you can upload the dataset and compute predictions.

Modify the template

DataRobot generates the prediction file template as soon as you upload a prediction dataset. However, there are cases where you may want to modify that template before computing predictions:

  • You have identified a column as a KA feature and need to enter relevant information in the forecast rows.

  • You have multiple series and want to predict on fewer than every series in the dataset. (DataRobot adds the necessary number of rows for each series in the dataset.)

  • Based on your settings, DataRobot would have generated several additional rows, but you want to predict on fewer.

To modify a template:

  1. Click Forecast Settings (Forecast Point Predictions tab), expand the Advanced options link, and download the auto-generated prediction file template:

  2. Open the template and add any required information to the new forecast rows or remove rows you don't need as they will only slow predictions.

  3. Save the modified template and upload it back into DataRobot using Import data from.

  4. Optionally, set the forecast point to something other than the default.

  5. Compute predictions.

Forecast Settings

The Forecast Settings modal provides configuration options for making two kinds of predictions:

  • Use Forecast Point Predictions to select the specific date (forecast point) from which you want to begin making predictions. By default, the forecast point is the most recent valid timestamp that maximizes the usage of time history within the feature derivation window. You can select any date shown since DataRobot trains models using all potential forecast points. Be sure, if you select a different forecast point, that your dataset has enough history.

  • Use Forecast Range Predictions for making predictions on all forecast distances within the selected date range. This option provides bulk predictions on an external dataset, including all forecast distance predictions for all rows in the dataset. Use the results for validating the model, not for making future predictions.

Forecast Point Predictions

The Forecast Settings > Forecast Point Predictions modal provides help in setting a forecast point different from the default point set by DataRobot:

Elements of the modal are described in the table below:

Element Description
Prediction type selector (1) Selects either Forecast Point (this page) or Forecast Range (bulk predictions).
Advanced options (2) Expands to download the prediction file template (if created).
Row summary (3) The same summary information as that on the Make Predictions tab. Colors correspond to the visualization below (6), showing the historical and forecast rows set during original project creation.
Valid forecast point range (4) In the context of the date span for the entire dataset (5), the colored bar above the full range indicates the range of dates that are valid forecast point settings (dates that will produce valid predictions). While the entire bar indicates possible valid options, dates within the yellow range are those that extend beyond DataRobot's suggested forecast point because they have missing history or KA features. Also, if there are gaps inside this range, the predictions may still fail (due to insufficient time history or no forecast row). See more date information.
Dataset start and end (5) The full range of dates found in the dataset. In cases where DataRobot created a prediction file template, the dataset end date and template file end date are both represented. If the dataset end and max forecast distance are the same, the display does not show the dataset end. The historical and forecast rows summarized above (3) are also overlaid on the span. The overlay moves as the forecast point setting changes. See more date information.
Historical and forecast zoom (6) A zoomed view of the relevant historical rows and forecast rows, intended to simplify selecting a forecast point (7).
Forecast point selector (7) A calendar picker for setting the forecast point. Invalid dates—those not indicated in the valid forecast range (4)—are disabled in the calendar. See more date information.
Close modal options (8) Initiate prediction computation (same as Compute Predictions on the Make Predictions page). Or, save the settings and close the modal without computing predictions. New settings are reflected on the Make Predictions page, and clicking Compute Predictions from there at any future time will use these settings. Alternatively, click the X to close without saving changes.

Forecast Point

The default forecast point (1) is either the most recent row in the dataset that contains a valid target value or, if you configured gaps during project setup, it is the row in the dataset that satisfies the feature derivation window’s history requirements. Open Forecast settings (2) to customize the forecast point.

Note

You must use the default forecast point for fractional-second forecasts.

Forecast Range Predictions

Forecast Range Predictions are helpful for validating model accuracy. DataRobot extracts the actual values for all points in time from the dataset. Set the prediction start and end dates to define the historical range of time for which you want bulk predictions. Because this model evaluation process uses actual values, DataRobot only generates predictions for timestamps that can support predictions for every forecast distance.

Understand dates in forecast settings

When you upload a prediction dataset, DataRobot detects the range of dates (the valid forecast range) available for use as the forecast point. It also determines a default forecast point, which is the latest timestamp available for making predictions with full history.

The following timestamps are marked in the visualization:

  • Data start is the timestamp of the first row detected in the dataset.
  • Data end is the timestamp of the last row detected in the dataset, whether it is the original or the auto-generated template.
  • Max forecast distance is the timestamp of the last possible forecast distance in the dataset.

Before modifying the forecast point, review the basic time series modeling framework.

Some things to consider:

  • What is the most recent valid forecast point? The most recent valid forecast point is the maximum forecast point that can be used to run predictions without error. It may differ from the default forecast point because the default forecast point takes the time history usage into consideration.

  • Based on the forecast window, what is the timestamp of the last prediction that was output? The forecast window is defined relative to the forecast point; the last prediction timestamp is a function of both the forecast window and the timestamp inside the prediction dataset.

    For example, consider a forecast window from 1 to 7 days. The forecast point is 2001-01-01, but the max date in the dataset is 2001-01-05. In this case, the max forecast timestamp is 2001-01-05 as there are no rows from 2001-01-06 to 2001-01-08.

  • Consider the length of your forecast window. That is, after the final row with actual values, do you have at least one forecast row (within the boundaries of the forecast window)? If you do, DataRobot will not generate a template; if you do not, DataRobot will generate forecast rows based on the project configuration.

Use the Forecast settings modal to get an overview of the prediction dataset, which aids in choosing settings like the forecast point and prediction start and end dates. In addition, DataRobot generates forecast rows after the final row with actual values (if there are no forecast rows based on the default forecast point), simplifying the prediction workflow. The actual values are taken from the last row of each series ID and duplicated to the forecast rows.

Time series prediction dataset validation

DataRobot validates a time series prediction dataset once it is uploaded, checking whether there are sufficient historical rows to produce the engineered features required by the project.

If seasonality is detected in the project, additional historical rows—longer than the feature derivation window (FDW)—are required. For example, a project with an FDW of [-14, 0] and 7-day seasonality will require 21 historical days in the prediction dataset to accommodate target differenced features (such as target (7 day diff) (mean)) and differencing features (such as target (14 day max) (diff 7 day mean)). If multiple seasonalities are detected, the longest seasonality is used to perform the validation check.
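
The arithmetic from the example above, made explicit (the values are illustrative):

```python
fdw_start, fdw_end = -14, 0
longest_seasonality_days = 7

fdw_length = fdw_end - fdw_start                              # 14 days of rolling-window history
required_history_days = fdw_length + longest_seasonality_days
print(required_history_days)                                  # 21 historical days required
```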

DataRobot does not require the presence of all historical rows when computing window statistics features (for example, target (7 day mean) or feature (14 day max)). Depending on the FDW settings, DataRobot predetermines the minimum required historical rows for predictions. If there are too many missing historical rows in the prediction dataset, predictions will error.

If a multiplicative trend is detected, DataRobot requires all historical target values in the prediction dataset to be strictly positive (> 0). Zero or negative target value(s) violate the model assumption that the dataset is multiplicative and the prediction generates an error. To correct it, check whether the training dataset is representative of the use case during prediction time or disable the advanced option Treat as exponential trend and recreate the project.

Compute and access predictions

When the forecast point is set and the dataset is in the correct format and successfully uploaded, it's time to compute predictions.

  1. There are two methods for computing predictions. Click either:

    • the Compute Predictions button on the Forecast Settings modal.
    • the Compute Predictions link (next to the Forecast Settings link) on the Make Predictions page.
  2. When processing completes, preview the historical data and predictions from the dataset, or click Download to save a CSV of your predictions:

Note

Notes on prediction output:
• Depending on your permissions, you may see the column, "Original Format Timestamp". This provides the same values provided by the "Timestamp" column but uses the timestamp format from the original prediction dataset. Your administrator can enable this permission for you.
• When working with downloaded predictions, be aware that in time series projects, row_id does not represent the row position from the original project data (for training predictions) or uploaded prediction data for a given timestamp and/or series_id. Instead it is a derived value specific to the project.

With some spreadsheet software you could go on to graph your prediction output. For example, the sample data shows predicted sales for the next day through the next 7 days, which can then be acted on for inventory and staffing decisions.

Prediction preview

After you have computed predictions, click the Preview link to display a plot of the predictions over time, in the context of the historical data. This plot shows the prediction for each forecast distance at once, relative to a single forecast point.

By default, the prediction interval (shaded in blue) represents the area in which 80% of predictions fall. The intervals estimate the range of values DataRobot expects actual values of the target to fall within. They are similar to a prediction's confidence interval, but are instead based on the residual errors measured during the model's backtesting.

For charts meeting the following criteria, the chart displays an estimated prediction interval:

  • All backtests must be trained. In this way, DataRobot can use all available validation rows and prevent different interval values based on the available information.

  • There must be at least 10 data points per forecast distance value.

If the above criteria are not met, DataRobot displays only the prediction values (orange points).

You can specify a prediction interval size, which specifies the desired probability of actual values falling within the interval range. Larger values are less precise, but more conservative. For example, the default value of 80% results in a lower bound of 10% and an upper bound of 90%. To change the prediction interval, click the Options link and DataRobot recalculates the display:

Note

You can also set the prediction interval when making predictions.

Prediction intervals are estimated based on the quantiles of the out-of-sample residuals and as a result may not be symmetrical. DataRobot calculates, independently, per series (if applicable) and per forecast distance, so intervals may increase with distance, and/or have a range specific to each series. If you predict on a new series, or a series in which there was no overlap with validation, DataRobot uses the average across all series.
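
As a rough illustration of the general idea (not DataRobot's exact procedure), the following sketch derives an 80% interval from the 10th/90th percentiles of out-of-sample residuals, grouped by forecast distance. The residuals here are synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
backtest = pd.DataFrame({
    "forecast_distance": np.repeat(np.arange(1, 8), 50),
    "residual": rng.normal(0, 10, 7 * 50),   # actual minus predicted, per backtest row
})

interval = 80
lower_q = (100 - interval) / 200              # 0.10 for an 80% interval
upper_q = 1 - lower_q                         # 0.90

bounds = backtest.groupby("forecast_distance")["residual"].quantile([lower_q, upper_q]).unstack()

# Apply the per-distance bounds to a new prediction.
prediction, fd = 500.0, 3
print(prediction + bounds.loc[fd, lower_q], prediction + bounds.loc[fd, upper_q])
```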

Hover over a point in the preview graph, left of the forecast point, to display the value from the historical data:

Or to the right of the forecast point to view the forecast (prediction):

When used with multiseries modeling, you have an option to select which series to preview. This overview indicates how the target, feature, or accuracy changes over time for an individual series and provides a forecast for that series. From the dropdown, select a series. Or, page through the series options using the left and right arrows. By comparing the prediction intervals for each series, you can better identify the series that provide the most accurate predictions.

Note that you can also download predictions from within the preview plot.

More info...

The following sections provide some additional background discussion relevant to time-aware modeling:

Time series framework

The simple time series modeling framework can be illustrated as follows:

  • The Forecast Point defines an arbitrary point in time for making a prediction.
  • The Feature Derivation Window (FDW), to the left of the Forecast Point, defines a rolling window of data that DataRobot uses to derive new features for the modeling dataset.
  • Finally, the Forecast Window (FW), to the right of the Forecast Point, defines the range of future values you want to predict (known as the Forecast Distances (FDs)). The Forecast Window tells DataRobot, "Make a prediction for each day inside this window."

Note that the values specified for the Forecast Window are inclusive. For example, if set to +2 days through +7 days, the window will include days 2, 3, 4, 5, 6, and 7. By contrast, the Feature Derivation Window does not include the left boundary but does include the right boundary. (In the image above, it will use from 7 days before the Forecast Point to 27 days before, but not day 28). This is important to consider when setting the window because it means that DataRobot sets lags exclusive of the left (older) side, but inclusive of the right (newer) side. Be aware that when using a differenced feature list at prediction time, you need to account for the difference. For example, if a model uses 7-day differencing, and the feature derivation window spanned [-28 to 0] days, the effective derivation window would be [-35 to 0] days.

The time series framework captures the business logic of how your model will be used by encoding the amount of recent history required to make new predictions. Setting the recent history configures a rolling window used for creating features, the forecast point, and ultimately, predictions. In other words, it sets a minimum constraint on the feature creation process and a minimum history requirement for making predictions.

In the framework illustrated above, for example, DataRobot uses data from the previous 28 days and as recent as up to 7 days ago. The forecast distances the model will report are for days 2 through 7—your predictions will include one row for each of those days. The Forecast Window provides an objective way to measure the total accuracy of the model for training, where total error can be measured by averaging across all potential Forecast Points in the data and the accuracy for each forecast distance in the window.
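
If you configure projects programmatically, this same framework can be expressed through the DataRobot Python client. The sketch below is hedged: parameter names follow the client's DatetimePartitioningSpecification, but the file name, column names, and window values are assumptions from the example above, and newer client versions may prefer project.analyze_and_model() over set_target().

```python
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column="Date",
    use_time_series=True,
    feature_derivation_window_start=-28,   # use history starting 28 days back...
    feature_derivation_window_end=-7,      # ...through 7 days before the forecast point
    forecast_window_start=2,               # predict forecast distances +2...
    forecast_window_end=7,                 # ...through +7
)

project = dr.Project.create("sales.csv", project_name="Time series example")
project.set_target(target="Sales", partitioning_method=spec)
```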

Now, add the gaps that are inherent to time series problems.

This illustration includes the blind history gap (BHG) and the can't operationalize gap (COG).

BHG captures the gap created by the delay of access to recent data (e.g., “most recent” may always be one week old). The BHG is the smaller of the values supplied in the Feature Derivation Window. A gap of zero means "use data up to and including today," a gap of one means "use data starting from yesterday," and so on.

The "can't operationalize gap" occurs immediately after the Forecast Point. This is the period of time that is too near-term to be useful. For example, predicting staffing needs for tomorrow may be too late to allow for taking action on that prediction.

Set window values

Use the Feature Derivation Window (FDW) and Forecast Window (FW) to configure how DataRobot derives features for the modeling dataset.

On the left, the FDW (1) constrains the time history. That is, it defines how many values to look at (no further back than x, no more recent than y), which determines how much data you need to provide to make a prediction. In the example above, DataRobot will use the most recent 28 days of data.

On the right, the FW (2) sets the forecast range the model outputs. The example configures DataRobot to make predictions on days 1 through 7 after the forecast point. Note that the time unit displayed (days, in this case) is based on the unit detected when you selected a date/time feature.

You can specify either the time unit detected or a number of rows for the windows (they are synchronized to be the same). DataRobot calculates rolling statistics using that selection (e.g., Price (7 days average) or Price (7 rows average)). Note that when you configure for row-based windows, DataRobot does not detect common event patterns or seasonalities. DataRobot provides special handling for datasets with irregularly spaced date/time features, however. If your dataset is irregular, the window settings default to row-based.

You can change these values (and notice that the visualization updates to reflect your change). For example, you may not have real-time access to the data or don't want the model to be dependent on data that is too new. In that case, change the FDW. If you don't care about tomorrow's prediction because it is too soon to take action on, change the FW to the point from which you want predictions forward. This changes how DataRobot optimizes models and ranks them on the Leaderboard, as it only compares for accuracy against the configured range.

Create non-forecasting time series models

There are times when you may want to create time series models that predict current values, not future values. For example, in an anomaly detection project you may want to answer the question, "is the observation I see right now an anomaly?" Or, in some situations you might want to use time series values to understand the current value of the target given the current parameters (features) and their recent values. For this type of project, use DataRobot's nowcasting capabilities.

Duration and Row Count

If your data is evenly spaced, Duration and Row Count give the same results. It is not uncommon, however, for date/time datasets to have unevenly spaced data with noticeable gaps along the time axis. This can impact how Duration and Row Count are handled by DataRobot. If the data has gaps:

  • Row Count results in an equal number of rows per backtest (although some backtests may cover longer time periods). Row Count models can, in certain situations, use more RAM than Duration models over the same number of rows.
  • Duration results in a consistent length-of-time per backtest (but some may have more or fewer rows).

Additionally, these values have different meanings depending on whether they are being applied to training or validation.

For irregular datasets, note that the setting for Training Window Format defaults to Row Count. Although you can change the setting to Duration, it is highly recommended that you leave it, as changing may result in unexpected training windows or model errors.

Handle training folds

The values for Duration and Row Count in training data are set in the training window format section of the Time Series Modeling configuration.

When you select Duration, DataRobot selects a default fold size—a particular period of time—to train models, based on the duration of your training data. For example, you can tell DataRobot "always use three months of data." With Row Count, models use a specific number of rows (e.g., always use 1000 rows) for training models. The training data will have exactly that many rows.

For example, consider a dataset that includes fraudulent and non-fraudulent transactions where the frequency of transactions is increasing over time (the number is increasing per time period). Set Row Count if you want to keep the number of training examples constant through the backtests in the training data. It may be that the first backtest is only trained on a short time period. Select Duration to keep the time period constant between backtests, regardless of the number of rows. In either case, models will not be trained into data more recent than the start of the holdout data.
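
The contrast is easy to see outside DataRobot. This illustrative pandas sketch selects a training window both ways on an unevenly spaced dataset; the column names and window sizes are hypothetical and this is not DataRobot's internal partitioning logic.

```python
import pandas as pd

df = pd.read_csv("transactions.csv", parse_dates=["Date"]).sort_values("Date")

row_count_window = df.tail(1000)                      # Row Count: exactly 1000 rows,
                                                      # however much time they span
cutoff = df["Date"].max() - pd.DateOffset(months=3)
duration_window = df[df["Date"] > cutoff]             # Duration: exactly 3 months,
                                                      # however many rows that is
```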

Handle the validation fold

Validation is always set in terms of duration (even if training is specified in terms of rows). When you select Row Count, DataRobot sets the Validation Length based on the row count.

Create the modeling dataset

The time series modeling framework extracts relevant features from time-sensitive data, modifies them based on user-configurable forecasting needs, and creates an entirely new dataset derived from the original. DataRobot then uses standard, as well as time series-specific, machine learning algorithms for model building. (You can also identify specific features that you would like to exclude from derivation.)

Users cannot influence the type of new features DataRobot creates, but the application adds a variety of new columns, including (but not limited to): average sales, max value over past x days, median value over x days, rolling most frequent label, rolling entropy, average length of text over x days, and many more.

Additionally, with time series date/time partitioning, DataRobot scans the configured rolling window and calculates summary statistics (not typical with traditional partitioning approaches). At prediction time, DataRobot automatically handles recreating the new features and verifies that the framework is respected within the new data.

Time series modeling features are the features derived from the original data you uploaded but with rolling windows applied—lag statistics, window averages, etc. Feature names are based on the original feature name, with parenthetical detail to indicate how it was derived or transformed. Clicking any derived feature displays the same type of information as an original feature. You can look at the Importance score, calculated using the same algorithms as with traditional modeling, to see how useful (generally, very) the new features are for predicting. See the time series feature engineering reference for a list of operators used and feature names created by the feature derivation process.

Review data and new features

Once you click Start, DataRobot derives new time series features based on your time series configuration, creating the time series modeling data. By default DataRobot displays the Derived Modeling Data panel, a feature summary that displays the settings used for deriving time series features, dataset expansion statistics, and a link to view the derivation log. (To see your original data, click Original Time Series Data.)

When sampling is required, that information is also included. Click View more info to see the derivation log, which lists the decisions made during feature creation and is downloadable.

Within the log, you can see that every candidate derived feature is assigned a priority level (Generating feature "Sales (35 day mean)" from "Sales" (priority: 11) for example). When deciding which of the candidates to keep after time series feature derivation completes, DataRobot picks a priority threshold and excludes features outside that threshold. When a candidate feature is removed, the feature derivation log displays the reason:

Removing feature "y (1st lag)" because it is a duplicate of the simple naive of target

or

Removing feature "y (42 row median)" because the priority (7) is lower than the allowed threshold (7)

Feature Lineage tab

To enhance understanding of the results displayed in the log, use the Feature Lineage tab for a visual "description" that illustrates each action taken (the lineage) to generate a derived feature. It can be difficult to understand how a feature that was not present in the original, uploaded dataset was created. Feature Lineage makes it easy to identify not only which features were derived but the steps that went into the end result.

From the Data page, click Feature Lineage to see each action taken to generate the derived feature, represented as a connectivity graph showing the relationship between variables (directed acyclic graph).

For more complex derivations, for example those with differencing, the graph illustrates how the difference was calculated:

Elements of the visualization represent the lineage. Click a cell in the graph to see the previous cells that are related to the selected cell's generation—parent actions are to the left of the element you click. Click once on a feature to show its parent feature, click again to return to the full display.

The graph uses the following elements:

Element Description
ORIGINAL Feature from the original dataset.
TIME SERIES Actions (preprocessing steps) in the feature derivation process. Each action is represented in the final feature name.
RESULT Final generated feature.
Info (icon) Dynamically-generated information about the element (on hover).
Clock (icon) Indicator that the feature is time-aware (i.e., derived using a time index such as min value over 6 to 0 months or 2nd lag).

Downsampling in time series projects

Because the derived modeling dataset contains so many additional features, its size can grow exponentially. Downsampling is a technique DataRobot applies to ensure that the derived modeling dataset is manageable and optimized for speed, memory use, and model accuracy. (This sampling method is not the same as the smart downsampling option, which downsamples the majority class (for classification) or zero values (regression).)

Growth in a time series dataset is based on the number of columns and the length of the forecast window (i.e., the number of forecast distances within the window). The derived features are then sampled across the backtests and holdout and the sampled data provides the basis of related insights (Leaderboard scores, Forecasting Accuracy, Forecasting Stability, Feature Fit, Feature Effects, Feature Over Time). DataRobot reports that information in the additional info modal accessible from the Derived Modeling Data panel:

With multiseries modeling, the number of series, as well as the length of each series, also contribute to the number of new features in the derived dataset. Multiseries projects have a slightly different approach to sampling; the Series Insights tab does not use the sampled values because the result may be too few values for accurate representation.

Handle missing values

DataRobot handles missing value imputation differently with time series projects. The following describes the process.

Consider the following from a time series dataset, which is missing a row:

Date,y
2001-01-01,1
2001-01-02,2
2001-01-04,4
2001-01-05,5
2001-01-06,6

In this example, the value 2001-01-03 is "missing."

For ARIMA models, DataRobot will attempt to make the time series more regular and use forward filling. This is applicable when the Feature Derivation Window and Forecast Window use a time unit. When these windows are row-based, DataRobot skips the history regularization process (no forward filling) and keeps the original data.

For non-ARIMA models, DataRobot uses the data as is. If the data is too irregular, DataRobot will not allow modeling to start.
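
As a hedged illustration of the forward-filling idea (not DataRobot's exact preprocessing), the following pandas sketch regularizes the example series above:

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": pd.to_datetime(["2001-01-01", "2001-01-02", "2001-01-04",
                             "2001-01-05", "2001-01-06"]),
     "y": [1, 2, 4, 5, 6]}
).set_index("Date")

regular = df.asfreq("D").ffill()   # inserts 2001-01-03 and fills it with 2, the prior value
print(regular)
```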

Consider the following—the dataset is missing a target or date/time value:

Date,y
2001-01-01,1
2001-01-02,2
,3
2001-01-04,
2001-01-05,5

In this example the third row is missing Date, the fourth is missing y. DataRobot will drop those rows, since they have no target or date/time value.

Consider the case of missing feature values, in this example 2001-01-02,,2:

Date,feat1,y
2001-01-01,1,1
2001-01-02,,2
2001-01-03,3,3
2001-01-04,4,4
  • At the feature level, the derived features (rolling statistics) will ignore the missing value.

  • At the blueprint level, handling depends on the blueprint. Some blueprints can handle a missing feature value without any issue. For others (for example, some ENET-related blueprints), DataRobot may use median value imputation for the missing feature value.

There is one additional special circumstance—the naive prediction feature, which is used for differencing. In this case, DataRobot uses a seasonal forward fill (which falls back on median if not available).

Time series interval units

Although many of the examples in this documentation show a time unit of "days," DataRobot supports several intervals for time series and multiseries modeling. Currently, DataRobot supports time steps that are integer multiples of the following units:

  • row
  • millisecond
  • second
  • minute
  • hour
  • day
  • week
  • month
  • quarter
  • year

For example, the time step between rows can be every 15 minutes (a multiple of minutes) but cannot be a fraction such as 13.23 minutes. DataRobot automatically detects the time unit and time step, and if it cannot, rejects the dataset as irregular. Datasets using milliseconds as a time unit must specify training and partitioning boundaries at the second level, and must span multiple seconds, for partitioning to operate correctly. Additionally, they must use the default forecast point to use a fractional-second forecast point.
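
As a rough illustration of the idea, the following hedged pandas sketch infers a candidate time step and checks that it is an integer multiple of a unit; DataRobot's own detection is more involved.

```python
import pandas as pd

dates = pd.to_datetime(pd.read_csv("data.csv")["Date"]).sort_values()
deltas = dates.diff().dropna()

step = deltas.mode()[0]                        # most common spacing, e.g., 15 minutes
step_in_minutes = step / pd.Timedelta(minutes=1)
print(step, step_in_minutes.is_integer())      # a 13.23-minute step would fail this check
```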

Time series feature lists

DataRobot automatically constructs time series features based on the characteristics of the data (e.g., stationarity and periodicities). Multiple periodicities can result in several possibilities when it comes to constructing the features—both “Sales (7 day diff) (1st lag)” or “Sales (24 hour diff) (1st lag)” can make sense, for example. In some cases, it is better to perform no transforming of the target by differencing. The choice that yields the optimal accuracy often depends on the data.

After constructing time series features for the data, DataRobot creates multiple feature lists (the target is automatically included in each). Then, at project start, DataRobot automatically runs blueprints using multiple feature lists, selecting the list that best suits the model type. With non-time series projects, by contrast, blueprints run on a single feature list (typically Informative Features).

Time series feature lists can be viewed from the Data > Derived Modeling Data page, for example:

These lists are different, and more targeted, than those created by non-time series projects.

Exclude features from feature lists

There are times when you cannot exclude features from derivation because other features rely on those features. Instead, you can exclude them from a feature list. In that way, they are still used in initial feature derivation but are excluded from modeling.

Note the following behavior that results from excluding certain special features from feature lists:

  • Target column: DataRobot will not derive target-derived features.

  • Primary date/time column: DataRobot will not derive calendar and duration features.

  • Series ID column: DataRobot will not generate any models that depend on the series ID, including per-series, series-level effects, or hierarchical models.

MASE and baseline models

The baseline model is a model that uses the most recent value that matches the longest periodicity. That is, while a project could have multiple different naive predictions with different periodicity, DataRobot uses the longest naive predictions to compute the MASE score. MASE is a measure of the accuracy of forecasts, and is a comparison of one model to a naive baseline model—the simple ratio of the MAE of a model over the baseline model.
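
As a sketch of the ratio described above (illustrative numbers only): DataRobot uses the naive predictions matching the longest detected periodicity, while this example uses the simplest most-recent-value baseline.

```python
import numpy as np

actual     = np.array([100., 102.,  98., 110., 107., 111., 105.])
model_pred = np.array([101., 100.,  99., 108., 109., 110., 106.])
naive_pred = np.array([ 99., 100., 102.,  98., 110., 107., 111.])  # previous actual value

mase = np.mean(np.abs(actual - model_pred)) / np.mean(np.abs(actual - naive_pred))
print(round(mase, 3))   # values below 1 beat the naive baseline
```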

DataRobot identifies which model is being used as the baseline model for time series projects with a BASELINE badge.

To build a baseline model from the Repository, search for Baseline Predictions Using Most Recent Value and train it on the Baseline Only feature list that has the longest seasonality.

Automatically created feature lists

The following table describes the feature lists automatically created for time series modeling available from the Feature List dropdown:

Feature list Description
All Time Series Features Not actually a feature list, this is the dropdown setting that displays all derived features.
Date Only All features of type Date; used for trend models that only depend on the date.
Time Series Extracted Features A feature list version of All Time Series Features; that is, all derived features.
Time Series Informative Features* All time series features that are considered informative (includes features based on all differencing periods).
Univariate Selections Features that meet a certain threshold for non-linear correlation with the selected target; same as non-time series projects.
Baseline Only (<period>) Naive predictions column matching the period; used for Baseline Predictions blueprints.
No differencing
  • All available naive predictions features
  • Time series features derived using the raw target (not differenced)
  • All other non-target derived features
Target Derived Only Without Differencing (<period>)
  • All available naive predictions features
  • Time series features derived using the raw target (not differenced)
Note that this list is not run by default.
Target Derived Only With Differencing
  • Naive predictions column matching the period
  • Time series features derived using differenced target matching the period
Note that this list is not run by default.
With Differencing (<period>)
  • Naive predictions column matching the period
  • Time series features derived using differenced target matching the period
  • All other non-target derived features
With Differencing (average baseline)
  • Naive predictions using average baseline
  • Target-derived features that capture deviation from the average baseline
  • All other non-target derived features
With Differencing (EWMA baseline)
  • Naive predictions using average baseline with smoothing applied to the baseline
  • Target-derived features that capture deviation from the smoothed average baseline
  • All other non-target EWMA derived features
With Differencing (nonzero average baseline)
  • Naive predictions using nonzero average baseline (zero values are removed when computing the average)
  • Target-derived features that capture deviation from the average baseline
  • Target-derived features that capture lags and statistics of the target flag (whether or not it is zero)
  • All other non-target derived features
With Differencing (intra-month seasonality detection) Multiple feature list options to leverage detected seasonalities (see below).

* The Time Series Informative Features list is not optimal. Preferably, select one of the “with differencing” or the “no differencing” feature lists.

Feature lists for unsupervised time series projects

The following table describes the feature lists automatically created for time series projects that use unsupervised mode. See the referenced section for details on how DataRobot manages these lists for point anomalies and anomaly windows detection:

Feature list Description
Time Series Extracted Features A feature list version of All Time Series Features; that is, all derived features.
Time Series Informative Features All time series features that are considered informative for time series anomaly detection. For example, DataRobot excludes features it determines are low information or redundant, such as duplicate columns or a column containing empty values.
Actual Values and Rolling Statistics Actual values of the dataset together with the derived statistical information (e.g., mean, median, etc.) of the corresponding feature derivation windows. These features are selected from time series anomaly detection and are applicable to both point anomalies and anomaly windows.
Robust z-score Only Selected rolling statistics from time series derived features but containing only the derived robust z-score values. These features are useful for evaluating point anomalies.
SHAP-based Reduced Features A subset of features based on the Isolation Forest SHAP value scores.
Actual Values Only Selected actual values from the dataset. These features are useful for evaluating point anomalies.
Rolling Statistics Only Selected rolling statistics from time series derived features. These features are useful for evaluating anomaly windows.

Feature lists for Repository blueprints

When building models from the Repository, you can select a specific feature list to run—either the default lists or any lists you created. However, because some blueprints require specific features be present in the feature list, using a feature list without those features can cause model build failure. This may happen, for example, if you created a feature list independent of the model type. To prevent this type of failure, DataRobot checks feature list and blueprint compatibility before starting the model build and returns an error message if appropriate.

Additionally, because DataRobot can identify a preferable feature list type for some blueprints, it suggests that list by default.

Zero-inflated models

When the project target is positive and has at least one zero value, DataRobot always creates a non-zero average baseline feature list and uses it to build optimized zero-inflated models to reflect the data. These models may provide higher accuracy because the specialized algorithms model the zero and count distributions separately.

The non-zero average baseline feature list, with differencing, appends (nonzero) or (is zero) to the target name. Specifically:

  • For (nonzero): features are derived by treating any zero target value as an instance of a missing value.
  • For (is zero): features are derived by substituting target values with a boolean flag (whether the target is zero or not).

The transformed target values ("<target> (nonzero)" and "<target> (is zero)") are not used in modeling. To avoid target leakage during modeling, DataRobot only uses derived transformed target values (lags and statistics). In addition, the "With Differencing (nonzero average baseline)" feature list is only used for zero-inflated model blueprints, which are prefixed with "Zero-Inflated" (for example, Zero-Inflated eXtreme Gradient Boosted Trees Regressor). Note that not all model types have a zero-inflated counterpart.
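
As a hedged illustration, the following pandas sketch derives the two transformed columns described above. The column name "y" and the values are synthetic; DataRobot derives and names these columns itself and models only their lagged and rolling derivatives.

```python
import pandas as pd

df = pd.DataFrame({"y": [0, 3, 0, 5, 2, 0, 7]})

df["y (nonzero)"] = df["y"].where(df["y"] != 0)   # zeros treated as missing values
df["y (is zero)"] = df["y"] == 0                  # boolean flag for the zero/count split
print(df)
```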

Zero-inflated modeling considerations

When working with the zero-inflated model and/or feature list, keep the following in mind:

  • You can use the zero-inflated feature list to train non-zero-inflated models and expect decent (if not optimal) performance.

  • If you use a different feature list to retrain a zero-inflated model, model performance may be poor since the model expects the target derived features in log scale.

Intra-month seasonality detection

Intra-month seasonality is the periodic variation that repeats in the same day/week number or weekday/week number each month. Detecting patterns in seasonality is important for building accurate models—how do you define the date needed from the previous month? Are you counting up from the beginning of the month or down from the end?

Some examples:

| Repeat pattern | Time unit | Example |
| --- | --- | --- |
| Same day of month | Day | A payment is due on a specific day of the month—"payment due on the 15th." |
| Same week of month and day of week | Day | Payday is on a certain position within the month—"payday is the second Friday." |
| Week of month | Week | High sales for a retail dataset in the last week of each month—"sales quota for the month is calculated on the last day." |

To provide better handling of seasonality, DataRobot detects intra-month patterns, generates appropriate feature lists, and then derives the resulting features. These additions depend on whether, when executing the feature engineering that creates the modeling dataset, DataRobot detects intra-month seasonality and a Feature Derivation Window greater than a certain threshold. The feature lists run by Autopilot are based on the characteristics of the data, as described in the table below.

Note

"FDW covers at least X days" is equal to fdw_end - fdw_start >= X.

Feature list Condition Description Example
With Differencing (monthly)
Detected intra-month seasonality and feature derivation window covers at least 31 days
  • Naive predictions column matching the period (align to the beginning of the month)
  • Time series features derived using the differenced target matching the period (align to the beginning of the month)
  • All other non-target derived features
Use the first Nth day of the previous month target value as the prediction of the first Nth day of current month—March 5th will use the target value of Feb 5th. Or, in the case of March 30th, the list will use the value of Feb 28 (the last day of February).
With Differencing (monthly, same day from end)
Detected intra-month seasonality and minimum feature derivation window covers at least 31 days
  • Naive predictions column matching the period (align to the end of the month)
  • Time series features derived using the differenced target matching the period (align to the end of the month)
Use the last Nth day of the previous month target value as the prediction of the last Nth day of the current month—March 31st will use the target value of Feb 28th (or February 29 in leap years).
With Differencing (monthly, same day of week, same week from start)
Detected intra-month seasonality, FDW start ≥ 35, FDW end ≤ 21, FDW window covers at least 29 days
  • Naive predictions column matching the period (align to the week of the month and weekday)
  • Time series features derived using the differenced target matching the period (align to the week number and weekday)
  • All other non-target derived features
Use the target of the first X-day of last month as the prediction of the first X-day of the current month—March 5th (Monday) will use the target value of the February Monday that falls between February 1-7.
With Differencing (monthly, same day of week, same week from end)
Detected intra-month seasonality, FDW start ≥ 35, FDW end ≤ 21, FDW window covers at least 29 days
  • Naive predictions column matching the period (align to the weekday and the "week of the month from the end of the month")
  • Time series features derived using the differenced target matching the period (align to the week number and weekday)
  • All other non-target derived features
Use the target of the last X-day of last month as the prediction of the last X-day of current month—March 31st (Tuesday) will use the target value of the February Tuesday that falls between February 22-28.
With Differencing (monthly, average of previous month)
Detected intra-month seasonality, FDW start ≥ 62, FDW end ≤ 21, FDW window covers at least 29 days
  • Naive predictions using average of the previous month
  • Target-derived features that capture deviation from the previous month’s average baseline
  • All other non-target derived features
Use the average target value of the previous month as the naive prediction of days in the next month—June 7 will use the average of May, and days in March will use the average target value of February. (Requires a longer FDW.)
With Differencing (monthly, average of same week of previous month)
Detected intra-month seasonality, FDW start ≥ 37, FDW end ≤ 21, FDW window covers at least 29 days
  • Naive predictions using weekly average of the previous month
  • Target-derived features that capture deviation from the previous month’s weekly average baseline
  • All other non-target derived features
Use the first week average of last month as the predictions of the first week of the current month--see below for detail.
With Differencing (monthly, average of nonzero values of previous month)
Detected intra-month seasonality, FDW start ≥ 62, FDW end ≤ 21, FDW window covers at least 29 days with minimum target value of 0
  • Naive predictions using nonzero average of the previous month (zero values are removed in computing the average)
  • Target-derived features that capture deviation from the previous month nonzero average baseline
  • Target-derived features that capture lags and statistics of the target flag (whether or not it is zero)
  • All other non-target derived features
Use the average nonzero target value of February as the naive prediction of days in March.
With Differencing (previous week of the month nonzero values average baseline)
Detected intra-month seasonality, minimum FDW start ≥ 37, maximum FDW end ≤ 21, FDW window covers at least 29 days with minimum target value of 0
  • Naive predictions using weekly nonzero average of previous month (zero values are removed in computing the average)
  • Target-derived features that capture deviation from the previous month weekly nonzero average baseline
  • Target-derived features that capture lags and statistics of the target flag (whether or not it is zero)
  • All other non-target derived features
Use weekly nonzero average of previous month--see below for detail.
Monthly, average of same week from start of previous month

The following details calculations for naive prediction for March:

  1. Compute the weekly average from the start of the month:

    • March 1-7 uses the average value of February 1-7...March 22-31 (last day of month) uses the average value of February 22-28. This feature is called y (last month weekly average).
  2. Compute the weekly average from the end of the month:

    • March 25-31 uses the average value of February 22-28...March 1-10 uses the average value of February 1-7. This feature is called y (match end of the month) (last month weekly average).
  3. Compute the average of the above two features to calculate the naive predictions of the current month:

    • 0.5 * y (last month weekly average) + 0.5 * y (match end of the month) (last month weekly average)
Monthly, average nonzero values in same week from start

The following details calculations for naive prediction for March for nonzero values:

  1. Compute the weekly nonzero average from the start of the month:

    • March 1-7 uses the nonzero average value of February 1-7...March 22-31 (last day of month) uses the nonzero average value of February 22-28. This feature is called y (nonzero)(last month weekly average).
  2. Compute the weekly nonzero average from the end of the month:

    • March 25-31 uses the nonzero average value of February 22-28...March 1-10 uses the nonzero average value of February 1-7. This feature is called y (nonzero)(match end of the month) (last month weekly average).
  3. Compute the average of the above two features to compute the naive predictions of the current month:

    • 0.5 * y (nonzero)(last month weekly average) + 0.5 * y (nonzero)(match end of the month) (last month weekly average)

Common patterns of time series data

Time series models are built based on consideration of common patterns in time series data:

  1. Linearity: A specific type of trend. Searching on the term "machine learning," you see an increase over time. The following shows the linear trend (you can also view it as a non-linear trend) created by the search term, showing that interest may fluctuate but is growing over time:

  2. Seasonality: Searching on the term "Thanksgiving" shows periodicity. In other words, spikes and dips are closely related to calendar events (for example, each year starting to grow in July, falling in late November):

  3. Cycles: Cycles are similar to seasonality, except that they do not necessarily have a fixed period and generally require a minimum of four years of data to qualify as such. Usually related to global macroeconomic events or changes in the political landscape, cycles can be seen as a series of expansions and recessions:

  4. Combinations: Data can combine patterns as well. Consider searching the term "gym." Search interest spikes every January with lows over the holidays. Interest, however, increases over time. In this example you can see both seasonality and a linear trend:

