Time series framework
The simple time series modeling framework can be illustrated as follows:
- The Forecast Point defines an arbitrary point in time for making a prediction.
- The Feature Derivation Window (FDW), to the left of the Forecast Point, defines a rolling window of data that DataRobot uses to derive new features for the modeling dataset.
- Finally, the Forecast Window (FW), to the right of the Forecast Point, defines the range of future values you want to predict. Each step in that range is a Forecast Distance (FD). The Forecast Window tells DataRobot, "Make a prediction for each day inside this window."
Note that the values specified for the Forecast Window are inclusive. For example, if set to +2 days through +7 days, the window includes days 2, 3, 4, 5, 6, and 7. By contrast, the Feature Derivation Window does not include the left boundary but does include the right boundary. (In the image above, DataRobot uses data from 27 days before the Forecast Point up to 7 days before it, but not day 28.) This is important to consider when setting the window because it means that DataRobot sets lags exclusive of the left (older) side, but inclusive of the right (newer) side.
Be aware that when using a differenced feature list at prediction time, you need to account for the differencing period, because it extends how far back the data must reach. For example, if a model uses 7-day differencing and the Feature Derivation Window spans [-28 to 0] days, the effective derivation window is [-35 to 0] days.
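The boundary rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not DataRobot code: the function names are hypothetical, offsets are whole days relative to the Forecast Point, and the differencing adjustment simply assumes the left boundary moves back by the differencing period, as in the example above.

```python
def forecast_window_days(start, end):
    """Forecast distances in the window; both boundaries are inclusive."""
    return list(range(start, end + 1))


def feature_derivation_days(start, end):
    """Day offsets used for feature derivation.

    The left (older) boundary is excluded and the right (newer)
    boundary is included.
    """
    return list(range(start + 1, end + 1))


def effective_derivation_window(start, end, differencing_days):
    """Assumed adjustment: differencing pushes the left boundary further back."""
    return (start - differencing_days, end)


print(forecast_window_days(2, 7))
print(feature_derivation_days(-28, -7))
print(effective_derivation_window(-28, 0, 7))
```

With a Forecast Window of +2 to +7 the first call lists days 2 through 7, and with a Feature Derivation Window of -28 to -7 the second call starts at day -27 rather than day -28.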
The time series framework captures the business logic of how your model will be used by encoding the amount of recent history required to make new predictions. Setting the recent history configures a rolling window used for creating features, the forecast point, and ultimately, predictions. In other words, it sets a minimum constraint on the feature creation process and a minimum history requirement for making predictions.
In the framework illustrated above, for example, DataRobot uses data from as far back as 28 days before the Forecast Point and as recent as 7 days before it. The model reports forecast distances for days 2 through 7, so your predictions include one row for each of those days. The Forecast Window also provides an objective way to measure the total accuracy of the model for training, where total error can be measured by averaging across all potential Forecast Points in the data and across the accuracy for each forecast distance in the window.
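To make the "one row per forecast distance" idea and the averaging of error concrete, here is a minimal sketch. The row layout, the function names, and the use of absolute error are assumptions made for illustration; the text above does not specify which error metric DataRobot averages.

```python
from collections import defaultdict
from datetime import date, timedelta


def prediction_rows(forecast_point, fw_start, fw_end):
    """Build one prediction row per forecast distance in the inclusive window."""
    return [
        {
            "forecast_point": forecast_point,
            "forecast_distance": distance,
            "timestamp": forecast_point + timedelta(days=distance),
        }
        for distance in range(fw_start, fw_end + 1)
    ]


def summarize_errors(records):
    """Average absolute errors per forecast distance and overall.

    `records` is an iterable of (forecast_point, forecast_distance, abs_error)
    tuples collected over many Forecast Points.
    """
    by_distance = defaultdict(list)
    for _forecast_point, distance, abs_error in records:
        by_distance[distance].append(abs_error)
    per_distance = {d: sum(v) / len(v) for d, v in sorted(by_distance.items())}
    all_errors = [e for errors in by_distance.values() for e in errors]
    overall = sum(all_errors) / len(all_errors)
    return per_distance, overall


rows = prediction_rows(date(2024, 1, 1), 2, 7)
print(len(rows), rows[0]["timestamp"])
```

Grouping by forecast distance before averaging shows whether accuracy degrades further into the window, while the overall figure summarizes the whole Forecast Window.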
Now, add the gaps that are inherent to time series problems.
This illustration includes the blind history gap (BHG) and the can't operationalize gap (COG).
The BHG captures the gap created by the delay of access to recent data (e.g., “most recent” may always be one week old). The BHG is the smaller (more recent) of the two values supplied for the Feature Derivation Window. A gap of zero means "use data up to and including today," a gap of one means "use data starting from yesterday," and so on.
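As a small illustration of how the blind history gap follows from the Feature Derivation Window, the sketch below takes the two window offsets (in days, relative to the Forecast Point) and returns the gap and the most recent date the model can use. The function names are hypothetical and are not part of any DataRobot API.

```python
from datetime import date, timedelta


def blind_history_gap(fdw_start, fdw_end):
    """Return the smaller of the two window values, as a number of days."""
    return min(abs(fdw_start), abs(fdw_end))


def most_recent_usable_date(forecast_point, gap_days):
    """A gap of 0 means today's data is usable; a gap of 1 means yesterday's."""
    return forecast_point - timedelta(days=gap_days)


gap = blind_history_gap(-28, -7)
print(gap, most_recent_usable_date(date(2024, 1, 10), gap))
```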
The "can't operationalize gap" occurs immediately after the Forecast Point. This is the period of time that is too near-term to be useful. For example, predicting staffing needs for tomorrow may be too late to allow for taking action on that prediction.
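One way to picture the can't operationalize gap is as the forecast distances that fall between the Forecast Point and the start of the Forecast Window. The sketch below assumes that reading, uses whole-day distances starting at 1, and uses hypothetical function names.

```python
from datetime import date, timedelta


def skipped_distances(fw_start):
    """Forecast distances too near-term to act on (assumed: 1 up to fw_start - 1)."""
    return list(range(1, fw_start))


def first_actionable_date(forecast_point, fw_start):
    """First date for which a prediction is reported."""
    return forecast_point + timedelta(days=fw_start)


print(skipped_distances(2))
print(first_actionable_date(date(2024, 1, 1), 2))
```

In the staffing example, a Forecast Window that starts at +2 days skips tomorrow's prediction and reports the day after as the first usable forecast.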