8 Low-hanging fruit: simple forecasts
Simple forecasting methods should be considered first. These methods are intuitive, and they can correspond to simple rules of thumb. For example, if you stock up to a level based on the average of the last 12 months of demand, you forecast demand using a rolling mean. We will discuss several such methods in this chapter.
There are three reasons for starting with simple methods:
- They may give you a quick win without creating a complicated statistical forecast.
- They give you a baseline to compare to more complicated forecasting methods.
- They are often very effective. Morlidge (2014b) found that across multiple businesses, half of the forecasts did not improve on the naive no-change forecast.
8.1 The historical mean
One baseline benchmark every forecaster should look at is the simple historical mean: take all historical observations and average them. The result is your forecast for the entire future horizon. Simple, easy to understand and implement – and often surprisingly effective (Andy T, 2016).
There is a statistical reason for its good performance. This simple forecast is optimal for a white noise statistical process, i.e., demand that randomly deviates around a long-term level.
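As a minimal sketch of the historical mean, the following Python snippet averages all past observations and repeats that value over the horizon. The function name `historical_mean_forecast` and the demand figures are illustrative assumptions, not taken from any particular library or data set.

```python
from statistics import fmean


def historical_mean_forecast(history, horizon):
    """Forecast every future period with the mean of all past observations."""
    if not history:
        raise ValueError("history must contain at least one observation")
    level = fmean(history)
    # The same value is used for the entire forecast horizon.
    return [level] * horizon


demand = [102, 98, 105, 95, 100]  # made-up monthly demand
forecast = historical_mean_forecast(demand, horizon=3)
print(forecast)
```

The forecast is flat by construction: it has no trend and no seasonality, which is exactly what a white noise process calls for.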
8.2 The naive forecast
The naive forecast, also called the no-change or random walk forecast, uses the last observation as the forecast for all future periods. This is very simple and easy to understand and execute. Such a forecast will vary more from day to day than the historical mean: each time we see a new observation, the naive forecast can change radically. In contrast, a single new observation influences the historical mean much less. You can call this property of the naive forecast “high adaptivity” and consider it a feature or call it “high variance” and consider it a bug – which perspective is more helpful depends on your forecasting environment.
The naive forecast does have some statistical support. It is optimal for an integrated white noise process of order 1, or an I(1) process for short, which is a random walk – hence the name of this forecast. See Chapter 10 for details. Some measures to track forecast accuracy, such as the Mean Absolute Scaled Error (MASE), explicitly use this naive forecast as a baseline (see Chapter 17).
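The difference in adaptivity between the naive forecast and the historical mean can be illustrated with a short Python sketch. The function name `naive_forecast` and the numbers are hypothetical; the point is only to show how each forecast reacts to one unusually high new observation.

```python
from statistics import fmean


def naive_forecast(history, horizon):
    """Repeat the most recent observation for every future period."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


demand = [102, 98, 105, 95, 100]  # made-up demand history
new_observation = 130             # one unusually high new value

# Naive forecast before and after the new observation arrives.
naive_before = naive_forecast(demand, horizon=2)
naive_after = naive_forecast(demand + [new_observation], horizon=2)

# Historical mean before and after, for comparison.
mean_before = fmean(demand)
mean_after = fmean(demand + [new_observation])

print(naive_before, naive_after)
print(mean_before, mean_after)
```

In this made-up example, the naive forecast jumps by the full 30 units, while the historical mean moves by only 5.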
8.3 The seasonal naive forecast
If you predict a demand time series with strong seasonality, the naive forecast will always lag the seasonal signal. The historical mean, on the other hand, will smooth out any seasonality as if it were noise, yielding a forecast that is too low during the high season and too high during the low season.
A simple method that works for strongly seasonal series is the seasonal naive forecast, which is also referred to as a seasonal no-change or seasonal random walk forecast. This forecast is the most recent observation from the same point in the seasonal cycle. For example, our forecast for next January would be our observation from last January, and our forecast for next Wednesday would be our observation from last Wednesday.
Using this simple forecasting method also has a statistical justification: the seasonal naive forecast is optimal for a seasonally integrated process of order 1, i.e., one where an observation only depends on the observation one seasonal cycle before, plus random noise.
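A minimal Python sketch of the seasonal naive forecast follows. It assumes equally spaced observations and a known season length (4 for quarterly data, 12 for monthly data, 7 for daily data with a weekly cycle). The function name and the example data are illustrative assumptions.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each period with the latest observation from the same
    position in the seasonal cycle."""
    if season_length < 1:
        raise ValueError("season_length must be at least 1")
    if len(history) < season_length:
        raise ValueError("need at least one full seasonal cycle of history")
    last_cycle = history[-season_length:]
    # Step h ahead (h = 0 for the next period) reuses the matching
    # position in the last observed cycle, wrapping around if needed.
    return [last_cycle[h % season_length] for h in range(horizon)]


# Two years of made-up quarterly demand with a clear seasonal pattern.
quarterly = [10, 20, 30, 40, 12, 22, 32, 42]
forecast = seasonal_naive_forecast(quarterly, season_length=4, horizon=6)
print(forecast)
```

With `season_length=1`, this sketch reduces to the ordinary naive forecast.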
You may suspect that multiple seasonalities drive your data. For example, daily retail demand may exhibit weekly seasonality (higher demand on Friday and Saturday than during the rest of the week) and yearly seasonality (higher demand in summer than in winter). We will address more complex models that allow considering multiple seasonalities in Chapter 15 – here, we focus on simple methods. How should we create a naive forecast for a Wednesday in winter in our example? As a rule of thumb, you can pick the dominating seasonality for your simple forecast. If the fluctuations between days of the week are more substantial than the fluctuations over the year, take the demand from last Wednesday as a forecast for this Wednesday. And if the yearly fluctuations dominate, take the average demand from the same week of the previous year as a forecast. Of course, you can also use the average of these two possibilities (see Section 8.6 on the surprising effectiveness of averaging forecasts) or pick a historical Wednesday from roughly the same time last year as a forecast.
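One way to sketch the averaging idea for daily data with weekly and yearly seasonality is shown below. It forecasts the next day only, and it uses the same weekday 52 weeks (364 days) earlier as the "roughly the same time last year" candidate. Both that choice and the function name are assumptions made for illustration.

```python
def two_seasonality_forecast(daily_history):
    """Average two seasonal naive candidates for the next day:
    the same weekday last week and the same weekday 52 weeks ago."""
    if len(daily_history) < 364:
        raise ValueError("need at least 364 days of history")
    # The next day is one step past the end of the history, so the
    # same weekday one week earlier sits 7 positions from the end.
    weekly_candidate = daily_history[-7]
    # 52 weeks = 364 days back keeps the weekday aligned.
    yearly_candidate = daily_history[-364]
    return (weekly_candidate + yearly_candidate) / 2


# Synthetic placeholder history: the values are just day indices.
history = list(range(400))
forecast = two_seasonality_forecast(history)
print(forecast)
```

If one seasonality clearly dominates, returning only the corresponding candidate instead of the average follows the rule of thumb above.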
8.4 Other simple methods
Don’t feel constrained by the three methods outlined above. There are other possibilities, like taking rolling averages of the last few observations. If you have multiple reasonable simple forecasts, it is often helpful to average them (see Section 8.6). The key is to do the simplest thing that could work, to profit from the three advantages given at the beginning of this chapter.
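A rolling average can be sketched in a few lines of Python. The window length is a choice you have to make; the value of 3 used here, the function name, and the data are purely illustrative.

```python
from statistics import fmean


def rolling_mean_forecast(history, window, horizon):
    """Forecast every future period with the mean of the last
    `window` observations."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(history) < window:
        raise ValueError("history is shorter than the window")
    level = fmean(history[-window:])
    return [level] * horizon


demand = [90, 110, 100, 120, 80, 100]  # made-up demand history
forecast = rolling_mean_forecast(demand, window=3, horizon=2)
print(forecast)
```

The window length moves this method between the two extremes discussed earlier: a window of 1 gives the naive forecast, and a window covering the full history gives the historical mean.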
8.5 Non-expectation forecasts
So far, we have assumed that you want a point forecast that is right “on average.” But recall that it is often helpful to understand more about the probability distribution of demand (see Section 3). We want to understand best-case and worst-case scenarios. We need to know how far the distribution spans around the center. Such an understanding is necessary to plan safety stocks and service levels effectively. How can we calculate prediction intervals or quantile forecasts in a simple way?
The simple methods above provide quantile forecasts only with some tweaks. There are two simple ways forward. One approach is to calculate the standard deviation of the demand time series and add one or two standard deviations to the central forecast, which gives roughly an 84% or 98% quantile forecast if demand is approximately normally distributed. The other option is to derive a simple quantile forecast directly from the historical observations. For example, if we have 100 historical observations and want a 95% quantile forecast, we use the fifth-largest historical observation as our quantile forecast. In principle, such methods can factor in seasonality by only using data from the proper seasonal periods. In practice, getting reliable quantiles from seasonal data often requires too much data history.
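Both approaches can be sketched in Python as follows. The function names are hypothetical. The first function adds a multiple of the sample standard deviation to a central forecast you supply. The second picks an order statistic from the sorted history, which is one of several possible conventions for an empirical quantile.

```python
from statistics import fmean, stdev


def sd_quantile_forecast(history, central_forecast, n_sd=1.0):
    """Central forecast plus n_sd sample standard deviations of demand."""
    return central_forecast + n_sd * stdev(history)


def empirical_quantile_forecast(history, quantile):
    """Pick the k-th largest observation, where k is the share of
    observations that should lie at or above the quantile."""
    if not 0 < quantile < 1:
        raise ValueError("quantile must be strictly between 0 and 1")
    ordered = sorted(history)
    k = round(len(ordered) * (1 - quantile))  # 5 for 100 obs at 95%
    k = min(max(k, 1), len(ordered))          # keep k within the data
    return ordered[-k]


demand = [90, 110, 100, 120, 80]  # made-up demand history
upper = sd_quantile_forecast(demand, central_forecast=fmean(demand), n_sd=2)

hundred_obs = list(range(1, 101))  # placeholder data: 1, 2, ..., 100
q95 = empirical_quantile_forecast(hundred_obs, 0.95)
print(upper, q95)
```

To account for seasonality, you could pass in only the observations from the relevant seasonal periods, at the cost of having far fewer data points per quantile.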
8.6 Ensemble forecasting
One tool that often works quite well is taking the average of different possible forecasts. This is known as ensemble forecasting (from the French ensemble, meaning “together”) or forecast combination, and there is much evidence that combining forecasts from different methods provides better forecasts than the individual methods do by themselves (Armstrong, 2001). If we apply different methods to a time series, each will pick up on different aspects or features of the data and extrapolate them into the future. Ensembling the forecasts leverages all of them and adds stability as well. This approach works with both expectation and quantile forecasts. We can also add more complex or judgmental forecasting methods into the mix (see Section 16.4).
Straightforward ensembling takes the simple average of the constituent forecasts. All forecasts receive the same weight in the ensemble. A logical extension would be to create an ensemble with optimized weights. For instance, we could derive weights from how well the constituent forecasts worked in the past. Somewhat surprisingly, this idea does not work as well as one would expect. The fact that an unweighted average forecast often beats a weighted average using optimized weights has been referred to as the forecast combination puzzle. One explanation for it, proposed by Claeskens et al. (2016), is that the optimization of weights introduces uncertainty or noise into the entire forecasting pipeline – and this noise directly carries through to more noisy forecasts.
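An equal-weight ensemble is easy to sketch in Python: average the constituent forecasts period by period. The function name and the three constituent forecasts below are made-up placeholders. In practice, they would come from methods such as the historical mean, the naive forecast, and the seasonal naive forecast.

```python
from statistics import fmean


def ensemble_forecast(forecasts):
    """Equal-weight average of several forecasts, period by period."""
    if not forecasts:
        raise ValueError("need at least one constituent forecast")
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all forecasts must cover the same horizon")
    # zip(*forecasts) groups the values that refer to the same period.
    return [fmean(values) for values in zip(*forecasts)]


mean_fc = [100.0, 100.0]   # e.g., from the historical mean
naive_fc = [110, 110]      # e.g., from the naive forecast
seasonal_fc = [90, 120]    # e.g., from the seasonal naive forecast

combined = ensemble_forecast([mean_fc, naive_fc, seasonal_fc])
print(combined)
```

The sketch deliberately has no weights to estimate, so there is nothing in the combination step that could add estimation noise.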
One can remove forecasts that do not add value (assuming we can identify them reliably) and create an ensemble from the remaining forecasts. This has been referred to as forecast pooling (Kourentzes et al., 2019).
We conclude this section with an important warning: do not attempt to build complex models instead of ensembles. Ensembling works because the constituent forecasts all pick up on and extrapolate different features of the underlying time series into the future. If that is so, why don’t we build one complex model that accounts for all possible features by itself instead? This approach is not recommended, since it requires an enormous amount of data. See Section 11.7 for an explanation.
8.7 When are our forecasts too simple?
Simple methods are a good beginning. There are two cases where simple methods will surely be insufficient:
- If strong drivers underlie your time series, you need to measure and account for them in your model (see Chapter 11). Such drivers can be anything, from promotions to the state of the economy. However, even in the presence of such drivers, you may be able to use a simple benchmark method, like using the sales from the last promotion as a forecast for your next promotion.
- If added accuracy adds a lot of business value, investing money and resources into improved data and better models may be worthwhile. Just be aware that in light of irreducible noise, we may not be able to achieve the accuracy we would like to get with commensurate effort (see Chapter 21).
Thus, KISS: Keep It Sophisticatedly Simple.
Key takeaways
- Simple forecasting methods are often surprisingly accurate – and simple to implement and explain.
- You should always implement simple methods, at least as a sanity check or a baseline.
- The most common simple methods are the historical mean, the naive forecast, and the seasonal naive forecast.
- A simple recipe that often improves accuracy is to average different forecasts.
- If you have important drivers, include them. KISS: Keep It Sophisticatedly Simple.