1.1 Problem Summary
Several cities in the continental US and Canada experience temperatures well below freezing. This has consequences for local businesses, as mobility is restricted and transport costs increase. For six cities with extreme weather conditions (Chicago, Denver, Detroit, Vancouver, Toronto, Montreal), we collected weather data for the past five years and intend to build a model that predicts the likely temperature range for the coming year. This will help retailers such as Walmart, Kmart, Weston, and Costco, along with various mom-and-pop stores, plan their inventory of crucial SKUs. Inventory planning is critical because the extreme weather in these cities makes inventory replenishment costly and difficult. Sales also decrease, as movement becomes more difficult and customers tend to avoid venturing out. Poor planning could therefore lead to a potential loss of millions of dollars in overstocking and understocking costs.
1.2 Description of data, its source, key characteristics, & chart(s)
The data series contains ~5 years of high-temporal-resolution (hourly) temperature measurements in Kelvin for 3 US cities and 3 Canadian cities (Chicago, Denver, Detroit, Vancouver, Toronto, Montreal). The data was downloaded from kaggle.com [1]. The data has the following key characteristics in terms of the time series components:
- Noise - This is the non-systematic component and is present in all time series. It is the random variation that results from measurement errors or other causes that are not accounted for.
- Level - This is a systematic component: a point estimate of the average value of the series. It is also present in all time series.
- Trend - All six time series have a linear trend.
- Seasonality - All six time series show additive seasonality.
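With additive components, each observation is the sum of level, trend, seasonal effect, and noise. The sketch below is a minimal illustration on synthetic data, not the project's data or code. It builds such a series and recovers the seasonal effects with a classical centred moving average. The function names, the period of 12, and all numeric values are assumptions chosen for illustration.

```python
import math
import random


def make_additive_series(n, level, slope, period, amplitude, noise_sd, seed=0):
    """Synthetic series: level + linear trend + additive seasonality + noise."""
    rng = random.Random(seed)
    return [
        level + slope * t
        + amplitude * math.sin(2 * math.pi * t / period)
        + rng.gauss(0.0, noise_sd)
        for t in range(n)
    ]


def centered_moving_average(y, period):
    """Estimate level + trend; None where the window does not fit."""
    half = period // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points get half weight (2 x period MA).
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        out[t] = total / period
    return out


def seasonal_indices(y, period):
    """Average detrended value per position in the cycle, centred on zero."""
    trend = centered_moving_average(y, period)
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    raw = [sum(b) / len(b) for b in buckets]
    mean = sum(raw) / period
    return [r - mean for r in raw]


# Illustrative values only: 5 "years" of 12 periods, temperatures in Kelvin.
series = make_additive_series(
    n=60, level=283.0, slope=0.01, period=12, amplitude=12.0, noise_sd=0.5
)
indices = seasonal_indices(series, period=12)
print([round(v, 1) for v in indices])
```

Because the seasonal effects are estimated as offsets from the trend, they are expressed in the same unit as the series (Kelvin here) and sum to zero over one cycle.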
1.3 High-level description of the final forecasting method and performance on meaningful performance metrics
The following methods were tried on the time series data for the six cities:
A) Holt-Winters (triple exponential smoothing) with additive seasonality - This smoothing method was chosen because the time series had both a linear trend and seasonality. The seasonal period was 365. We chose additive seasonality based on analysis of the raw data.
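To make the method concrete, here is a minimal pure-Python sketch of additive Holt-Winters smoothing with a seasonal period of 365, evaluated with RMSE on a short holdout. It is not the project's implementation. The synthetic series, the smoothing parameters (alpha, beta, gamma), the simple initialisation from the first two seasons, and the 30-day holdout are all assumptions for illustration.

```python
import math
import random


def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters. Returns (one-step fitted values, forecasts)."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons to initialise")
    first, second = y[:period], y[period : 2 * period]
    # Simple initialisation (an assumption): level from the first season,
    # trend from the average change between the first two seasons,
    # seasonal terms from the first season's deviations from its mean.
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / (period * period)
    season = [v - level for v in first]

    fitted = []
    for t, obs in enumerate(y):
        s = season[t % period]
        fitted.append(level + trend + s)
        prev_level = level
        level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (obs - level) + (1 - gamma) * s

    n = len(y)
    forecasts = [
        level + (h + 1) * trend + season[(n + h) % period]
        for h in range(horizon)
    ]
    return fitted, forecasts


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Synthetic daily temperatures in Kelvin (illustrative values only).
rng = random.Random(42)
period = 365
series = [
    283.0 + 0.001 * t
    + 12.0 * math.sin(2 * math.pi * t / period)
    + rng.gauss(0.0, 1.0)
    for t in range(5 * period)
]
horizon = 30
train, holdout = series[:-horizon], series[-horizon:]
fitted, forecasts = holt_winters_additive(
    train, period, alpha=0.2, beta=0.05, gamma=0.1, horizon=horizon
)
holdout_rmse = rmse(holdout, forecasts)
print(f"30-day holdout RMSE: {holdout_rmse:.2f} K")
```

In practice the smoothing parameters would be chosen by minimising the error on the training period rather than fixed by hand as they are here.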
B) Regression with linear trend - For some cities, the Holt-Winters model did not perform satisfactorily: the RMSE was quite high. We then tried linear regression for forecasting.
Categorical variables: Season_Index_1 ... Season_Index_12; predictor variable: t
Output variable: city temperature
We also tried a quadratic trend; however, the trend is so flat that the coefficient of t^2 was not significant.
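The regression setup above (twelve season indicators plus the time index t) can be sketched as an ordinary least-squares fit. This is a hedged illustration, not the project's code. It assumes one observation per month, omits a separate intercept so the twelve indicators are not collinear with a constant, and uses noise-free synthetic data with made-up seasonal offsets. All function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_row(t, month):
    """Season_Index_1..12 indicators (month is 1..12), then the trend t."""
    return [1.0 if month == k else 0.0 for k in range(1, 13)] + [float(t)]


def fit_ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Linear prediction for one design row."""
    return sum(c * x for c, x in zip(coefs, row))


# Made-up monthly offsets in Kelvin around a base of 270 K, slope 0.02 K/month.
effects = [-12.0, -10.0, -5.0, 2.0, 8.0, 13.0, 16.0, 15.0, 10.0, 4.0, -3.0, -9.0]
times = list(range(1, 61))                      # 5 years of monthly values
months = [(t - 1) % 12 + 1 for t in times]
temps = [270.0 + 0.02 * t + effects[m - 1] for t, m in zip(times, months)]

rows = [design_row(t, m) for t, m in zip(times, months)]
coefs = fit_ols(rows, temps)
slope = coefs[12]                               # coefficient of t
next_forecast = predict(coefs, design_row(61, 1))
print(f"slope = {slope:.4f} K per month, forecast for t=61: {next_forecast:.2f} K")
```

A quadratic trend would add one more column, t squared, to each design row. With a nearly flat trend its coefficient would be close to zero, consistent with the observation above.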
C) ARIMA - In the linear regression model, the forecasts were staggered, i.e. several consecutive forecasts within a short window had the same value. This was a strong indication of autocorrelation. Lag analysis confirmed lag-1 autocorrelation. We therefore applied ARIMA and adjusted the linear regression forecasts by adding the latest forecasted errors (residuals).
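The residual adjustment can be illustrated with the simplest case, an AR(1) model, i.e. ARIMA(1,0,0), fitted to the regression residuals. The ARIMA order actually used is not stated here, so AR(1) is an assumption made to keep the sketch short. The residuals are simulated and the function names are hypothetical.

```python
import random


def lag1_autocorrelation(x):
    """Sample autocorrelation of x at lag 1."""
    mean = sum(x) / len(x)
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def fit_ar1(residuals):
    """Least-squares slope of e_t on e_(t-1), no intercept.

    Assumes the residuals have mean close to zero, as regression
    residuals do when the model includes a constant or full dummy set.
    """
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(residuals[t - 1] ** 2 for t in range(1, len(residuals)))
    return num / den


def adjust_forecasts(base_forecasts, last_residual, phi):
    """Add the AR(1) residual forecast phi**h * last_residual at step h."""
    return [
        f + (phi ** h) * last_residual
        for h, f in enumerate(base_forecasts, start=1)
    ]


# Simulated regression residuals with lag-1 dependence (illustrative only).
rng = random.Random(7)
true_phi = 0.7
residuals = [0.0]
for _ in range(1999):
    residuals.append(true_phi * residuals[-1] + rng.gauss(0.0, 1.0))

r1 = lag1_autocorrelation(residuals)
phi_hat = fit_ar1(residuals)

base = [280.0, 281.0, 282.0]   # hypothetical regression forecasts in Kelvin
adjusted = adjust_forecasts(base, last_residual=residuals[-1], phi=phi_hat)
print(f"lag-1 autocorrelation = {r1:.2f}, estimated phi = {phi_hat:.2f}")
```

The correction shrinks geometrically with the forecast step, so it matters most for the first few periods after the last observed residual.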
1.4 Conclusions and Recommendations
From the project, we draw the following conclusions and recommendations:
- Forecasting over a short horizon is better: We learnt that forecasting over a shorter horizon works better than forecasting over a longer one, as local market intelligence can be incorporated.
- Need to check autocorrelation: In the linear regression models we saw that ‘binned’ forecasts can occur, where the same forecast is given across a 2-3 day window. This suggests that lag-1 or lag-2 autocorrelation may exist, so lag analysis should be done.
- Avoid overcomplication: It is advisable not to involve too many variables in the forecast.
- Use control charts for model performance: Review and adjust the model after a fixed period (every 6 months) or when the control chart indicates an error.
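The control-chart recommendation could look like the following minimal sketch: control limits are derived from forecast errors in a baseline period, and later errors outside those limits trigger a model review. The 3-sigma limits and all error values are assumptions for illustration, not figures from the project.

```python
import statistics


def control_limits(errors, k=3.0):
    """Lower limit, centre line, and upper limit from baseline forecast errors."""
    center = statistics.mean(errors)
    sigma = statistics.stdev(errors)
    return center - k * sigma, center, center + k * sigma


def out_of_control(new_errors, lower, upper):
    """Indices of forecast errors that fall outside the control limits."""
    return [i for i, e in enumerate(new_errors) if e < lower or e > upper]


# Hypothetical forecast errors (actual minus forecast) in Kelvin.
baseline_errors = [0.5, -0.3, 0.8, -0.6, 0.1, -0.2, 0.4, -0.7, 0.3, -0.3]
lower, center, upper = control_limits(baseline_errors)

recent_errors = [0.2, -0.9, 1.1, 2.4, -0.4]
flagged = out_of_control(recent_errors, lower, upper)
print(f"limits: [{lower:.2f}, {upper:.2f}], flagged periods: {flagged}")
```

A flagged period would prompt an earlier review than the fixed six-month schedule.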