YourCabs aggregates radio cabs from multiple taxi operators in Bangalore. For each model type, the supply shifts dynamically and depends on the number of drivers logged in. If cabs are unavailable, YourCabs incurs a cost from either upgrading the passenger or losing the sale. If several cabs go without bookings, the relationship with the supplier suffers. Thus, YourCabs needs to predict the demand for each cab type on an hourly basis so that it can reach out to taxi operators and plan the number of drivers logging into its system.
Our forecasting problem is to predict the hourly demand for each cab segment from November 15 to 22, 2013. We first split the cab models into five segments: small cars, sedans, utility vehicles, premium cars, and buses. Plotting the data shows that the trend changes drastically from mid-2013 for small cars and sedans. Thus, only data from the last six months should be used to generate forecasts. Further, each segment shows light seasonality at the weekly level and very strong seasonality at the hourly level. These seasonalities need to be accounted for when making forecasts. We hold out the first two weeks of November as a validation period to evaluate and compare different models.
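The data preparation described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the report: the function names `hourly_demand` and `split_by_date`, the `(timestamp, segment)` record layout, and the toy bookings are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

def hourly_demand(bookings, segment, start, end):
    """Count bookings per hour for one segment over [start, end).

    `start` is assumed to fall exactly on an hour boundary.
    Hours with no bookings are kept in the series with a count of 0.
    """
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0)
        for ts, seg in bookings
        if seg == segment and start <= ts < end
    )
    series = []
    hour = start
    while hour < end:
        series.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return series

def split_by_date(series, validation_start):
    """Split an hourly series into training and validation parts by timestamp."""
    train = [(t, y) for t, y in series if t < validation_start]
    valid = [(t, y) for t, y in series if t >= validation_start]
    return train, valid

# Toy bookings (hypothetical): (pickup timestamp, segment)
bookings = [
    (datetime(2013, 10, 31, 9, 15), "sedan"),
    (datetime(2013, 10, 31, 9, 40), "sedan"),
    (datetime(2013, 10, 31, 18, 5), "small"),
    (datetime(2013, 11, 1, 9, 30), "sedan"),
]

series = hourly_demand(bookings, "sedan", datetime(2013, 10, 31), datetime(2013, 11, 2))
train, valid = split_by_date(series, datetime(2013, 11, 1))
```

Keeping the zero-demand hours in the series matters here, because the hourly seasonality is estimated from every hour of the day, not only from the hours in which a booking happened to occur.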
We first generate benchmark forecasts using a naive rule. We then build forecasting models using multiple linear regression and the Holt-Winters method. For small cars, sedans, and utility vehicles, linear regression is a better predictor of cab demand over the validation period, so we generate forecasts for the forecast horizon using regression. However, the model tends to under-predict peak demand, and the cost of under-prediction is higher than the cost of over-prediction. Thus, we attach a factor of safety to the forecasts before making any business decisions, making suitable assumptions for the factor of safety for each of the popular segments. Premium cars and buses have seen very little demand over the past two years, so it is not feasible to generate forecasts for them. In practice, these vehicles can be arranged specifically for any customer who asks for them.
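To make the comparison concrete, the sketch below fits a regression on hour-of-day and day-of-week dummy variables, compares it against a naive benchmark on a held-out week, and applies a factor of safety to a peak-hour forecast. Several things here are assumptions rather than details from the report: the naive rule is taken to be "same hour, one week earlier", the demand series is synthetic, the forecast is floored at zero, and the safety factor of 1.2 is a placeholder value. The regression is solved with plain normal equations so the example needs only the standard library.

```python
from datetime import datetime, timedelta

def features(ts):
    """Intercept plus 23 hour-of-day and 6 day-of-week dummy variables."""
    row = [1.0] + [0.0] * 29
    if ts.hour > 0:
        row[ts.hour] = 1.0            # columns 1-23: hours 1-23 (hour 0 is the baseline)
    if ts.weekday() > 0:
        row[23 + ts.weekday()] = 1.0  # columns 24-29: Tue-Sun (Monday is the baseline)
    return row

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(times, demand):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    x = [features(t) for t in times]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, demand)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, ts, safety_factor=1.0):
    """Regression forecast, floored at zero and scaled by a factor of safety."""
    point = sum(b * f for b, f in zip(beta, features(ts)))
    return max(0.0, point) * safety_factor

def mae(actual, forecast):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Synthetic hourly demand (hypothetical): four full weeks from Monday 2013-10-07.
start = datetime(2013, 10, 7)
times = [start + timedelta(hours=i) for i in range(4 * 168)]

def truth(ts):
    """Additive pattern: base level, commute-hour peaks, weekend uplift."""
    peak = 5.0 if ts.hour in (8, 9, 18, 19) else 0.0
    weekend = 1.0 if ts.weekday() >= 5 else 0.0
    return 2.0 + peak + weekend

# Deterministic perturbation stands in for noise so the example is repeatable.
demand = [truth(t) + ((i * 7) % 5 - 2) * 0.5 for i, t in enumerate(times)]

train_n = 3 * 168  # first three weeks for training, fourth week for validation
beta = fit_ols(times[:train_n], demand[:train_n])

valid_actual = demand[train_n:]
reg_forecast = [predict(beta, t) for t in times[train_n:]]
naive_forecast = demand[train_n - 168:train_n]  # same hour, one week earlier

mae_reg = mae(valid_actual, reg_forecast)
mae_naive = mae(valid_actual, naive_forecast)

peak_hour = datetime(2013, 11, 15, 9)
point = predict(beta, peak_hour)
buffered = predict(beta, peak_hour, safety_factor=1.2)
```

A multiplicative factor is only one way to add the buffer. Because under-prediction is the costlier error, the factor could also be chosen per segment from the size of the validation-period shortfalls at peak hours.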
In conclusion, we recommend multiple linear regression with intraday and intraweek seasonality terms for forecasting hourly demand for each cab segment. The forecasts can be generated at the start of each day for the entire day and rolled forward to the next day. Our model does not take into account the location of supply and demand, nor does it factor in vehicles in transit that will become available for booking in the next hour. These aspects must be accounted for before deploying the forecasting model.