Sales Forecasting for Rossmann Stores

Project Details




Debajyoti Sarkar, Nishant Toshniwal, Ravi Batra, Raunak Singh, Vibin Varghese, Eeha Ashok





Problem description
Business Problem: Rossmann is Germany’s second-largest drug store chain, with more than
1,000 stores across the country. Every month, each store manager needs to set targets for the
sales team and design incentives for them. Currently, managers set these targets based on
their intuition of what next month’s sales will be, which often leads to
poorly calibrated targets. Targets that are too high or unrealistic cause the
sales teams to miss them and, in turn, to lose morale. On the other hand,
targets that are too low carry a cost in lost opportunities and revenue.

Forecasting Problem: The goal is to enable managers to predict monthly sales. Our model predicts sales for September 2015 for six different Rossmann stores. August 2015 is taken as a lag month (forecasting horizon = 1).

Description of the data

  • The “train” data (training data) contains daily sales for 1,115 stores from January 2013 to July 2015. It also records, for each day, the number of customers, whether the store was open or closed, whether it was a school holiday or a state holiday, and whether a promotion was active.
  • The “store” data contains details of each store, such as store type, assortment, competition (distance to competitor stores, number of months since the competitor store opened, etc.), and promotion, including the number of weeks since a promotion denoted “promo2” began.
  • The “test” data contains the same columns as the “train” data, except that it lacks the daily sales, which are what need to be forecast. It covers August 2015 (the lag month, for deployment purposes) and September 2015.

Source of the data: Kaggle

Key Characteristics

  • The time-series data contains daily sales for 1,115 stores over 31 months (January 2013 to July 2015), which can be aggregated to monthly sales.
  • Some stores are similar in attributes such as competition, assortment, and store size; each such store can therefore be representative of many other Rossmann stores in the country.
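
The daily-to-monthly aggregation mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline used in this project: the function name monthly_sales, the (store_id, day, sales) row layout, and the sample figures are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def monthly_sales(daily_rows):
    """Sum daily sales into one total per (store, year, month).

    daily_rows: iterable of (store_id, day, sales) tuples, where day is a
    datetime.date. This row layout is an assumption for illustration.
    """
    totals = defaultdict(float)
    for store_id, day, sales in daily_rows:
        totals[(store_id, day.year, day.month)] += sales
    return dict(totals)


# Made-up example rows, not taken from the Rossmann data.
rows = [
    (1, date(2015, 6, 29), 5000.0),
    (1, date(2015, 6, 30), 4500.0),
    (1, date(2015, 7, 1), 6000.0),
    (2, date(2015, 7, 1), 3000.0),
]
monthly = monthly_sales(rows)
```

Keying the totals by (store, year, month) keeps each store's monthly series separate, which suits a setup where each representative store gets its own forecasting model.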

We have selected six stores that are representative of the stores in the dataset and
built forecasting models for them.

Conclusion and Recommendation

Since the stores differ in the assortments they carry, the promotions they
run, and their proximity to competitors, no single forecasting model is best for all of
them. Therefore, we have chosen six different stores that together can represent more than
1,000 stores and have built six different forecasting models for them.

Currently, the model forecasts sales for only one month. It can be extended to forecast sales
for three months.

Application Area: