Classifying Biscuit Brand Switchers for Targeted Marketing by a New Biscuit Manufacturer

Project Details




Archana Rajan, Kevin John, Aditi Vaish, Deepak Agnihotri, Pranav Maranganty





• The stakeholder in this data mining project is Mine Sweeper Biscuits (MSB), a premium biscuit manufacturer based in Denmark. Although MSB has entered the Indian market through retail outlets, its sales have failed to take off because of a low product trial rate among Indian consumers.

• The business goal of this project is to predict “brand loyalty”, or its absence, towards biscuit brands for any new customer making purchases at a hypermarket, supermarket or other retail outlet. The premise is that brand-loyal customers will likely buy only one brand of biscuit, while those who are prone to “switch” brands will be more open to trying a newly introduced biscuit brand.

• The need for a model that finds the right target segment stems from customer acquisition costs. Sending promotional offers, coupons and trial samples is costly for MSB, so targeted promotions ensure that the company does not waste its resources on “brand loyalists”.

• The data mining goal of this project is to build a supervised learning model that, given certain data about a new customer at a supermarket or hypermarket (demographic information and the #SKUs, price and quantity of her last two purchased baskets), predicts whether she is a brand “loyalist” or a brand “switcher”.

• To accomplish this data mining goal, purchase data from the “Ready Foods” department was aggregated at the basket level (1 basket = 1 visit by a consumer). Data on #SKUs, average price and quantity purchased in the last and second-to-last purchases was also derived for each customer. Note that to be included in this dataset, a customer had to have made at least 3 purchases in the “Ready Foods” department. If more than 50% of a customer's purchases were of the same brand, the customer was classified as a “brand loyalist”; otherwise, as a “brand switcher”.

• The data was partitioned three ways: a training set for developing the model, and validation and test sets for assessing accuracy and checking for over-fitting. In addition, a “holdout” set was created to test the final models. Standard data preparation methods such as missing data handling, transformation of categorical variables and creation of binned variables (where necessary) were employed.

• Four models were tested: K-Nearest Neighbors (K-NN), Naïve Bayes, logistic regression and classification trees (CART). K-NN, logistic regression and CART had low errors overall; however, there was evidence of overfitting in K-NN. Both CART and logistic regression were therefore deployed on the holdout set. Based on our results and on acquisition/promotion cost considerations, we concluded that the misclassification costs were higher with CART. Thus, logistic regression was chosen as the model for predicting the brand loyalty/switching character of a new customer after her second purchase at the “Ready Foods” department.

• We recommend that this model be updated with additional customer demographic data (locality, area code, etc.) as it becomes available. The model can also be adapted to serve a similar business goal in other product categories of MSB.
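
The labeling rule described above (at least 3 purchases, and more than 50% of them in a single brand for a “loyalist”) can be illustrated with a minimal Python sketch. It is not the code used in the project: the function name, the customer IDs, the brand names and the assumption that each list element represents one purchase are all hypothetical.

```python
from collections import Counter

MIN_PURCHASES = 3    # from the text: at least 3 purchases to be included
LOYALTY_SHARE = 0.5  # from the text: more than 50% of purchases in one brand


def label_customer(brands):
    """Label one customer from a list of purchased brands (hypothetical helper).

    Returns "loyalist", "switcher", or None when the customer has too few
    purchases to be included in the dataset.
    """
    if len(brands) < MIN_PURCHASES:
        return None
    # Count of the most frequently purchased brand
    top_count = Counter(brands).most_common(1)[0][1]
    share = top_count / len(brands)
    return "loyalist" if share > LOYALTY_SHARE else "switcher"


# Hypothetical purchase histories, one list of brands per customer
purchases = {
    "C001": ["Brand A", "Brand A", "Brand B"],             # 2 of 3 in one brand
    "C002": ["Brand A", "Brand B", "Brand C", "Brand B"],  # exactly 50%, not more
    "C003": ["Brand A", "Brand A"],                        # fewer than 3 purchases
}

labels = {cid: label_customer(b) for cid, b in purchases.items()}
print(labels)
```

Note that a customer with exactly 50% of purchases in one brand is treated as a “switcher” here, since the rule requires strictly more than 50%.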

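The comparison of models by misclassification cost can be sketched in the same spirit. The sketch below is only an illustration of the idea: the error counts, the unit costs and the model names are hypothetical placeholders, not results from this project, and it assumes that the two error types (promoting to a loyalist, and missing a switcher) carry different costs.

```python
def total_misclassification_cost(false_switchers, false_loyalists,
                                 cost_wasted_promo, cost_missed_switcher):
    """Total cost of a model's errors on a holdout set (hypothetical helper).

    false_switchers: loyalists wrongly predicted as switchers (promotion wasted)
    false_loyalists: switchers wrongly predicted as loyalists (prospect missed)
    """
    return (false_switchers * cost_wasted_promo
            + false_loyalists * cost_missed_switcher)


# Hypothetical unit costs, in arbitrary currency units
COST_WASTED_PROMO = 50.0
COST_MISSED_SWITCHER = 20.0

# Hypothetical holdout error counts for two candidate models
model_errors = {
    "model_a": {"false_switchers": 40, "false_loyalists": 10},
    "model_b": {"false_switchers": 25, "false_loyalists": 30},
}

costs = {
    name: total_misclassification_cost(
        errors["false_switchers"],
        errors["false_loyalists"],
        COST_WASTED_PROMO,
        COST_MISSED_SWITCHER,
    )
    for name, errors in model_errors.items()
}

# The candidate with the lowest total cost would be preferred
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

With asymmetric costs like these, the model with fewer total errors is not necessarily the cheaper one, which is why cost considerations can change the choice between two models with similar error rates.
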
Application Area: