The customer database contains a field called "MARITAL_STATUS". This is an important field for the business: it can help the marketing department segment customers and target marketing and promotional initiatives accordingly.
Currently, around 13% of customers have not reported their marital status. We also found that customers who report their status as single exhibit purchase behavior similar to that of married customers. Through our analysis, we intend to segregate customers into family and non-family customers.
The following data is currently available as the basis for prediction:
Customer Information: Sex, Age, Enrollment date
Customer Purchases: Comprehensive information on each customer's purchases, i.e. the exact items purchased, their price, quantity, class, sub-class, SKU number, etc.
This data can be aggregated at customer level, basket level, item level, class level, etc., as required.
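As an illustration of customer-level aggregation, here is a minimal sketch in Python. The row layout, the sample values, and the feature names (total_spend, n_baskets, avg_basket_spend, spend_by_class) are assumptions made for this example and are not taken from the actual database.

```python
from collections import defaultdict

# Hypothetical item-level rows: (customer_id, basket_id, item_class, price, quantity).
# The values are made up for illustration only.
transactions = [
    ("C1", "B1", "BABY", 12.0, 2),
    ("C1", "B1", "GROCERY", 3.5, 4),
    ("C1", "B2", "GROCERY", 8.0, 1),
    ("C2", "B3", "ELECTRONICS", 99.0, 1),
]


def aggregate_by_customer(rows):
    """Roll item-level rows up to one feature record per customer."""
    spend = defaultdict(float)
    baskets = defaultdict(set)
    class_spend = defaultdict(lambda: defaultdict(float))
    for customer_id, basket_id, item_class, price, quantity in rows:
        amount = price * quantity
        spend[customer_id] += amount
        baskets[customer_id].add(basket_id)
        class_spend[customer_id][item_class] += amount
    return {
        cid: {
            "total_spend": spend[cid],
            "n_baskets": len(baskets[cid]),
            "avg_basket_spend": spend[cid] / len(baskets[cid]),
            "spend_by_class": dict(class_spend[cid]),
        }
        for cid in spend
    }


customer_features = aggregate_by_customer(transactions)
```

The same pattern applies at basket or class level by changing the grouping key.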
The analytics objective was to build a model that predicts marital status where it is missing. This is a supervised predictive task, both forward-looking and retrospective, since new and existing records both fall within its scope.
We tried several data analytics approaches, including KNN (at transaction level and customer level), classification trees, association rules, and logistic regression. We then used an ensemble to combine the predictions of the best four models into a single prediction.
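One common way to combine several models is to average their predicted probabilities and threshold the mean. The sketch below shows that idea in Python. The combination rule, the 0.5 threshold, the model names, and the probabilities are all assumptions for illustration; the report does not state how the four models were actually combined (majority voting is another option).

```python
def ensemble_predict(model_probs, threshold=0.5):
    """Average each customer's 'married' probability across models, then threshold.

    model_probs is a list with one inner list per model; each inner list holds
    that model's predicted probability of 'married' for every customer.
    """
    n_models = len(model_probs)
    n_customers = len(model_probs[0])
    labels = []
    for i in range(n_customers):
        mean_prob = sum(probs[i] for probs in model_probs) / n_models
        labels.append("married" if mean_prob >= threshold else "unmarried")
    return labels


# Hypothetical outputs of four models for three customers (made-up numbers).
model_a = [0.9, 0.4, 0.2]
model_b = [0.8, 0.6, 0.1]
model_c = [0.7, 0.3, 0.3]
model_d = [0.6, 0.5, 0.4]

predictions = ensemble_predict([model_a, model_b, model_c, model_d])
```

Averaging probabilities lets a confident model outweigh several uncertain ones, whereas a majority vote treats every model's label equally.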
Based on our results on the test data, the error rate is much lower when we predict married status than when we predict unmarried status. We attribute this to customers not having updated their status accurately.
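A per-class error rate of this kind can be computed as sketched below in Python. The labels and predictions are invented for illustration and are not the project's results; the sketch also assumes the error rate is measured against the reported (actual) status of each customer.

```python
def error_rate_by_class(actual, predicted):
    """Return the share of misclassified records for each actual class."""
    totals, errors = {}, {}
    for a, p in zip(actual, predicted):
        totals[a] = totals.get(a, 0) + 1
        errors[a] = errors.get(a, 0) + (a != p)
    return {label: errors[label] / totals[label] for label in totals}


# Made-up test set: four customers reported as married, four as unmarried.
actual = ["married"] * 4 + ["unmarried"] * 4
predicted = [
    "married", "married", "married", "unmarried",
    "unmarried", "married", "married", "unmarried",
]

rates = error_rate_by_class(actual, predicted)
```

Reporting the two rates separately, rather than a single overall error rate, is what makes the gap between the married and unmarried classes visible.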
(i) We recommend using this approach to identify the marital status of existing customers.
(ii) We also recommend using it to predict the marital status of customers who have not reported it.
(iii) Direct marketing and promotional activities accordingly.