Predicting customer churn for targeting promotions for a traditional taxi company in Vietnam

Project Details


Fall 2019


Meng-Hen Huang, Sammi Yien Lu, Viet-Cuong Trieu (Daniel), Xin Wang





Mai Linh Corporation was founded in 1993. Before Uber entered the Vietnamese market in 2015, Mai Linh was the largest taxi company in Vietnam, holding over 50% of the market, and was the only company operating in all 63 provinces. Since 2017, competition from tech-based taxi services (Uber, Grab) has significantly reduced Mai Linh's revenue: revenue in 2018 was about 70 million USD, equivalent to 43% of 2016 revenue. Around the middle of 2017, Mai Linh began using a taxi dispatch management system to optimize resources, which brought its fares down to a level comparable to Uber/Grab. Mai Linh still faces fierce competition from the money-burning promotional campaigns of tech-based taxi services, so maintaining market share is vital.

Currently, Mai Linh serves about 1.5 to 2 million successful trips per month, of which trips booked through the App account for only about 7% to 10%. Although the marketing department has been trying to attract more App-booking customers, about 10% of regular customers leave the service every month. The business goal is therefore to use precision marketing to retain customers within a limited marketing budget. To accomplish this goal, we use data mining to predict which customers will leave the service next month based on the latest three months of transaction data.
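The prediction target described above can be illustrated with a minimal sketch. The exact definitions of a "regular customer" and of "leaving" are not given here, so the sketch assumes that a customer has left if they made at least one trip in the three-month observation window and none in the target month. The function name `label_churn`, the `(customer_id, month)` input layout, and the sample trips are all hypothetical.

```python
from collections import defaultdict


def label_churn(trips, window_months, target_month):
    """Return {customer_id: 1 if churned else 0}.

    trips: iterable of (customer_id, month) pairs, month as "YYYY-MM".
    Assumption (not from the project): a customer "leaves" if they have at
    least one trip in the observation window but none in the target month.
    """
    months_by_customer = defaultdict(set)
    for customer_id, month in trips:
        months_by_customer[customer_id].add(month)

    window = set(window_months)
    labels = {}
    for customer_id, months in months_by_customer.items():
        # Only customers active during the observation window are labelled.
        if months & window:
            labels[customer_id] = 0 if target_month in months else 1
    return labels


# Made-up trips for illustration only.
trips = [
    ("A", "2019-08"), ("A", "2019-09"), ("A", "2019-11"),  # returns in November
    ("B", "2019-10"),                                      # no trip in November
    ("C", "2019-11"),                                      # not active in the window
]
labels = label_churn(trips, ["2019-08", "2019-09", "2019-10"], "2019-11")
```

In this toy data, customer A is labelled 0 (retained), customer B is labelled 1 (churned), and customer C is excluded because they were not active in the observation window.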

After exploring the data and consulting a domain expert, we selected seven columns from the booking request table, from which we derived 18 predictors. To select the prediction method, we randomly assigned 60% of the regular customers to the training data and held out the remaining 40% for evaluation. After comparing the predictive performance of different methods using ROC curves and lift curves, we chose logistic regression with seven predictors. Finally, to evaluate the business performance of the model, we used three months of data (August to October 2019) to predict which customers would leave the service in November 2019. The results show that if the company uses the model to select the 10% of customers with the highest predicted probability of leaving, it will reach 1.8 times as many customers who actually leave (a lift of 1.8) as it would by random selection.
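To make the evaluation steps concrete, the following sketch shows a random 60/40 split of customers and the lift for the top 10% of scored customers. It is an illustration under stated assumptions, not the project's code: the helper names `split_customers` and `top_fraction_lift` are hypothetical, and the scores and labels are made-up placeholders rather than outputs of the logistic regression model.

```python
import random


def split_customers(customer_ids, train_fraction=0.6, seed=0):
    """Randomly split customers into training and holdout lists."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def top_fraction_lift(scores, labels, fraction=0.10):
    """Lift = churn rate among the top-scored fraction / overall churn rate."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when no customer churned")
    k = max(1, round(len(scores) * fraction))
    # Rank customers from highest to lowest predicted churn probability.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


# Made-up example: 20 customers, 4 of whom churned (base rate 20%).
train_ids, holdout_ids = split_customers(range(20))
scores = [i / 20 for i in range(20)]           # placeholder model scores
labels = [1 if i in (3, 10, 17, 19) else 0 for i in range(20)]
lift = top_fraction_lift(scores, labels, fraction=0.10)
```

In this toy data the top 10% is two customers, one of whom churned, so the top-decile churn rate is 50% against a 20% base rate, giving a lift of 2.5. The reported lift of 1.8 is read the same way: the top-scored 10% contains 1.8 times as many churners as a random 10% would.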

In terms of implementation, the predictive model can be set to run automatically at the end of each month, and the prediction results can be sent automatically to the Marketing department to support promotion planning for the next month. In addition, two of the predictors, the driver's acceptance time and driver-customer familiarity, are factors the company can improve. We make recommendations for improving both, thereby contributing to regular customer retention.

Although we have made our best effort, this project has some limitations. First, we have not used location data to classify booking requests by population density. Second, we have not yet categorized wait times by time frame (e.g., rush hours). In the future, the prediction model can be improved by exploiting location data and hour-level time data, and by combining them with data from the customer care center.

Application Area: