WeMo Scooter Co. aims to address the overwhelming number of scooters in Taiwan by providing a service that combines the sharing economy with eco-friendly scooters. E-scooter sharing is now a hot topic and a competitive industry in Taiwan. WeMo, a first mover, started in Taipei in 2016 and keeps expanding its service area. In this project, our team collaborated with WeMo on a data mining task: predicting whether a newly registered user will take a second ride, in order to improve marketing effectiveness.
WeMo's goal is to increase the monthly active user rate of the WeMo app by efficiently delivering suitable marketing resources to target users. For developing a long-term customer relationship, whether a user returns for a second ride after their first try is a critical indicator. Moreover, membership management is key to staying in the top tier of a competitive market.
We collected data from WeMo's Google Cloud Platform through BigQuery and aggregated the records from August through October to the user level, in line with our goal of predicting whether a user will take a second ride. The resulting dataset has 1928 records of new users and 89 columns (variables) describing their behaviour during the time window of their first ride. The categorical outcome variable was derived from the user data by checking whether a second ride occurred within a week.
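The user-level aggregation and outcome labelling described above can be sketched as follows. This is a minimal illustration, not WeMo's actual pipeline: the ride-log layout (a user ID and a ride start time), the function name label_second_ride, the toy data, and the reading of "within a week" as seven days after the first ride are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def label_second_ride(rides, window=timedelta(days=7)):
    """Collapse a ride log into one binary label per user.

    rides: iterable of (user_id, ride_start) pairs (hypothetical layout).
    Returns {user_id: 1 if a second ride started within `window` of the
    first ride, else 0}.
    """
    starts_by_user = defaultdict(list)
    for user_id, ride_start in rides:
        starts_by_user[user_id].append(ride_start)

    labels = {}
    for user_id, starts in starts_by_user.items():
        starts.sort()
        first = starts[0]
        # A user with a single ride can never have a second ride.
        has_second = len(starts) > 1 and starts[1] - first <= window
        labels[user_id] = int(has_second)
    return labels


# Toy data, invented purely for illustration.
rides = [
    ("u1", datetime(2019, 8, 1, 9, 0)),
    ("u1", datetime(2019, 8, 3, 18, 30)),  # second ride after 2 days
    ("u2", datetime(2019, 8, 5, 12, 0)),   # never rides again
    ("u3", datetime(2019, 9, 1, 8, 0)),
    ("u3", datetime(2019, 9, 20, 8, 0)),   # second ride after 19 days
]
labels = label_second_ride(rides)
```

In this toy log only u1 is labelled 1; u3 does return, but outside the assumed seven-day window.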
To deliver accurate and stable predictions, we reduced the dimensionality of the dataset using variable selection methods, then evaluated every model on each subset produced by those methods. Rewards given to riders with a high probability of a second ride carry a higher cost, so these riders must be identified accurately; for this task, logistic regression on the stepwise-selected subset gave the best results with the fewest variables. Users with the lowest probability should also be identified; for this task, random forest on the stepwise-selected subset gave the best performance.
In the implementation phase, our main difficulty was the small number of records: the data covering the span from users' first ride to their second ride might not be sufficient. In addition, because this is a one-time analysis that cannot be applied in real time, we recommend that the data analysis team retrain the model with new data weekly or monthly. Furthermore, transforming the dataset to the user level is an important step. For the marketing team, the list of members our model predicts to have a higher or lower probability of riding a second time is useful for creating customized promotion packages that activate users efficiently.
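One way the marketing lists mentioned above could be produced from model output is sketched below. The function name split_by_probability, the score dictionary, and the 0.7 / 0.3 cut-offs are hypothetical; the project does not state which thresholds, if any, were used.

```python
def split_by_probability(scores, high=0.7, low=0.3):
    """Split users into high- and low-probability lists.

    scores: {user_id: predicted probability of a second ride}.
    high, low: illustrative cut-offs (assumptions, not project values).
    Returns (high_list, low_list), each ordered from most to least extreme.
    """
    high_list = sorted(
        (u for u, p in scores.items() if p >= high),
        key=scores.get,
        reverse=True,
    )
    low_list = sorted(
        (u for u, p in scores.items() if p <= low),
        key=scores.get,
    )
    return high_list, low_list


# Invented scores for illustration only.
scores = {"u1": 0.90, "u2": 0.10, "u3": 0.50, "u4": 0.75}
likely_returners, unlikely_returners = split_by_probability(scores)
```

Users between the two cut-offs (u3 here) fall into neither list, which leaves room for a separate default campaign.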
In a nutshell, we offer the following recommendations:
(1) Periodically adjusting the model and database to track user behaviour in a timely manner, record abnormal conditions, and thereby improve data quality.
(2) Combining domain know-how with other kinds of behavioural data, such as charging or riding data, to identify key factors and processes within the app and to uncover potential user patterns.
(3) Adopting A/B tests to find the preferred campaigns for different target customer segments, and managing the membership list well (customer labeling).
(4) Modifying payment plans and critical steps in app operation based on our insights.
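As an illustration of recommendation (3), the sketch below compares the second-ride rate of two campaign variants with a two-proportion z-test. The test choice, the function name two_proportion_z_test, and the counts are assumptions made for the example; they are not results from this project.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a, conv_b: users who took a second ride under campaigns A and B.
    n_a, n_b: users exposed to each campaign.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both campaigns are equal.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 30 of 200 users return under A, 50 of 200 under B.
z, p_value = two_proportion_z_test(30, 200, 50, 200)
```

A small p-value suggests the two campaigns differ in second-ride rate for that segment; running the same comparison per customer segment is one way to find the preferred campaign for each.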