Predicting PicCollage users’ first purchase for targeted promotions

Project Details


Fall 2017


Reggie Escobar · Eduardo Salazar · Uni Ang · Lynn Pan





PicCollage is an app for creating photo collages with custom stickers, fonts, and backgrounds, which makes collage creation a creative experience. Its biggest revenue stream is in-app purchases, where users can pay to remove the watermark or to add custom stickers and backgrounds. The app also shows pop-up ads. However, the ad strategy is implemented differently for Android and iOS users, so the model in this project does not use features derived from ad behavior. Instead, its scope is limited to user behavior up to the creation of the first collage.
PicCollage's strategy for increasing in-app purchase revenue is to target users who are likely to make a purchase, or to offer promotions to users who are unlikely to. Most paying users make their purchase within their first few collages. The goal of this project, “Predicting PicCollage users’ first purchase probability for targeted promotions”, is therefore to rank users by how likely they are to make a first purchase at the time they create their first collage.
The dataset was provided by PicCollage. It contains event logs (click, open_page, create_collage, etc.) from September 2017, for new users only. In total, the data covers 38,748,087 sessions of new users and 44 distinct triggered events. The sample dataset used to build the model consists of columns from the user dimension table plus 79 variables derived from the event data. The model uses each user's behavior from the first app open to the first collage save to predict whether that user will make a first purchase afterwards. The data is partitioned into training, validation, and test sets. Oversampling is used to handle the class imbalance, since only a small share of users make a purchase.
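The data preparation described above can be sketched as follows. This is a minimal plain-Python illustration, not our actual pipeline. Several details are assumptions: the event names `open_app`, `save_collage`, and `purchase`, the 60/20/20 split, and the use of simple random duplication as the oversampling method. The helper names `build_rows`, `split`, and `oversample` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical event names; the real PicCollage event names may differ.
OPEN, SAVE, PURCHASE = "open_app", "save_collage", "purchase"


def build_rows(events, event_types):
    """Turn raw (user_id, timestamp, event_name) tuples into (features, label) rows.

    Features: count of each event type from the user's first app open through
    the first collage save. Label: 1 if a purchase occurs after that save.
    """
    by_user = {}
    for user_id, ts, name in sorted(events):
        by_user.setdefault(user_id, []).append((ts, name))

    rows = []
    for user_id in sorted(by_user):
        evs = by_user[user_id]
        opens = [ts for ts, name in evs if name == OPEN]
        saves = [ts for ts, name in evs if name == SAVE]
        if not opens or not saves:
            continue  # users who never save a collage are never scored
        start, end = opens[0], saves[0]  # evs is time-ordered, so these are the firsts
        counts = Counter(name for ts, name in evs if start <= ts <= end)
        label = int(any(name == PURCHASE and ts > end for ts, name in evs))
        rows.append(([counts[t] for t in event_types], label))
    return rows


def split(rows, train=0.6, valid=0.2, seed=0):
    """Shuffle and cut rows into train / validation / test lists (assumed 60/20/20)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_valid = round(len(rows) * valid)
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]


def oversample(rows, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows)  # nothing to duplicate
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = list(rows) + extra
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    events = [
        ("u1", 1, "open_app"), ("u1", 2, "click"), ("u1", 3, "save_collage"), ("u1", 5, "purchase"),
        ("u2", 1, "open_app"), ("u2", 2, "click"), ("u2", 3, "click"), ("u2", 4, "save_collage"),
        ("u3", 1, "open_app"), ("u3", 2, "click"),  # no collage save: dropped
    ]
    print(build_rows(events, ["click", "open_page"]))  # [([1, 0], 1), ([2, 0], 0)]

    data = [([i], int(i < 2)) for i in range(10)]  # 2 purchasers, 8 non-purchasers
    print([len(part) for part in split(data)])  # [6, 2, 2]
    balanced = oversample(data)
    print(len(balanced), sum(label for _, label in balanced))  # 16 8
```

In this sketch only the training split would be passed to `oversample`. That way the validation and test sets keep the real purchase rate that model performance is measured against.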
The predictive models compared were logistic regression, a decision tree, a random forest, and a boosted tree. Decision trees are easy to interpret and can show which features are important. Random forests and boosted trees are ensembles of decision trees that usually give more accurate and robust predictions. We compared the models on the top decile (top 10%) of a decile lift chart. This measures how well each model concentrates first-purchase users among the 10% of users it scores highest. The boosted tree produced the best results. Sending promotions to the top 10% of users as ranked by the boosted tree reaches 1.78 times as many first-purchase users as sending them to a randomly chosen 10% of users.
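The decile lift chart and top-decile lift used to compare the models can be computed as shown below. This is a minimal plain-Python sketch, not the tool we used to produce the chart. The function names are hypothetical. The sketch assumes at least 10 scored users and does not treat tied scores specially. In practice, `scores` would be one model's predicted first-purchase probabilities on the test set, and `labels` the observed first purchases.

```python
def _ranked_labels(scores, labels):
    """Labels reordered from highest to lowest predicted score."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    if sum(labels) == 0:
        raise ValueError("no first-purchase users in labels; lift is undefined")
    return [label for _, label in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]


def decile_lift_chart(scores, labels):
    """Lift of each decile (index 0 = the 10% of users with the highest scores).

    lift = purchase rate inside the decile / overall purchase rate
    Assumes at least 10 users; with fewer, some deciles are empty and get lift 0.
    """
    ranked = _ranked_labels(scores, labels)
    n = len(ranked)
    overall_rate = sum(ranked) / n
    lifts = []
    for d in range(10):
        bucket = ranked[d * n // 10:(d + 1) * n // 10]
        rate = sum(bucket) / len(bucket) if bucket else 0.0
        lifts.append(rate / overall_rate)
    return lifts


def top_decile_lift(scores, labels):
    """How many times more first-purchase users the model's top 10% contains
    than a randomly chosen 10% of users would, on average."""
    return decile_lift_chart(scores, labels)[0]


if __name__ == "__main__":
    scores = [s / 20 for s in range(20, 0, -1)]  # 20 users, highest score first
    labels = [1, 0, 0, 1, 0, 1, 0, 1] + [0] * 12  # 4 purchasers, base rate 0.2
    print(round(top_decile_lift(scores, labels), 2))  # 2.5
```

Ranking by score and comparing each bucket's purchase rate to the overall rate means the decile lifts average to 1 when the user count divides evenly into deciles. Any value above 1 in the top decile is the gain over random targeting.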
We recommend that PicCollage use the boosted tree model, trained with oversampling, to offer bundles or discounts to users with a high probability of making a first purchase. The dataset used here does not include October purchases, so first purchases made after September are not captured. Adding more purchase data should make the predictions more accurate. It would also help to collect event data covering each user's full history. Finally, collecting more user-level information as input variables might make it possible to predict a first purchase earlier.

Application Area: