Forecasting the daily traffic from Facebook fan page to content website for TC Incubator

Project Details

Term: 

Fall 2018

Students: 

Sam Kuo, Astro Yan, Serina Hung, Jay Lee

University: 

NTHU

Presentation: 

Report: 

TC Incubator is dedicated to strengthening youth innovation and helps entrepreneurs by providing consultants. TC also operates an official information website called "Jinrih Deliver" and manages Facebook fan pages to promote the website and attract users to it. TC posts daily and is constantly trying to improve the content of its posts so that the articles attract more traffic. However, TC does not run proper A/B tests, which makes it difficult for the social media manager to know whether changes to the articles affect traffic. We hope to provide a tool that avoids the complexity of traditional A/B testing yet achieves the same effect: the forecast traffic serves as the control group for A/B testing.


We forecast the daily traffic for the following week. The dataset comes from the Google Analytics account that TC set up. Because the original data records only total traffic, we forecast a single series. As external data, we use the daily number of posts, divided into six main types, and the traffic from the past 7 days to predict the following day's traffic. After removing extreme values and some preprocessing, 90 days of data remain (about 3 months). The external data contains the number of posts of each type, lag-1 of the three most popular posts, and lag-1 to lag-7 of the traffic, for a total of 16 columns.
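The lag-1 to lag-7 traffic columns described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual code: the function name `make_lag_features` and the traffic numbers are made up, and the remaining nine columns (six post-type counts and lag-1 of the three most popular posts) would have to be joined in by date.

```python
from typing import Dict, List


def make_lag_features(traffic: List[float], max_lag: int = 7) -> List[Dict[str, float]]:
    """Build one row per day that has a full lag history.

    Each row holds lag_1 .. lag_<max_lag> of the traffic series plus the
    target, i.e. that day's own traffic.
    """
    rows = []
    for t in range(max_lag, len(traffic)):
        # lag_k is the traffic observed k days before day t
        row = {f"lag_{k}": traffic[t - k] for k in range(1, max_lag + 1)}
        row["target"] = traffic[t]
        rows.append(row)
    return rows


# Made-up daily traffic counts, for illustration only (not TC's data).
traffic = [120, 135, 128, 150, 160, 142, 138, 155, 149, 170]
rows = make_lag_features(traffic, max_lag=7)
```

Note that the first `max_lag` days cannot be used as targets because their lag history is incomplete, so a 7-lag design costs one week of an already short series.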


We use the last 14 days as the validation period and allocate the rest to the training period. We try Regression, Neural Network and Ensemble models, and we found that the Ensemble achieves its highest forecast accuracy at a 7-day horizon. When forecasting 14 days, naive forecasting performs better and the other methods do not work as well.
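The hold-out split and the naive benchmark above can be sketched as below. The summary does not say how the Ensemble combines its members, so simple averaging is an assumption here, and a seasonal naive forecast stands in for the regression and neural network members. All function names and the synthetic series are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_holdout(series: Sequence[float], n_valid: int = 14) -> Tuple[List[float], List[float]]:
    """Keep the last n_valid observations for validation, the rest for training."""
    if not 0 < n_valid < len(series):
        raise ValueError("n_valid must be between 1 and len(series) - 1")
    return list(series[:-n_valid]), list(series[-n_valid:])


def naive_forecast(train: Sequence[float], horizon: int) -> List[float]:
    """Repeat the last training value for every step of the horizon."""
    return [train[-1]] * horizon


def seasonal_naive_forecast(train: Sequence[float], horizon: int, period: int = 7) -> List[float]:
    """Repeat the values observed one period (here one week) earlier."""
    return [train[-period + (h % period)] for h in range(horizon)]


def average_ensemble(forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Combine member forecasts by a plain average (an assumed combination rule)."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


# Synthetic 90-day series with a weekly pattern, for illustration only.
series = [100 + (day % 7) * 5 for day in range(90)]
train, valid = split_holdout(series, n_valid=14)

naive = naive_forecast(train, horizon=7)
seasonal = seasonal_naive_forecast(train, horizon=7)
ensemble = average_ensemble([naive, seasonal])
```

Evaluating each candidate on `valid[:7]` and on all 14 validation days separately mirrors the 7-day versus 14-day comparison reported above.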


We recommend using the Ensemble as the predictive model. The final 7-day validation MAPE can be below 40%, indicating that the forecast is still reasonably good. If more training data becomes available, we believe the model can achieve higher accuracy and could be used as an A/B testing control group in the future, providing a more rigorous A/B testing method.

We suggest that TC keep a detailed record of the adjustments and promotions made during A/B testing, because this information can improve forecasting accuracy. TC can also use the tool to run A/B tests and so improve its posts more efficiently. Moreover, a Shiny interactive interface, combined with the Google API to automate data collection, would offer the marketing team a complete and more convenient solution. This would help the team make decisions and make A/B testing more convenient and effective.
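For reference, the MAPE figure quoted above is the mean absolute percentage error over the validation days. A minimal sketch of the calculation follows; the function name `mape` and the numbers are illustrative and are not the project's results.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # MAPE is undefined when an actual value is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Illustrative numbers only: errors of 10%, 10% and 0% average to about 6.67%.
example_mape = mape([100, 200, 400], [110, 180, 400])
```

Because MAPE divides by the actual value, days with very low traffic can dominate the score, which is worth keeping in mind when reading a figure such as 40%.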

Application Area: