Predict the average recipe rating on BBC Good Food

Project Details

Term: 

Fall 2015

Students: 

Claire Huang, Wan Yi Chou, Eva Shih, Pornlada Ittipornpithak

University: 

NTHU

Presentation: 

Report: 

The primary stakeholder is BBC Good Food, a recipe website affiliated with the BBC. The problem is that it is not among the most popular recipe websites: it does not appear on the first page of Google results for the keyword “recipe website”. The business goal of this project is therefore to improve the quality of the recipes on the website and thereby attract more visitors. The analytic goal is to predict the average rating of a new recipe before it is published.

The data were crawled from the BBC Good Food website and comprise about 8,400 records. We handled missing values by replacing them with the mean of their column. We also binned predictors with too many categories, such as country and serving, into a few categories, and derived a “total time” variable by adding preparation time and cooking time. Once the data were prepared, we produced several visualizations to understand them better.
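The three preparation steps described above (mean imputation, binning of high-cardinality predictors, and the derived “total time” variable) can be sketched as follows. This is a minimal illustration, not the project's actual code: the field names `prep_time`, `cook_time`, `country` and `total_time`, the helper functions, the choice to keep the two most frequent categories, and the sample records are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean


def impute_mean(rows, column):
    """Replace missing (None) values in a numeric column with the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill


def bin_rare_categories(rows, column, keep=2, other="Other"):
    """Keep the `keep` most frequent categories and merge the rest into `other`."""
    counts = Counter(r[column] for r in rows)
    frequent = {value for value, _ in counts.most_common(keep)}
    for r in rows:
        if r[column] not in frequent:
            r[column] = other


# Hypothetical records; the real data set has about 8,400 rows.
recipes = [
    {"prep_time": 10, "cook_time": 20, "country": "British"},
    {"prep_time": None, "cook_time": 40, "country": "Italian"},
    {"prep_time": 30, "cook_time": None, "country": "British"},
    {"prep_time": 20, "cook_time": 30, "country": "Thai"},
    {"prep_time": 20, "cook_time": 30, "country": "Italian"},
]

# Step 1: fill missing numeric values with the column mean.
for col in ("prep_time", "cook_time"):
    impute_mean(recipes, col)

# Step 2: collapse infrequent categories into a single bin.
bin_rare_categories(recipes, "country", keep=2)

# Step 3: derive total time as preparation time plus cooking time.
for r in recipes:
    r["total_time"] = r["prep_time"] + r["cook_time"]
```

Imputing before deriving the total keeps `total_time` defined for every record, which is why the steps run in this order in the sketch.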

We applied four methods: a prediction tree, KNN, multiple linear regression and Principal Component Regression. After comparing the results, we chose multiple linear regression as our model because it produced the smallest error. Most importantly, it gives a specific result that shows the client how the rating value can be improved.
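Choosing a model by comparing errors can be sketched as below. The summary does not say which error measure or validation scheme was used, so RMSE on a held-out set is an assumption here. For brevity the sketch compares only two stand-ins, a one-predictor least-squares fit and a KNN regressor, rather than the four methods listed. The toy numbers are invented and deliberately linear, so they say nothing about the project's actual results.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_linear(xs, ys):
    """Ordinary least squares with a single predictor; returns a predict function."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=2):
    """KNN regression: average the ratings of the k nearest training points."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict


# Invented toy data: total time in minutes vs. average rating.
train_x, train_y = [10, 20, 30, 40], [3.0, 3.5, 4.0, 4.5]
valid_x, valid_y = [25, 50], [3.75, 5.0]

models = {
    "linear": fit_linear(train_x, train_y),
    "knn": fit_knn(train_x, train_y, k=2),
}

# Score every candidate on the held-out records and keep the lowest RMSE.
scores = {name: rmse(valid_y, [predict(x) for x in valid_x])
          for name, predict in models.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

A linear model has the further advantage noted above: its coefficients state directly how each predictor moves the predicted rating, which KNN does not provide.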

We suggest that BBC Good Food increase the preparation time and cooking time of its dishes, write more articles on Side Dish and Starter recipes, and consider carefully before writing recipes for dinner or afternoon tea, since these tend to receive low ratings. We assume that readers might prefer something new and creative, and we suggest choosing a dish level that is neither too hard nor too easy.

Application Area: