Predicting readability of The News Lens' next online articles to enhance reader loyalty

Project Details


Fall 2017


Uniss Tseng, Elisa Wang, Patrizia Mach, Sabrina Wei





To build a dedicated reader base, the online news website The News Lens aims to drive traffic directly to its website rather than through third-party social media such as Facebook. Reaching this goal involves selecting a list of featured articles for the homepage: articles that are most likely to be read to completion and that help readers form the habit of turning primarily to The News Lens for insight into current events.
Using linear regression and predictive data mining, we build a model from the 752 articles provided by The News Lens that outputs a predicted readability score for any new article. This model was chosen for its high performance and because it is easy to understand and apply. The predicted scores can be sorted from highest to lowest, so that the articles at the top of the list are those most likely to be read to completion and to generate the desired traffic to the website.
The model fitted on the October 2017 data shows that word count, certain authors, the number of articles an author writes, and article category are related to the likelihood that an article is read in full. Since these relationships were estimated from a single month of data, it would be prudent to review and revise the model regularly as business activities and reader preferences change. For a more consistent and accurate model, we further suggest fitting it on a larger dataset spanning, for example, six months.
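The workflow described above (fit a linear regression, score new articles, sort from highest to lowest) can be sketched roughly as follows. This is a minimal illustration, not the project's actual model: the predictors follow the ones named in the summary (word count, number of articles by the author, category), but the function names, the category labels, the toy rows, and the score values are all assumptions made up for the example. The toy scores are generated from a known linear rule so that the fit can be checked.

```python
# Minimal sketch of "fit a linear regression, score new articles, rank them".
# All data below is hypothetical and only illustrates the mechanics.

CATEGORIES = ["news", "opinion"]  # assumed labels; the first one is the baseline


def design_row(word_count, author_articles, category):
    """Intercept, word count (in thousands), author's article count, category dummies."""
    dummies = [1.0 if category == c else 0.0 for c in CATEGORIES[1:]]
    return [1.0, word_count / 1000.0, float(author_articles)] + dummies


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    """Predicted score is the dot product of coefficients and the design row."""
    return sum(b * v for b, v in zip(beta, row))


def rank_articles(beta, articles):
    """Return (score, title) pairs sorted from highest to lowest predicted score."""
    scored = [(predict(beta, design_row(wc, n, cat)), title)
              for title, wc, n, cat in articles]
    return sorted(scored, reverse=True)


# Hypothetical training rows: (word_count, articles_by_author, category).
train = [(800, 5, "news"), (1500, 12, "opinion"), (2200, 3, "news"),
         (600, 20, "opinion"), (1200, 8, "news"), (3000, 15, "opinion")]
# Made-up "true" rule used only to generate toy scores for this example.
TRUE_BETA = [0.9, -0.2, 0.01, 0.05]

X = [design_row(*row) for row in train]
y = [predict(TRUE_BETA, r) for r in X]
beta = fit(X, y)

# Hypothetical new articles: (title, word_count, articles_by_author, category).
new_articles = [("A", 700, 10, "opinion"), ("B", 2500, 4, "news"), ("C", 1100, 18, "news")]
ranking = rank_articles(beta, new_articles)
for score, title in ranking:
    print(f"{title}: {score:.2f}")
```

In practice a statistics library would replace the hand-written solver, and the sign and size of each coefficient would come from the real article data rather than from an assumed rule. Refitting on newer or longer periods of data, as suggested above, only requires calling `fit` again on the updated rows.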

Application Area: