The Indian movie industry produces about 1,000 movies per year, more than any other country’s movie industry. However, very few of these movies succeed and earn a high ranking.
Given the low success rate, models and mechanisms that reliably predict a movie’s ranking and/or box office collections can significantly de-risk the business and increase average returns. Stakeholders such as actors, financiers, and directors can use these predictions to make more informed decisions.
Some of the questions that can be answered using prediction models are:
1. Does the cast or director matter in the success or ranking of an Indian movie?
2. Is the genre of the Indian movie a key determinant of rank or success?
3. Does running time matter?
Further, a DVD rental agency or a distribution house could use these predictions to determine which titles to stock or promote, respectively.
Data were collected from the Internet Movie Database (IMDB), and several data mining and prediction techniques, including multiple linear regression, regression trees, and k-nearest neighbors, were used to devise a model that predicts an Indian movie’s ranking with a root-mean-square error (RMSE) of 1.5 on a scale of 1 to 10.
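To make the error metric concrete, the following is a minimal sketch of how RMSE can be computed for a set of predicted rankings. The function name `rmse` and the sample ratings are illustrative assumptions and are not taken from the study's data or code.

```python
import math


def rmse(actual, predicted):
    """Return the root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical rankings on the 1-to-10 scale (illustrative values only).
actual_ratings = [7.0, 5.0, 8.0, 6.0]
predicted_ratings = [6.0, 7.0, 8.0, 5.0]

error = rmse(actual_ratings, predicted_ratings)
print(f"RMSE: {error:.3f}")
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones. An RMSE of 1.5 therefore describes a typical error size on the 1-to-10 scale, not a bound on the error of any single prediction.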