Forecasting the number of daily issues on GitHub repositories

Project Details


Fall 2016


Aditya Utama Wijaya, Wen Lee, Cindy Soh, Renaud Jollet De Lorenzo, V K Sanjeed





Business and government organizations that develop IT solutions increasingly rely on open-source repositories because of their reliability and rapid development. The bedrock of an open-source project is the community that uses it, maintains it, and builds new applications on it: the more people who can see and test the code, the more likely it is that flaws will be caught and fixed quickly. It is therefore crucial for the foundation hosting a repository to manage the large number of issues that users submit every day. Forecasts of upcoming issues are valuable to open-source foundations that need to plan manpower and resources to resolve issues efficiently.

Open-source repositories are heavily tested and maintained pieces of software that are used in most IT projects to speed up development. We collected secondary data (the number of daily issues and commits) from five such repositories on GitHub, then forecast the number of daily issues for each over the next three weeks using a variety of time series forecasting techniques. By evaluating predictive performance metrics and forecast time plots, we selected the best forecasting techniques for each repository and combined them into ensemble forecasts. Our forecasts are on average about 16% more accurate than the seasonal naive benchmark, and they capture the important elements of the series, including trend, day-of-week seasonality, and autocorrelation. We recommend our forecasting method for repositories with a higher volume of daily issues, because for them a 16% difference in accuracy corresponds to a large number of issues per day that could otherwise go unresolved. Foundations should allocate more manpower to repositories with higher forecast issue counts over the upcoming three weeks.
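
The benchmark comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code. It assumes a weekly (7-day) season, a 21-day horizon, a simple unweighted mean as the ensemble rule, and mean absolute error as the accuracy metric, none of which is specified in the summary. All function names are hypothetical.

```python
from statistics import mean

SEASON = 7    # assumed: day-of-week seasonality
HORIZON = 21  # three-week forecast horizon


def seasonal_naive(history, horizon=HORIZON, season=SEASON):
    """Forecast each future day with the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    # Repeat the last observed season across the whole horizon.
    return [last_season[h % season] for h in range(horizon)]


def ensemble_mean(forecasts):
    """Combine equal-length forecasts by averaging them point by point.

    Assumption: an unweighted mean; the project may have combined
    its selected methods differently.
    """
    return [mean(values) for values in zip(*forecasts)]


def mae(actual, forecast):
    """Mean absolute error (assumed metric; the summary does not name one)."""
    return mean(abs(a - f) for a, f in zip(actual, forecast))


def relative_improvement(benchmark_error, model_error):
    """Fraction by which the model's error is lower than the benchmark's."""
    return 1 - model_error / benchmark_error
```

Given a held-out validation period, `relative_improvement(mae(actual, seasonal_naive(train)), mae(actual, ensemble))` would yield the kind of figure reported above, where a value of 0.16 means the ensemble's error is 16% lower than the benchmark's.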

Application Area: