Forecasting demand for pickups per hour in New York City for Uber

Project Details




Aniket Jain, Rachit Nagalia, Nakul Singhal, Ayush Anand, Priyakansha Paul, Prakhar Megotia





Uber is a ride-hailing company founded in San Francisco, California. Since its inception, it has expanded into other businesses such as ride sharing and food delivery. According to estimates, Uber has close to 100 million customers and operates in as many as 800 metropolitan areas. Given its ever-expanding scale, Uber continuously manages the gap between supply and demand through surge pricing, incentivising drivers and charging riders. There has, however, been considerable backlash, as surge pricing has at times gone above 20x. This project focuses only on New York City and the six boroughs as labelled in the data. More specifically, we intend to address the following problems for Uber:

  • Manage demand by optimizing driver locations across the six boroughs, which should increase driver pickup efficiency and ultimately improve the rider experience.
  • Combine accurate demand forecasts with driver supply data, which Uber has at all times, to better inform surge pricing.

Data for this project were pulled from Kaggle and cover 01/01/2015 to 06/30/2015. The data contain hourly pickup counts for New York City across six boroughs as labelled in the data, namely Newark, Manhattan, Bronx, Queens, Staten Island and Brooklyn. Data for Newark were limited, so we combined them with the records whose borough was NA. We found seasonality both within the week and within the day. The charts below show pickup patterns by borough and by time of day.
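The preparation steps above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the project: the row layout (timestamp, borough, pickups), the "Other" label and the function names merge_sparse_boroughs and mean_profile are all assumptions made for the example, and the sample rows are invented.

```python
from collections import defaultdict
from datetime import datetime


def merge_sparse_boroughs(rows, sparse=("Newark", None)):
    """Relabel sparse or missing borough values into one combined group.

    The "Other" label is a hypothetical name chosen for this sketch.
    """
    merged = []
    for ts, borough, pickups in rows:
        label = "Other" if borough in sparse else borough
        merged.append((ts, label, pickups))
    return merged


def mean_profile(rows, key):
    """Average pickups per row, grouped by key(timestamp)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for ts, _borough, pickups in rows:
        k = key(ts)
        totals[k] += pickups
        counts[k] += 1
    return {k: totals[k] / counts[k] for k in totals}


# Invented sample rows: (timestamp, borough, pickups)
sample = [
    (datetime(2015, 1, 1, 8), "Manhattan", 100),
    (datetime(2015, 1, 1, 8), "Newark", 2),
    (datetime(2015, 1, 1, 8), None, 4),
    (datetime(2015, 1, 1, 18), "Manhattan", 300),
    (datetime(2015, 1, 2, 8), "Manhattan", 120),
]

merged = merge_sparse_boroughs(sample)

# Within-day and within-week profiles for a single borough
manhattan = [row for row in merged if row[1] == "Manhattan"]
by_hour = mean_profile(manhattan, key=lambda ts: ts.hour)
by_weekday = mean_profile(manhattan, key=lambda ts: ts.weekday())
```

Plotting by_hour and by_weekday for each borough is one simple way to inspect the within-day and within-week patterns described above.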

Intending to forecast two weeks ahead, we partitioned the data with a validation period of two weeks. Given the nature of the data, we focused on linear regression as our forecasting method and benchmarked its output against seasonal naive forecasts. We found that for some boroughs the seasonal naive forecast performed better than linear regression, so our final forecasts were arrived at using both seasonal naive and linear regression. The table highlights performance metrics for the different boroughs.
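The comparison described above can be sketched as follows for a single hourly series. This is a hedged illustration rather than the project's actual model: the regression specification (a linear trend plus one indicator per hour of the week), the use of RMSE as the metric, the rule of keeping the lower-error method, and all function names are assumptions, and the series is synthetic.

```python
import numpy as np

SEASON = 24 * 7        # hours in one week
HORIZON = 2 * SEASON   # two-week validation period


def split_series(y, horizon=HORIZON):
    """Hold out the last `horizon` hours as the validation period."""
    y = np.asarray(y, dtype=float)
    return y[:-horizon], y[-horizon:]


def seasonal_naive(train, horizon, season=SEASON):
    """Forecast each hour with the value from the same hour in the last training week."""
    reps = int(np.ceil(horizon / season))
    return np.tile(train[-season:], reps)[:horizon]


def design(t, season=SEASON):
    """Design matrix: linear trend plus one indicator per hour of the week."""
    X = np.zeros((len(t), season + 1))
    X[:, 0] = t
    X[np.arange(len(t)), 1 + (t % season)] = 1.0
    return X


def regression_forecast(train, horizon, season=SEASON):
    """Fit by ordinary least squares on the training hours, then extrapolate."""
    t_train = np.arange(len(train))
    t_valid = np.arange(len(train), len(train) + horizon)
    coef, *_ = np.linalg.lstsq(design(t_train, season), train, rcond=None)
    return design(t_valid, season) @ coef


def rmse(actual, forecast):
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


# Synthetic hourly series (8 weeks): trend + daily cycle + noise
rng = np.random.default_rng(0)
t = np.arange(8 * SEASON)
y = 100 + 0.05 * t + 30 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5, t.size)

train, valid = split_series(y)
scores = {
    "seasonal_naive": rmse(valid, seasonal_naive(train, HORIZON)),
    "linear_regression": rmse(valid, regression_forecast(train, HORIZON)),
}
best = min(scores, key=scores.get)  # method with the lower validation error
```

Running the same comparison once per borough and keeping the lower-error method for each is one way to arrive at a mix of both methods, as described above.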

Application Area: