Cab travel time prediction

Project Details




Bhushan Khandelwal, Mahabaleshwar Bhat, Mayank Gupta, Shikhar Angra, Sujay Koparde





Idea: Our project develops a model for predicting the travel time of a particular cab booking.
Such a model is valuable to both the customers and the cab company. Customers often ask
booking agents how long it will take to travel from their source to their destination, but the
booking agents are not able to provide a specific estimate. They provide only a rough estimate
based on the distance between the two locations, and this estimate is sometimes far off because it
does not account for how travel time varies with the hour of the day or for the difference in time
taken to cover the same distance on two different routes.

Value Proposition: Customers plan their travel to ensure they reach the destination on time, so
they tend to err on the side of caution. Sometimes they plan so far in advance that they
reach the destination much earlier than required, which is also undesirable. With our model we
would be able to predict the travel time quite accurately, allowing customers to plan their
travel so that they reach the destination on time but not too far in advance.
From the cab company’s perspective, such a feature can improve customer value and customer
satisfaction, which could in turn lead to higher customer acquisition and retention.

Data: We used the data provided by for building our model. Each row in the dataset
corresponds to a single journey. There were approximately 80,000 bookings in the dataset, made between
November 2011 and November 2013, and the information included locations, timestamps, vehicle id,
cancellation status, and more. Our model needed records with both a start time and an end time, so
we could work with only 21,000 records, which were for point-to-point travel. We did not consider
intercity travel because intercity travel time depends heavily on external factors. The main columns
of interest for our model were vehicle_id, from_date and to_date, from_lat, to_lat, from_long, to_long, and
the output variable timeDiff_inMin. We added dummy variables for vehicle_id and for the day and time
slot of the journey, and added a derived column, distance, based on the “from” and “to” latitude and longitude.
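
The derived columns described above could be computed along the following lines. This is only a sketch: the report does not say how distance was calculated, so the haversine (great-circle) formula is an assumption, as are the timestamp format, the four 6-hour time slots, and the helper names haversine_km and derive_features. The day and time-slot values returned here would still need to be expanded into dummy variables.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
DATE_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format, not taken from the dataset
TIME_SLOTS = ("night", "morning", "afternoon", "evening")  # hypothetical 6-hour slots


def haversine_km(from_lat, from_long, to_lat, to_long):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(from_lat)
    phi2 = math.radians(to_lat)
    dphi = phi2 - phi1
    dlambda = math.radians(to_long - from_long)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def derive_features(from_date, to_date, from_lat, from_long, to_lat, to_long):
    """Build the output variable and derived predictors for one booking."""
    start = datetime.strptime(from_date, DATE_FORMAT)
    end = datetime.strptime(to_date, DATE_FORMAT)
    return {
        # Output variable: journey duration in minutes.
        "timeDiff_inMin": (end - start).total_seconds() / 60.0,
        # Straight-line distance between pickup and drop locations.
        "distance": haversine_km(from_lat, from_long, to_lat, to_long),
        # 0 = Monday ... 6 = Sunday
        "day_of_week": start.weekday(),
        # Slot index is the start hour divided into 6-hour blocks.
        "time_slot": TIME_SLOTS[start.hour // 6],
    }
```

Note that a straight-line distance understates the actual road distance, so it serves as a proxy for route length rather than a measurement of it.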

Analytics Solution: The analytics solution involves predicting the travel time for a particular
journey, which makes this a supervised data mining task. All the predictors (travel day and time
slot, vehicle model, and “from” and “to” locations) are available at the time of booking, so the model
can produce a prediction at that point. We could have framed this as a classification task, for example
by binning travel times, rather than a numeric prediction task, but that might have involved too many
classes to handle. It is therefore better to predict the time and then add a
safety buffer, giving the customer a range of travel time rather than a spot prediction.
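
One way to turn a spot prediction into a range is sketched below. The report does not specify how the safety buffer is chosen; this sketch assumes it is taken from percentiles of the model's errors (actual minus predicted travel time) on held-out bookings. The function names, the 10th/90th percentile defaults, and the sample residuals are all hypothetical.

```python
import math


def empirical_percentile(values, pct):
    """Nearest-rank empirical percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def travel_time_range(predicted_min, residuals, lower_pct=10, upper_pct=90):
    """Convert a spot prediction (minutes) into a (low, high) range.

    residuals holds actual minus predicted travel times, in minutes,
    observed on bookings that were not used to fit the model.
    """
    low = predicted_min + empirical_percentile(residuals, lower_pct)
    high = predicted_min + empirical_percentile(residuals, upper_pct)
    # A travel time cannot be negative.
    return max(0.0, low), high


# Hypothetical held-out errors, in minutes.
sample_residuals = [-10, -8, -5, -3, -1, 0, 2, 4, 7, 12]
low, high = travel_time_range(40, sample_residuals)
print(f"Estimated travel time: {low:.0f} to {high:.0f} minutes")
```

Widening the percentiles gives a more cautious range at the cost of a less informative one, so the choice is a trade-off between reliability and usefulness to the customer.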

Recommendations: Such a feature needs to be tested thoroughly before a
full rollout. We therefore recommend rolling out the new feature in phases,
starting with routes where travel time can be predicted with high accuracy. The feature should first be
launched on 2-3 popular routes where the number of bookings is large enough to test the model
properly. It should not, however, be launched first on the most popular routes, because any issue
with the model or its implementation would then affect a large number of customers. Depending
on the success of the new feature and the accuracy of the model, the feature can then be extended to other
routes.
Secondly, to take care of various idiosyncrasies, we recommend displaying a time range instead of a spot
prediction.

Application Area: