Business Problem – An entrepreneur has an idea for a new business venture, which, in a nutshell, is to
offer insurance to customers on price drops of certain products. Registered customers get a certain
multiple of the insurance fee or a certain percentage of the drop provided that the drop happens within
a certain period after purchase. Insurance is offered on a new portfolio of products every day and, for
the sake of simplicity, the period within which the price must drop is taken as one day. The business
problem is to determine from the data whether the business idea is theoretically and financially viable.
Data – The available data is a list of products sold across five e-commerce sites. The data gives us
information about shipping, average ratings, brand, category, stock status, etc. Some features of the data are
(a) categorical variables such as model name and brand, which cannot simply be converted into dummy
variables because that would explode the number of variables; (b) columns such as average rating and
review count, which had a large number of missing values and were filled through imputation based on the
rest of the columns; (c) textual columns such as shipping period, which were converted into numerical
averages. Data such as this is available on the internet and can easily be crawled from various sites.
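The two cleaning steps in (b) and (c) can be sketched roughly as follows. This is only an illustration under stated assumptions: the raw format of the shipping period text, the field names avg_rating and category, and the use of a group mean as the imputation rule are all hypothetical, since the summary does not specify them.

```python
import re


def shipping_period_to_days(text):
    """Average every integer found in a shipping-period string.

    Assumes (hypothetically) strings such as 'Ships in 3-5 business days'
    and that all numbers are expressed in days. Returns None if the
    string contains no number.
    """
    numbers = [int(n) for n in re.findall(r"\d+", text or "")]
    return sum(numbers) / len(numbers) if numbers else None


def impute_by_group(rows, target, group_key):
    """Fill missing `target` values with the mean of the same group.

    A simple stand-in for 'imputation based on the rest of the columns';
    falls back to the overall mean when a group has no observed value.
    Requires at least one observed value overall.
    """
    observed = [r for r in rows if r[target] is not None]
    overall = sum(r[target] for r in observed) / len(observed)
    by_group = {}
    for r in observed:
        by_group.setdefault(r[group_key], []).append(r[target])

    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[target] is None:
            values = by_group.get(r[group_key])
            r[target] = sum(values) / len(values) if values else overall
        filled.append(r)
    return filled


# Hypothetical rows for illustration only (not taken from the project data).
products = [
    {"category": "camera", "avg_rating": 3.5},
    {"category": "camera", "avg_rating": 4.5},
    {"category": "camera", "avg_rating": None},
]
cleaned = impute_by_group(products, "avg_rating", "category")
```

A richer approach, such as a model that predicts the missing rating from the other columns, would fit the description in (b) equally well; the group mean is used here only because it is easy to follow.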
Data Analytics solution – The goal is to check whether it is possible to construct a portfolio in which as
high a percentage of the selected products as possible are due for a price increase, so that losses on price
drops do not overwhelm the insurance fees earned where the price goes up or stays the same. The idea is
to build various classification models that predict whether the price will go up, and then to construct a
portfolio by selecting the top x% of products with the highest predicted probability that the price will
go up the next day among all the products considered. Note that data mining only takes us part of the way
towards a decision; considerable external information, analysis and assumptions are needed before
one can come to a conclusion. To achieve the data mining goal, we would (a) use the available data on
products sold by the five online retailers as predictors to classify products as ‘Price Up’ or not, (b)
rank the products by predicted probability of a price increase, and (c) select the top x% to offer insurance on.
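Steps (b) and (c) amount to sorting products by predicted probability and keeping the top fraction. A minimal sketch, assuming the probabilities have already been produced by some classifier; the function names, product ids and numbers below are hypothetical and are not results from the project:

```python
import math


def select_top_fraction(scores, x):
    """Return the ids of the top `x` fraction of products by score.

    `scores` maps a product id to its predicted probability of a price
    increase; `x` is a fraction in (0, 1], e.g. 0.05 for the top 5%.
    At least one product is always selected.
    """
    if not 0 < x <= 1:
        raise ValueError("x must be in (0, 1]")
    k = max(1, math.ceil(x * len(scores)))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [product_id for product_id, _ in ranked[:k]]


def share_price_up(selected, actual_up):
    """Fraction of the selected products whose price actually went up."""
    return sum(1 for p in selected if actual_up[p]) / len(selected)


# Hypothetical predicted probabilities and next-day outcomes.
predicted = {"A": 0.9, "B": 0.1, "C": 0.7, "D": 0.4}
went_up = {"A": True, "B": False, "C": False, "D": True}

portfolio = select_top_fraction(predicted, 0.5)
hit_rate = share_price_up(portfolio, went_up)
```

Comparing the hit rate of the selected portfolio with the share of price increases among all products shows how much the ranking improves on random selection for a given x.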
Recommendations – Our data analysis shows that if products are randomly selected to form a portfolio,
only 2.4% of products would, on average, show a price increase. However, with our best model, a
classification tree, this percentage can rise to as high as 100%, depending on the value of x chosen. If the
top x% is chosen intelligently and varied every day, and if the multiple of the insurance fee paid to
customers upon a price drop is chosen carefully so that payouts do not overwhelm the insurance revenue, the
business can be viable. The entrepreneur can also explore additional streams of revenue,
such as advertisements on the site or building a user database.
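The link between the payout multiple and viability can be illustrated with simple arithmetic. The sketch below rests on assumptions that the summary does not state: the fee is collected on every policy, the payout on a price drop is the multiple times the fee, and the share of insured products whose price drops is known. All names and numbers are hypothetical.

```python
def expected_profit_per_policy(fee, multiple, drop_rate):
    """Expected profit on one policy under the assumptions above.

    The fee is earned on every policy; with probability `drop_rate`
    the price drops and `multiple * fee` is paid out.
    """
    return fee * (1.0 - multiple * drop_rate)


def break_even_multiple(drop_rate):
    """Largest multiple at which expected profit is still non-negative."""
    if drop_rate <= 0:
        return float("inf")  # no drops means no payouts at any multiple
    return 1.0 / drop_rate


# Hypothetical example: 20% of insured products drop in price.
profit = expected_profit_per_policy(fee=10.0, multiple=3.0, drop_rate=0.2)
limit = break_even_multiple(0.2)
```

Under these assumptions, the lower the share of price drops in the selected portfolio, the higher the multiple that can be offered to customers without expected losses.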