October 27, 2015

Tree-based approach for addressing self-selection in Big Data: forthcoming in MIS Quarterly

My paper A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Big Data, with Deepa Mani (Indian School of Business) and Inbal Yahav (Bar-Ilan University), is forthcoming in MIS Quarterly, in the special issue on Transformational Issues of Big Data and Analytics in Networked Business. The paper introduces a novel method based on a classification and regression tree - a tool typically used for prediction in data mining - for use in studies that might suffer from self-selection bias, where observations self-select into the treatment or control group. Our method is an alternative to the well-known Propensity Score approach: it is more automated, simpler to understand, more flexible in terms of assumptions and data types, and especially useful with Big Data.
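To give a rough feel for the idea, here is a minimal sketch in Python. It is not the procedure from the paper: it assumes a single covariate, a one-split "stump" instead of a full tree, and made-up toy data. All names (`rows`, `size`, `best_split`, `effect_by_leaf`) are hypothetical. The stump is grown with the treatment indicator as the target, so each leaf groups observations with a similar tendency to self-select, and the treated-versus-control comparison is then made within each leaf.

```python
from statistics import mean

# Toy data (invented for illustration): larger units tend to self-select into treatment,
# and larger units also have a higher baseline outcome.
rows = [
    {"size": 1, "treated": 1, "outcome": 12},
    {"size": 2, "treated": 0, "outcome": 10},
    {"size": 3, "treated": 0, "outcome": 10},
    {"size": 4, "treated": 0, "outcome": 10},
    {"size": 5, "treated": 1, "outcome": 22},
    {"size": 6, "treated": 1, "outcome": 22},
    {"size": 7, "treated": 1, "outcome": 22},
    {"size": 8, "treated": 0, "outcome": 20},
]

def gini(labels):
    """Gini impurity of a list of 0/1 treatment labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(data, covariate):
    """Threshold on one covariate that best separates treated from control units."""
    values = sorted({r[covariate] for r in data})
    best_score, best_cut = None, None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [r["treated"] for r in data if r[covariate] <= cut]
        right = [r["treated"] for r in data if r[covariate] > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
        if best_score is None or score < best_score:
            best_score, best_cut = score, cut
    return best_cut

def effect_by_leaf(data, covariate):
    """Split on the covariate, then compare treated and control outcomes inside each leaf."""
    cut = best_split(data, covariate)
    leaves = {
        "low": [r for r in data if r[covariate] <= cut],
        "high": [r for r in data if r[covariate] > cut],
    }
    effects = {}
    for name, leaf in leaves.items():
        treated = [r["outcome"] for r in leaf if r["treated"] == 1]
        control = [r["outcome"] for r in leaf if r["treated"] == 0]
        # A leaf without both groups has no overlap, so no comparison is made there.
        effects[name] = mean(treated) - mean(control) if treated and control else None
    return cut, effects

# Naive comparison that ignores self-selection.
naive = (mean(r["outcome"] for r in rows if r["treated"] == 1)
         - mean(r["outcome"] for r in rows if r["treated"] == 0))

cut, effects = effect_by_leaf(rows, "size")
print("naive difference:", naive)
print("split at size <=", cut, "| within-leaf differences:", effects)
```

In this toy data the naive difference mixes the treatment effect with the size effect, while the within-leaf differences compare units of similar size.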

A working paper of an earlier version is available on SSRN.

October 16, 2015

"Big Data & Analytics in the Digital Creative Industries" Talk at Taipei National University of the Arts

On Oct 17, 2015 @ 10am, I'll be giving a talk on "Big Data & Analytics in the Digital Creative Industries" at Taipei National University of the Arts' Film Making Department, as part of Professor Randy Finch's course Digital Media Entrepreneurship. I'll discuss getting Big Data and using it (with Analytics), both by the big content providers and platforms for TV, film, music, etc. and by "outsiders" - entrepreneurs, developers, and researchers.

August 11, 2015

Modeling bimodal discrete data - paper now in print!

My paper Modeling Bimodal Discrete Data Using Conway-Maxwell-Poisson Mixture Models, with co-authors Smarajit Bose, Pragya Sur and Paromita Dubey (ISI Kolkata), is finally in print in the ASA's Journal of Business & Economic Statistics. We develop a method for modeling the distribution of bimodal discrete data, such as rankings (on a 5-star scale) and even censored data.
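For readers unfamiliar with the Conway-Maxwell-Poisson (CMP) distribution, here is a small sketch of how a two-component CMP mixture can produce a bimodal distribution. It uses the standard CMP form, P(Y = y) proportional to lambda^y / (y!)^nu, with a truncated normalizing sum. The function names, the truncation at 200 terms and the parameter values are assumptions for illustration, not the fitting method or settings from the paper.

```python
import math

def cmp_pmf(y, lam, nu, terms=200):
    """P(Y = y) for a Conway-Maxwell-Poisson(lam, nu) variable.

    The normalizing constant is an infinite sum; here it is truncated at `terms`,
    which is an assumption that only works when the tail beyond that is negligible.
    """
    def log_weight(j):
        return j * math.log(lam) - nu * math.lgamma(j + 1)
    log_z = math.log(sum(math.exp(log_weight(j)) for j in range(terms)))
    return math.exp(log_weight(y) - log_z)

def cmp_mixture_pmf(y, p, lam1, nu1, lam2, nu2):
    """Two-component mixture: weight p on the first CMP, 1 - p on the second."""
    return p * cmp_pmf(y, lam1, nu1) + (1 - p) * cmp_pmf(y, lam2, nu2)

# Illustrative parameters: one component concentrated near 0, one near 7-8.
pmf = [cmp_mixture_pmf(y, 0.5, 0.5, 1.0, 512.0, 3.0) for y in range(15)]
for y, prob in enumerate(pmf):
    print(y, round(prob, 4))
```

With nu = 1 the CMP reduces to the ordinary Poisson, which makes a convenient sanity check for `cmp_pmf`.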

For some mysterious reason, our paper went through two rounds of independent proofs, and hence the delay in publication. The good news is that the link (above) to the paper provides a free eprint for the first 30 downloads.

July 31, 2015

Nature Methods piece on scientific replicability/repeatability/reproducibility

Nature Methods just published our correspondence piece Clarifying the terminology that describes scientific reproducibility (co-authored with Ron Kenett). We make three points:
1. There's confusion between the terms replicability, repeatability and reproducibility, which differ in meanings across and sometimes within fields.
2. Currently, each term is defined in a specific context by giving a laundry list of "what conditions remain constant and what conditions are changed".
3. Instead: focus on generalization! By answering "what is your study trying to generalize to?" it becomes very clear why some conditions are held constant and others are changed. Moreover, it helps focus on the goal, strengths and limitations of the study.

July 30, 2015

New textbook: Practical Time Series Forecasting with R

Last year I co-taught the Forecasting Analytics course at the Indian School of Business together with Casey Lichtendahl from Darden School of Business. The co-teaching inspired us to collaborate on "converting" my Practical Time Series Forecasting textbook, which is based on XLMiner software, to an edition that uses R. Since several colleagues had requested such an edition, and motivated by our colleague Rob Hyndman - the guru of forecasting and creator of the R forecast package - we embarked on the journey. Our new textbook is now available on Amazon (softcover and Kindle).

Practical Time Series Forecasting with R: A Hands-On Guide joins the Practical Analytics series of textbooks. It covers the same material as the latest XLMiner edition, and is suitable for beginners in R. Instructors seeking an evaluation copy, please fill out the online form.