Tomorrow at INFORMS's Data Mining Cluster @ 1:30pm, I'll be presenting my work (with Inbal Yahav) "The Forest or the Trees? Tackling Simpson’s Paradox with Classification and Regression Trees". I'll show how we take advantage of the tree structure to detect whether a dataset exhibits Simpson's Paradox (a reversal of a causal direction when the data are disaggregated). See our working paper on SSRN for more details.
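For readers new to the paradox, here is a minimal Python sketch of the reversal itself. It is not the tree-based detection method from the paper. It only checks a hand-built table, and the counts are illustrative, following the well-known kidney-stone textbook example. The names `counts`, `winner`, and `has_reversal` are made up for this illustration.

```python
# Illustrative (successes, trials) counts per treatment and subgroup.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def aggregate_rate(groups):
    """Success rate after pooling all subgroups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(n for _, n in groups.values())
    return successes / trials


def winner(rate_a, rate_b):
    """Label of the treatment with the higher success rate."""
    return "A" if rate_a > rate_b else "B"


def has_reversal(counts):
    """True if every subgroup favors one treatment but the pooled data favors the other."""
    overall = winner(aggregate_rate(counts["A"]), aggregate_rate(counts["B"]))
    per_group = set()
    for group in counts["A"]:
        s_a, n_a = counts["A"][group]
        s_b, n_b = counts["B"][group]
        per_group.add(winner(s_a / n_a, s_b / n_b))
    # A reversal needs unanimous subgroups that disagree with the aggregate.
    return len(per_group) == 1 and overall not in per_group


print(has_reversal(counts))
```

In this table, treatment A has the higher success rate within each subgroup, yet treatment B looks better once the subgroups are pooled, because the subgroup sizes differ sharply between treatments.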
Talk @ INFORMS: Trees for Detecting Simpson's Paradox in Big Data
Tree based approach for addressing self-selection in Big Data: forthcoming in MIS Quarterly
My paper A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Big Data with Deepa Mani (Indian School of Business) and Inbal Yahav (Bar-Ilan University) is forthcoming in MIS Quarterly, in the special issue on Transformational Issues of Big Data and Analytics in Networked Business. The paper introduces a novel method based on a classification and regression tree - a tool typically used for prediction in data mining - for use in studies that might suffer from self-selection bias, where observations self-select into the treatment or control group. Our method is an alternative to the well-known Propensity Score approach: it is more automated, simpler to understand, more flexible in terms of assumptions and data types, and especially useful with Big Data.
A working paper of an earlier version is available on SSRN.
"Big Data & Analytics in the Digital Creative Industries" Talk at Taipei National University of the Arts
On Oct 17, 2015 @ 10am, I'll be giving a talk on "Big Data & Analytics in the Digital Creative Industries" at Taipei National University of the Arts' Film Making Department, as part of Professor Randy Finch's course Digital Media Entrepreneurship. I'll discuss getting Big Data and using it (with Analytics), both by the big content providers and platforms for TV, film, music, etc. and by "outsiders" - entrepreneurs, developers, and researchers.
Modeling bimodal discrete data - paper now in print!
My paper Modeling Bimodal Discrete Data Using Conway-Maxwell-Poisson Mixture Models with co-authors Smarajit Bose, Pragya Sur and Paromita Dubey (ISI Kolkata) is finally in print in the ASA's Journal of Business & Economic Statistics. We develop a method for modeling the distribution of bimodal discrete data, such as rankings (on a 5-star scale) and even censored data.
For some mysterious reason, our paper went through two rounds of independent proofs, and hence the delay in publication. The good news is that the link (above) to the paper provides a free eprint to the first 30 downloads.
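To give a feel for what a Conway-Maxwell-Poisson (CMP) mixture looks like, here is a minimal Python sketch. It uses the standard CMP form, where P(X = x) is proportional to lambda^x / (x!)^nu, and mixes two components with weight p. This is only an illustration and not the estimation procedure from the paper. The parameter values, the truncation point `max_x`, and the function names are all assumptions chosen for the example.

```python
import math


def cmp_pmf(lam, nu, max_x=60):
    """CMP probabilities on 0..max_x.

    The infinite normalizing sum is truncated at max_x (an approximation).
    Weights are computed in log space to avoid overflow in x!.
    """
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(max_x + 1)]
    w = [math.exp(v) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def mixture_pmf(p, comp1, comp2):
    """Two-component mixture: p * comp1 + (1 - p) * comp2."""
    return [p * a + (1 - p) * b for a, b in zip(comp1, comp2)]


def local_modes(pmf):
    """Indices whose probability is strictly higher than both neighbors."""
    modes = []
    for x, px in enumerate(pmf):
        left = pmf[x - 1] if x > 0 else -1.0
        right = pmf[x + 1] if x + 1 < len(pmf) else -1.0
        if px > left and px > right:
            modes.append(x)
    return modes


# Hypothetical parameters: one component concentrated near 0, one near 7.
low = cmp_pmf(lam=0.5, nu=1.0)
high = cmp_pmf(lam=50.0, nu=2.0)
mix = mixture_pmf(0.5, low, high)
print(local_modes(mix))
```

Each component on its own has a single mode, while the equal-weight mixture has two separated local modes, which is the kind of shape a single Poisson or CMP distribution cannot capture.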
Nature Methods piece on scientific replicability/repeatability/reproducibility
Nature Methods just published our correspondence piece Clarifying the terminology that describes scientific reproducibility (co-authored with Ron Kenett). We make three points:
1. There's confusion between the terms replicability, repeatability and reproducibility, which differ in meanings across and sometimes within fields.
2. Currently, each term is defined in a specific context by giving a laundry list of "what conditions remain constant and what conditions are changed".
3. Instead: focus on generalization! By answering "what is your study trying to generalize to?" it becomes very clear why some conditions are held constant and others are changed. Moreover, it helps focus on the goal, strengths and limitations of the study.