"Opening data with Kaggle": talk at #OpenDataHyd
I describe collaborations with Indian startups to create data mining contests on platforms such as kaggle.com. Such contests are a great way to open real, interesting, local data. They provide business analytics students with real problems and data as well as access to domain experts, and they give startups novel, creative ideas for using their data, visibility to the world, and a relationship with academia. The 2014 Open Data Camp is hosted by the ISB SRITNE center on June 28, 2014.
"Predicting, Explaining and the Business Analytics Toolkit": Keynote at the upcoming 2014 NASSCOM Big Data & Analytics Summit
I'll be presenting a keynote talk at the upcoming NASSCOM Big Data & Analytics Summit on Friday, June 27, 2014. In earlier talks, I introduced and emphasized the advantages of predictive analytics. In this talk, I start from predictive analytics and move on to causal explanation. Synopsis: Big data have brought predictive analytics to the forefront by enabling organizations to generate micro-level predictions. Predictive analytic methods extract correlations and associations from rich datasets for the purpose of generating predictions. Personalized recommendations, offers, treatments, and interventions are examples of predictive analytics used in many data-rich, data-savvy organizations. While predictive analytics offer significant actionable value to companies by answering "who, what, when, where?", they cannot provide the causal explanations needed to answer "why?" The good news is that statistical methods exist for causal investigation. The gold standard is randomized experiments, with alternative methods for cases when experiments are impossible. In the realm of Big Data, implementing such methods can offer new macro-level insights that can further strengthen data-driven decision making.
"The Forest or the Trees? Tackling Simpson's Paradox in Big Data with Trees" - at ECIS 2014
Earlier this month, Inbal Yahav (Bar Ilan University) and I presented our joint work on detecting Simpson's Paradox in big data as a poster at ECIS 2014 (thanks to the many interested visitors!) and at SCECR 2014. This work describes an unusual use of classification and regression trees for a causal goal, rather than their usual use for prediction. We develop a tree variant that helps detect possible paradoxes in large datasets. The research-in-progress paper is available here, and the longer version is available on SSRN.
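For readers who have not met Simpson's Paradox before, here is a minimal sketch of what a reversal looks like. It is not the tree-based method from the paper; it only compares success rates within subgroups against the pooled rates. The counts, the subgroup names, and the function names are illustrative assumptions, not data or code from our study.

```python
# Illustrative (successes, trials) counts per subgroup and treatment.
# These numbers are made up for demonstration, not taken from the paper.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Success rate for one cell."""
    return successes / trials


def pooled_rate(treatment):
    """Success rate for a treatment after ignoring the subgroup variable."""
    successes = sum(counts[group][treatment][0] for group in counts)
    trials = sum(counts[group][treatment][1] for group in counts)
    return successes / trials


def simpson_reversal():
    """True if A beats B in every subgroup but loses in the pooled data."""
    a_wins_each = all(
        rate(*counts[group]["A"]) > rate(*counts[group]["B"])
        for group in counts
    )
    a_loses_pooled = pooled_rate("A") < pooled_rate("B")
    return a_wins_each and a_loses_pooled


if __name__ == "__main__":
    for group in counts:
        print(group, {t: round(rate(*counts[group][t]), 3) for t in ("A", "B")})
    print("pooled", {t: round(pooled_rate(t), 3) for t in ("A", "B")})
    print("reversal detected:", simpson_reversal())
```

In this toy table, treatment A has the higher rate in each subgroup, yet B looks better once the subgroups are pooled, because A was applied mostly to the harder "large" cases.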
"Too Big To Fail" — invited talk at Israel Statistical Association annual conference
I'll talk about the problem of statistical inference with large samples at the annual conference of the Israel Statistical Association, in the closing panel with the killer title "Too much data + too much statistics = too many errors?" The session takes place on June 11, 2014 at 16:45 at the Open University in Raanana.
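One way to see the large-sample problem is a small sketch, assuming a one-sample z-test with known standard deviation. The effect size, the sample sizes, and the function name are hypothetical choices for illustration only. The same practically negligible effect goes from clearly non-significant to overwhelmingly "significant" simply because the sample grows.

```python
import math


def two_sided_p(effect, sd, n):
    """Two-sided p-value of a z-test for a mean shift of `effect`.

    Assumes a known standard deviation `sd` and a sample of size `n`.
    """
    z = effect / (sd / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # A tiny, fixed effect of 0.01 standard deviations (hypothetical).
    for n in (100, 10_000, 1_000_000):
        print(n, two_sided_p(effect=0.01, sd=1.0, n=n))
```

The effect never changes, only the sample size does, which is why p-values alone are a poor guide to practical importance in very large samples.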