October 19, 2017

"Researcher Dilemmas using Behavioral Big Data in Healthcare": Keynote at INFORMS DMDA Workshop

This coming Saturday I'll deliver a keynote talk on "Researcher Dilemmas using Behavioral Big Data in Healthcare" at the 12th INFORMS Workshop on Data Mining and Decision Analytics in Houston, TX.

When: Saturday, Oct 21, 13:45-14:30
Where: Hilton Americas-Houston, Level 3, Room 339

Behavioral big data (BBD) refers to very large and rich multidimensional data sets on human and social behaviors, actions, and interactions, which have become available to companies, governments, and researchers. A growing number of researchers acquire and analyze BBD for the purpose of extracting knowledge and scientific discoveries. However, the relationships between the researcher, data, human subjects, and research questions differ in the BBD context compared to non-BBD and even traditional behavioral data. Researchers using BBD face not only methodological and technical challenges but also ethical and moral dilemmas. In this talk, I will discuss several dilemmas, challenges, and trade-offs related to acquiring and analyzing BBD in healthcare research.

September 6, 2017

R edition of Data Mining for Business Analytics textbook now available!

Wiley just notified us that our new textbook Data Mining for Business Analytics in R is out! Thanks to all those who've encouraged us to write the R edition, to the beta testers, and to the many folks who've been holding their breath. And thanks to Professors Gareth James and Ravi Bapna for writing wonderful Forewords!

The R edition covers the same topics as the 3rd edition of Data Mining for Business Analytics with XLMiner that came out last year. This Fall I am teaching a course that allows students to choose between the two editions.

As with the other editions, all datasets (and R code!) are available online. Adopting instructors can get access to instructor materials that include slides, solutions to end-of-chapter problems and cases, and more.

June 29, 2017

"Research Dilemmas with Behavioral Big Data" now published

My paper Research Dilemmas with Behavioral Big Data now appears in the new issue of Big Data journal. This is part of a special issue on Social and Technical Trade-Offs, guest edited by Barocas, boyd, Friedler & Wallach, and includes multiple important papers for data scientists, dealing with issues of ethics, bias, fairness and related topics.

June 17, 2017

Keynote at 2017PLS on "When Prediction Met PLS"

This morning I delivered the opening keynote address at the 9th International Conference on PLS and Related Methods (2017PLS), in Macau, on "When Prediction Met PLS: What We Learned in 3 Years of Marriage". My slides are now publicly available on Slideshare. Two more sessions today were dedicated to prediction, and even outside those sessions there were several talks focusing on prediction and PLS models.

May 11, 2017

Talk at HKUST (May 16): "A tree-based approach for modeling self-selection"

Next Tuesday I'll give a seminar talk on "A Tree-based Approach for Addressing Self-Selection in Impact Studies with Big Data" at The Hong Kong University of Science & Technology, in the Department of Information Systems, Business Statistics, and Operations Management (lovely combination!). In the talk, I'll describe the cool tree-based method we developed for addressing self-selection as an alternative to propensity score matching (based on our 2016 MISQ paper with Inbal Yahav and Deepa Mani).

For more details (where, when) see the poster.

For a very light non-technical description, see this 5-min video.

A major challenge in deriving insights from impact studies is differences between the treatment groups due to self‐selection or other factors unrelated to the intervention. We introduce a tree‐based approach adjusting for observable self‐selection bias in intervention studies in management research. In contrast to traditional propensity score matching methods, including those using classification trees as a subcomponent, our tree‐based approach provides a standalone, automated, data‐driven methodology that allows for (1) the examination of nascent interventions whose selection is difficult and costly to theoretically specify a priori, (2) detection of heterogeneous intervention effects for different pre‐intervention profiles, (3) identification of pre‐intervention variables that correlate with the self‐selected intervention, and (4) visual presentation of intervention effects that is easy to discern and understand. As such, the tree‐based approach is a useful tool for analyzing observational impact studies as well as for post‐analysis of experimental data. The tree‐based approach is particularly advantageous in the analyses of big data. I'll illustrate the method and the insights it yields in the context of two impact studies with different study designs: reanalysis of a field experiment and observational data on the effect of training on earnings in the US; and analysis of a quasi‐experiment examining the impact of an e‐governance service in India.