"The Forest or the Trees? Tackling Simpson's Paradox in Big Data with Trees" - at ECIS 2014

Earlier this month, Inbal Yahav (Bar Ilan University) and I presented our joint work on detecting Simpson's Paradox in big data as a poster at ECIS 2014 (thanks to the many interested visitors!) and at SCECR 2014. The work describes an unusual use of classification and regression trees: for a causal goal rather than their usual predictive one. We develop a tree variant that helps detect possible paradoxes in large datasets.
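For readers new to the paradox itself, here is a minimal sketch of what a reversal looks like in a small table. It is not the tree-based method from the paper. The counts, segment names, and function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical counts, invented for illustration: (successes, trials)
# for each segment and each group. These are not data from the paper.
counts = {
    "segment_x": {"treated": (90, 100), "control": (800, 1000)},
    "segment_y": {"treated": (300, 1000), "control": (20, 100)},
}


def rate(successes, trials):
    """Success rate for one cell of the table."""
    return successes / trials


def pooled_rate(group):
    """Success rate for a group after pooling all segments together."""
    successes = sum(counts[seg][group][0] for seg in counts)
    trials = sum(counts[seg][group][1] for seg in counts)
    return successes / trials


def treated_wins_in_every_segment():
    """True if the treated rate exceeds the control rate in each segment."""
    return all(
        rate(*counts[seg]["treated"]) > rate(*counts[seg]["control"])
        for seg in counts
    )


def treated_wins_pooled():
    """True if the treated rate exceeds the control rate in the pooled data."""
    return pooled_rate("treated") > pooled_rate("control")


if __name__ == "__main__":
    for seg in counts:
        print(seg,
              "treated:", rate(*counts[seg]["treated"]),
              "control:", rate(*counts[seg]["control"]))
    print("pooled treated:", pooled_rate("treated"))
    print("pooled control:", pooled_rate("control"))
    print("treated wins in every segment:", treated_wins_in_every_segment())
    print("treated wins when pooled:", treated_wins_pooled())
```

In this made-up table the treated group has the higher rate inside each segment but the lower rate once the segments are pooled, because the two groups are distributed very differently across segments.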

Too Big To Fail: Large Samples and the P-Value Problem -- forthcoming in ISR

This weekend, an important paper that I co-authored with Hank Lucas and Mingfeng Lin was accepted to the prestigious journal Information Systems Research. The paper, entitled "Too Big to Fail: Large Samples and the P-Value Problem", describes a critical challenge that arises when modeling large samples. Publications in Information Systems, as well as in other social sciences, have begun to rely on very large samples for testing theories.
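One way to see why very large samples can be a problem for p-values is the following rough sketch. It is not taken from the paper. It assumes a simple one-sample z-test with known standard deviation, and the function name and the effect size of 0.01 standard deviations are hypothetical choices for illustration.

```python
import math


def two_sided_p_value(effect_in_sd, n):
    """Two-sided p-value of a one-sample z-test (assumed setup).

    effect_in_sd: observed mean shift, in units of the known standard deviation.
    n: sample size.
    """
    z = effect_in_sd * math.sqrt(n)          # z statistic grows with sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    tiny_effect = 0.01  # hypothetical: one hundredth of a standard deviation
    for n in (100, 10_000, 1_000_000):
        print(n, two_sided_p_value(tiny_effect, n))
```

The observed effect is held fixed, so the only thing that changes across the printed rows is the sample size. The p-value still drops from far above conventional thresholds to far below them, which is why statistical significance alone says little about practical importance in very large samples.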

Matlab Code for Biosurveillance

The Matlab modules below contain procedures for plotting control charts of different types, including ones based on wavelets. The functions are designed to be independent of specialized Matlab toolboxes. The code is distributed under the GNU General Public License. Note that you can use it (or change it according to the license), but I take no responsibility for its accuracy or use.
Here are a few snapshots from the software output.

