Improved Statistical Inference to Avoid Spurious Scientific Discoveries
Mathematics Accomplishment | 2015
Where the work was done: IBM Almaden Research Center
What we accomplished: From the abstract of the STOC paper: "A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. There is, however, a fundamental disconnect between the theoretical results and the practice of data analysis: The theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data are shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses. In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis."
- Preserving Statistical Validity in Adaptive Data Analysis, ACM Symposium on Theory of Computing (STOC), 2015.
- Generalization in Adaptive Data Analysis and Holdout Reuse, Neural Information Processing Systems (NIPS), 2015.
- The reusable holdout: Preserving validity in adaptive data analysis, Science 349(6248): pp. 636-638, 2015.
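
The disconnect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not code from the papers above and not the reusable holdout method itself: it only shows how a holdout set loses its validity once the analyst's choices depend on it. The data are pure noise, so no classifier can truly beat 50% accuracy. All names and parameters (`make_data`, `naive_holdout_experiment`, `n`, `d`, `k`, `seed`) are hypothetical and chosen for illustration.

```python
import random


def make_data(n, d, rng):
    """Pure noise: features and labels are independent fair +/-1 coin flips."""
    X = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    y = [rng.choice((-1, 1)) for _ in range(n)]
    return X, y


def correlations(X, y):
    """Empirical mean of x_j * y over the samples, for every feature j."""
    n, d = len(X), len(X[0])
    return [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(d)]


def accuracy(X, y, signed_features):
    """Accuracy of a majority vote over the selected, sign-corrected features."""
    correct = 0
    for row, label in zip(X, y):
        vote = sum(sign * row[j] for j, sign in signed_features)
        pred = 1 if vote >= 0 else -1
        correct += pred == label
    return correct / len(y)


def naive_holdout_experiment(n=200, d=500, k=25, seed=0):
    """Return (holdout accuracy, fresh-data accuracy) for an adaptive analyst."""
    rng = random.Random(seed)
    X_train, y_train = make_data(n, d, rng)
    X_hold, y_hold = make_data(n, d, rng)
    X_fresh, y_fresh = make_data(n, d, rng)

    c_train = correlations(X_train, y_train)
    c_hold = correlations(X_hold, y_hold)

    # Adaptive step: the analyst looks at the holdout while choosing features.
    # Keep features whose correlation with the label has the same sign on
    # both sets, then take the k with the largest holdout correlation.
    agreeing = [j for j in range(d) if c_train[j] * c_hold[j] > 0]
    agreeing.sort(key=lambda j: abs(c_hold[j]), reverse=True)
    chosen = [(j, 1 if c_hold[j] > 0 else -1) for j in agreeing[:k]]

    # The same holdout is now reused to "validate" the resulting classifier.
    return accuracy(X_hold, y_hold, chosen), accuracy(X_fresh, y_fresh, chosen)
```

Calling `naive_holdout_experiment()` returns a pair of accuracies. Under these assumptions the reused holdout should report an accuracy well above chance, while the fresh sample stays close to 50%, which is the kind of spurious discovery the work aims to prevent.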