Nov 17, 2016

XL-Files: A Nobel Prize in Statistics, finally…

Contributing Editor Xiao-Li Meng writes:

A Nobel Prize in Statistics? Well, almost. The launch of the International Prize in Statistics (IPS), with its explicit references to the Nobel Prize (NP) and other major awards [see this link], aims to establish the IPS as “the highest honor in the field of Statistics.” And its inaugural winner, Sir David Cox, is inarguably one of the two living statisticians who can instantly confer this intended status on the IPS. However, many will argue about which N statisticians deserve this inaugural IPS, and indeed about the value of N itself. While my N = 2, I will not ruin your fun of imputing my other choice from publicly available data, in case you are bored with your own list.

The data came from “Some Nobel-Prize (NP) Worthy i.i.d. Ideas in Statistics,” a discussion I presented at JSM 2016. The “i.i.d.” criteria refer to “Ingenious, Influential, and Defiable.” The first two are obvious, and the third is necessary because any scientific idea must have demonstrable limitations, i.e., it can be “defied/defeated.” Fisher’s likelihood is an early example of an NP-worthy i.i.d. idea. An ingenious flipping, from probability space to parameter space, created an exceedingly influential paradigm for statistical inference. Yet it is not almighty. Likelihood inference can lead to inadmissible or inconsistent estimators, and the “flipping” idea itself can result in complications, as revealed by the familiar account of how Fisher created his fiducial distributions.
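To make the “flip” explicit, here is the standard textbook formulation (my notation, not anything specific to the talk): the same expression is read first as a probability model for the data with the parameter fixed, and then as a function of the parameter with the data fixed at what was observed.

```latex
% The same expression read in two directions: as a probability model
% for data x with theta fixed, and, after Fisher's flip, as the
% likelihood of theta with x fixed at its observed value.
L(\theta \mid x) \;\propto\; p(x \mid \theta),
\qquad
\hat{\theta}_{\mathrm{MLE}} \;=\; \arg\max_{\theta} L(\theta \mid x).
```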

The issue of inadmissibility naturally leads to Stein’s shrinkage estimation. The shrinkage phenomenon was considered paradoxical when Stein discovered it, and indeed a (statistically fluent) neurobiologist colleague recently told me that he just cannot comprehend how such a phenomenon could occur. Its impact, via the more encompassing framework of hierarchical modeling, is tremendous. Yet its occurrence depends on the choice of loss function.
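For readers who have not seen it written down, the canonical example is the James–Stein estimator, stated here in its standard textbook form (the particular statement is mine, not the column’s): for X ~ N_p(θ, I_p) with p ≥ 3,

```latex
% Shrinking X toward the origin strictly beats X itself
% under total squared-error loss, for every theta, once p >= 3.
\hat{\theta}_{\mathrm{JS}}
  \;=\; \left( 1 - \frac{p-2}{\|X\|^{2}} \right) X,
\qquad
\mathbb{E}\,\|\hat{\theta}_{\mathrm{JS}} - \theta\|^{2}
  \;<\; \mathbb{E}\,\|X - \theta\|^{2} \;=\; p
  \quad \text{for all } \theta .
```

The dependence on the loss function mentioned above is visible here: the domination is stated under total squared-error loss, and it need not hold component-wise or under other losses.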

Cox’s proportional hazards model is another unexpected finding: by using only the ranking information in the data, and hence a partial likelihood, one can entirely eliminate an infinite-dimensional nuisance parameter, namely the baseline hazard. It is this work that won Sir David the inaugural IPS, and the award is richly deserved by any measure. Practically, the model has been applied in virtually every field requiring quantitative investigation of risk factors for survival time. Academically, it opened up a new area of theoretical and methodological research, including work on its limitations and generalizations (e.g., when the hazards are not proportional).
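For reference, here is the partial likelihood in its standard form (my notation, assuming no tied failure times), where δ_i = 1 marks an observed failure and R(t_i) is the risk set just before time t_i:

```latex
% Under h(t | x) = h_0(t) exp(beta' x), the baseline hazard h_0(t)
% cancels from every ratio, so beta is estimated from ranks alone.
L_{\mathrm{partial}}(\beta)
  \;=\; \prod_{i:\,\delta_i = 1}
        \frac{\exp(\beta^{\top} x_i)}
             {\sum_{j \in R(t_i)} \exp(\beta^{\top} x_j)} .
```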

The bootstrap “literally changed my life,” as my neurobiologist colleague declared, and it has certainly made many researchers’ lives much easier. Yet those who attended the Stanford seminar at which Efron announced it still recall how skeptical the audience was: “No one believed it, as it was just too good to be true,” one of them told me. And such skepticism was, and still is, healthy, because the bootstrap does not always work. Indeed, Efron’s 1979 article on the bootstrap has generated an entire industry of research on proving when it works, when it doesn’t, and how to make it work when its vanilla version fails. Intriguingly, the topic became so popular that for a while my thesis adviser, Donald Rubin, was better known in some circles for his paper on the Bayesian bootstrap than for his far more influential (earlier) work on missing data, causal inference, etc.
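For anyone who has never coded it, here is a minimal sketch of the vanilla nonparametric bootstrap (an illustration of the generic recipe only; the function name and the median example are mine):

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling
    the data with replacement (vanilla nonparametric bootstrap)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Recompute the statistic on n_boot resamples of the same size n.
    replicates = np.array([
        statistic(data[rng.integers(0, n, size=n)])
        for _ in range(n_boot)
    ])
    return replicates.std(ddof=1)

# Example: standard error of the sample median of 50 exponential draws.
rng = np.random.default_rng(42)
sample = rng.exponential(scale=1.0, size=50)
print(bootstrap_se(sample, np.median))
```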

Incidentally, as I conveyed to Don, among his many contributions I have always regarded his work with my eldest academic brother, Paul Rosenbaum, on propensity score matching (PSM) as the most unexpected. Controlling for confounding factors in observational and other studies is of paramount importance, and matching methods are both intuitive and easy to implement. A common challenge with matching is that one quickly runs out of sample size when trying to eliminate as many confounding factors as possible. The ingenuity of PSM is that one need only match on a single index, the propensity score, which has led to its enormous popularity. Of course, there is no free lunch here: not only does the method require modeling assumptions, but it also cannot (directly) control for unmeasured confounding factors.
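A bare-bones sketch of the idea on simulated data (entirely my own illustration: logistic regression for the propensity score and the simplest one-to-one nearest-neighbor matching, not the full Rosenbaum–Rubin methodology):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                      # measured confounders
p_treat = 1 / (1 + np.exp(-(X @ [0.5, -0.3, 0.2, 0.0, 0.4])))
T = rng.binomial(1, p_treat)                     # confounded treatment assignment
Y = 2.0 * T + X @ [1.0, 1.0, -0.5, 0.3, 0.0] + rng.normal(size=n)

# Step 1: estimate the propensity score e(x) = P(T = 1 | x).
e_hat = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the closest score.
treated, controls = np.where(T == 1)[0], np.where(T == 0)[0]
matches = controls[np.abs(e_hat[controls][None, :]
                          - e_hat[treated][:, None]).argmin(axis=1)]

# Step 3: compare outcomes within matched pairs (true effect is 2.0).
print("matched estimate of treatment effect:",
      (Y[treated] - Y[matches]).mean())
```

The point of the sketch is that all five confounders are balanced by matching on the single scalar e_hat, rather than on the five-dimensional X itself.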

This leads to the most NP-worthy idea on my list: randomization. It controls for all confounding factors, known, unknown, and unknown-unknown. A simple random sample of 400 can easily produce the same mean squared error as a self-reported data set covering half of the US population, that is, about 160,000,000 people, with a seemingly negligible self-selection bias; see the proof in my recent RSS presentation at https://www.youtube.com/watch?v=8YLdIDOMEZs (with apologies to those who hate self-referencing). Of course, the limitation of randomization is that it is often an unachievable dream.
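The arithmetic can be checked with a toy simulation (entirely my own illustration, with a scaled-down population and made-up response rates; the actual proof is in the RSS presentation linked above):

```python
import numpy as np

rng = np.random.default_rng(2016)

# A scaled-down toy population (2 million instead of ~320 million),
# purely to illustrate the principle.
N = 2_000_000
pop = (rng.random(N) < 0.52).astype(np.int8)   # binary trait, 52% prevalence
truth = pop.mean()

def mse(estimates):
    return np.mean((np.asarray(estimates) - truth) ** 2)

# (a) Simple random samples of size 400 (drawn with replacement,
#     which hardly matters at this population size).
srs_means = [pop[rng.integers(0, N, 400)].mean() for _ in range(500)]

# (b) Self-reported samples covering about half the population, where
#     people with the trait are modestly more likely to respond
#     (52.5% vs 47.5% response rates -- my made-up numbers).
big_means = []
for _ in range(20):
    respond = rng.random(N) < np.where(pop == 1, 0.525, 0.475)
    big_means.append(pop[respond].mean())

print("MSE of SRS(400):            ", mse(srs_means))
print("MSE of half-population data:", mse(big_means))
```

With these numbers the two mean squared errors come out essentially equal: the million-fold larger, slightly self-selected sample buys nothing over 400 randomized responses.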

“Xiao-Li, your talk is dangerous,” said a friend who was worried that I might have hurt many people’s egos by omitting their NP-worthy ideas. But I would summarize these six ideas with a different d-word: deceptive. At first glance, all six appear to be too good to be true or too simple to be useful. Yet years of research and application have demonstrated that they are incredibly powerful statistical (IPS) ideas, ideas we all wish bore our names.

So what’s your IPS idea and/or IPS list?

Leave a comment below, or email bulletin@imstat.org.

