Xihong Lin is the Henry Pickering Walcott Professor of Biostatistics, professor of statistics, and coordinating director of the Program in Quantitative Genomics at Harvard T.H. Chan School of Public Health. She was elected for her “contributions to statistics, genetics, epidemiology, and environmental health through influential and ingenious research in statistical methods and applications in whole-genome sequencing association studies, gene-environment, integrative analysis, and complex observational studies.”

“This distinguished and diverse class of new members is a truly remarkable set of scholars and leaders whose impressive work has advanced science, improved health, and made the world a better place for everyone,” said NAM President Victor J. Dzau. “Their expertise in science, medicine, health, and policy in the U.S. and around the globe will help our organization address today’s most pressing health challenges and inform the future of health and health care. It is my privilege to welcome these esteemed individuals to the National Academy of Medicine.”

New members are elected by current members through a process that recognizes individuals who have made major contributions to the advancement of the medical sciences, health care, and public health.

—

Also elected to NAM from the statistics community were the following people:

**Francesca Dominici**, Clarence James Gamble Professor of Biostatistics, Population, and Data Science at Harvard T.H. Chan School of Public Health, and co-director of the Harvard Data Science Initiative, for *“developing and applying innovative statistical methods to understanding and reducing the impact of air pollution on population health.”*

**John P.A. Ioannidis**, C.F. Rehnborg Professor in Disease Prevention, professor of medicine, health research and policy, biomedical data science, and statistics, and co-director of Meta-Research Innovation Center at Stanford University, for *“his dedication to rigorous, reproducible, and transparent health science, for his seminal work on meta-research, for his calls for quality in evidence, and for the positive impact it has had on the reliability and utility of scientific information throughout the sciences.”*

**Bradley A. Malin**, professor and vice chair, biomedical informatics, and professor of biostatistics and computer science, Vanderbilt University, for *“contributions in natural language de-identification, guiding both national and international policies around research protection and enabling broad sharing and reuse of health and social data at an unprecedented scale.”*

—

“It is a privilege to be recognized by my peers and win such a well-respected award,” Gottardo said. “Researching ways to harness the immune system to prevent infections and cure cancer is a massive undertaking that involves analyzing and integrating a large amount of data, and I’m proud that my work is helping other scientists turn that trove of information into actionable insights.”

Gottardo’s work focuses on developing methods and tools to analyze large immunological data sets generated by novel assay technologies and helping scientists understand the results of their experiments.

“Dr. Gottardo has an outstanding ability to apply an integrated, reproducible and open approach to his research,” said Fred Hutch colleague and biostatistician Peter Gilbert. “I’ve had the pleasure of collaborating with Raphael on many HIV vaccine projects over the years and his fusion of computational immunology, computer science and statistical research is second to none.”

The Mortimer Spiegelman Award is named for demographer, actuary and biostatistician Mortimer Spiegelman and has been presented annually since 1970. See https://www.apha.org/apha-communities/member-sections/applied-public-health-statistics/who-we-are/awards

Starting January 1, 2019, the *Annals of Applied Probability* will have two Co-editors (like the

At the *Annals of Applied Statistics*, Tilmann Gneiting hands over to

The *Annals of Statistics* Co-editors Ed George and Tailen Hsing are also ending their term. The new Co-editors are

As well as these new editors, **Domenico Marinucci** has agreed to serve for a second term as Editor of the *Electronic Journal of Statistics*, as recommended by the joint IMS/Bernoulli Society Committee to Select Editors. His term will run until the end of 2021. Domenico’s web page is https://www.mat.uniroma2.it/~marinucc/

Thank you to everyone who serves our community in this way!

Four COPSS awards will be presented at the 2019 JSM in Denver, Colorado, which will take place July 27–August 1, 2019. The deadlines for the Fisher Award and Lectureship and the Florence N. David Award and Lectureship have passed, but there is still time to nominate for the **Presidents’ Award** and the **George W. Snedecor Award**. Nominations for these awards should be submitted by **January 15, 2019**, to the relevant committee chair or to the COPSS Secretary.

For more information and contact details, please visit http://copss.org.

I thought it would be revealing and entertaining to look at the recently concluded US midterm election results and draw some conclusions. Who is voting Republican? What, really, is the Republican base? And what about the Democrats? Are the bases entirely disjoint? Is one party’s base more scattered than the other’s? Are the midterm voters of each party essentially the same as those who voted for them in the 2016 Presidential election? Or are the two parties growing and capturing new voters? We will see…

We will also give a probability, based on a stated model, that the sitting US President would be re-elected if elections were held now. First, the actual 2018 midterm data, presented in Table 1.

*Table 1: voting data for 2018 US midterm elections*

| Group | Group size | Republican % | Democrat % |
|---|---|---|---|
| All voters | | 44% | 50.5% |
| All males | 48% | 51% | 47% |
| All females | 52% | 40% | 59% |
| White men | 35% | 60% | 39% |
| White women | 37% | 49% | 49% |
| White men, no college | 20% | 66% | 32% |
| White men, college | 15% | 51% | 47% |
| White women, no college | 21% | 56% | 42% |
| White women, college | 16% | 39% | 60% |
| Nonwhites, no college | 18% | 22% | 76% |
| Nonwhites, college | 10% | 22% | 77% |
| All aged 18–29 | 13% | 32% | 67% |
| All aged 45–64 | 39% | 50% | 49% |
| All whites | 72% | 54% | 44% |
| Blacks | 11% | 9% | 90% |
| Latinos | 11% | 29% | 69% |
| Asians | 3% | 23% | 77% |
| Black men | 5% | 12% | 88% |
| Black women | 6% | 7% | 92% |
| Gun owners | 46% | 61% | 36% |
| Non gun owners | 53% | 26% | 72% |
| Protestants | 25% | 61% | 38% |
| Catholics | 26% | 49% | 50% |
| White evangelicals | 26% | 75% | 22% |
| Jewish | 2% | 17% | 79% |
| Married | 59% | 47% | 51% |
| Unmarried | 41% | 37% | 61% |
| Independents | 30% | 42% | 54% |
| Trump: strongly like | 31% | 94% | 5% |
| Trump: like somewhat | 14% | 74% | 24% |
| Trump: dislike somewhat | 8% | 34% | 63% |
| Trump: strongly dislike | 46% | 4% | 95% |
| Health care main issue | 41% | 23% | 75% |
| Immigration main issue | 23% | 75% | 23% |
| Economy main issue | 22% | 63% | 34% |
| LGBT | 6% | 17% | 82% |
| Urban | 32% | 32% | 65% |
| Suburban | 51% | 49% | 49% |
| Rural | 17% | 57% | 42% |

In view of these demographic voting percentages, an elementary application of Bayes’ theorem identifies subgroups of the US electorate that can be called the bases of the two political parties. White men make up 35% of voters, but 48% of the Republican vote. Evangelical Christians make up 26% of voters, yet 44% of the Republican vote. Gun owners also form 44% of the Republican vote, and about 20% of the Republican vote comes from rural voters.

In contrast, women make up 52% of voters, but 61% of the Democratic vote. White women with a college degree and blacks each form 20% of the Democratic vote, millennials 17%, Latinos 15%, black women alone 11%, and LGBT voters 10%; urban and suburban voters together form a whopping 86% of the Democratic vote. The Democratic base thus consists of many smaller subgroups rather than two or three dominant ones; it is more evenly spread. It is useful to present the bases in tabular form (see Tables 2 and 3, below).
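This Bayes’ theorem step is easy to reproduce. The sketch below (a minimal illustration, not part of the original analysis) takes the subgroup shares from Table 1 and uses the overall party shares, 44% Republican and 50.5% Democratic, as denominators:

```python
# Share of a party's vote contributed by a subgroup, by Bayes' theorem:
# P(group | party) = P(party | group) * P(group) / P(party)
def base_share(group_size, party_pct_in_group, party_overall):
    return party_pct_in_group * group_size / party_overall

P_REP, P_DEM = 0.44, 0.505  # overall 2018 party shares (Table 1)

# White men: 35% of voters, 60% of whom voted Republican
print(base_share(0.35, 0.60, P_REP))   # ~0.48 of the Republican vote
# White evangelicals: 26% of voters, 75% Republican
print(base_share(0.26, 0.75, P_REP))   # ~0.44
# Women: 52% of voters, 59% Democratic
print(base_share(0.52, 0.59, P_DEM))   # ~0.61
```

The three printed values reproduce the 48%, 44%, and 61% figures quoted above.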

*Table 2: Republican base, data for 2018 US midterm elections*

| Republican Base | % of voters | % of Rep. vote |
|---|---|---|
| White men | 35% | 48% |
| White men, no college | 20% | 30% |
| Evangelicals | 26% | 44% |
| Gun owners | 46% | 44% |
| Rural voters | 17% | 20% |

*Table 3: Democratic base, data for 2018 US midterm elections*

| Democratic Base | % of voters | % of Dem. vote |
|---|---|---|
| Women | 52% | 61% |
| Women with college degree | 16% | 20% |
| Blacks | 11% | 20% |
| Millennials | 7% | 17% |
| Latinos | 11% | 15% |
| Black women | 6% | 11% |
| LGBT voters | 6% | 10% |

We can also deduce from the midterm results and Bayes’ theorem that 83% of Republican midterm voters also voted for the Republican nominee in the 2016 Presidential election, and 80% of Democratic midterm voters voted for the Democratic nominee in 2016. Both parties have attracted some new voters. Political activity in the coming months will no doubt see the two parties protect and defend the interests of their respective bases.

It is always a seductive idea to try to predict the future, especially for a loaded question such as, “Will the sitting US President be re-elected?” We give some sort of probability for it, based on the following assumptions:

(a) The probability is for re-election if elections are held now.

(b) It is assumed that whether or not the sitting President wins the electoral college votes in a given state is determined by his approval rating in that state.

(c) If the approval rating is 50% or more, it is assumed that he is guaranteed to win that state, and if the approval rating is 45% or less, it is assumed that he will lose that state. If the approval rating, say $\alpha$, is between 0.45 and 0.5, we model the probability that he will win that state as $20(\alpha - 0.45)$.

(d) Based on the approval ratings in each state at the 2018 midterm elections, only four states are then in play: Arizona, Nevada, North Carolina, and Wisconsin. The sitting President’s winning probabilities in these states are, respectively, 0.75, 0.5, 0.75, and 0.5, using the formula in (c). These states carry 11, 6, 15, and 10 electoral college votes, respectively. So, plainly, the probability of carrying all four states is at most 0.5.

(e) If we let $\mu$ denote the expected number of electoral college votes the sitting President will win from these four states, then, by using the obvious indicator variables, $\mu = 11(0.75) + 6(0.5) + 15(0.75) + 10(0.5) = 27.5$.

(f) Of the 306 electoral college votes that the sitting President won in the 2016 election, he is assured of 228 at this point in time, and assured of losing 42 (PA, approval 45; MI, approval 44; IA, approval 43). Thus, to still win an electoral college majority of at least 270 votes, he must carry all four states in play listed above.

Therefore, the probability that the sitting President will win re-election if elections are held now is approximately 14%, under the model stated in (a)–(c) and mutual independence of the four states. We can give a conservative bound without assuming this independence: by Markov’s inequality applied to the number of these four states he wins (which has expectation 2.5), the probability that he wins all four is at most $2.5/4 = 0.625$. If we average the 14% and the 62.5% figures, we get 38.5%, and it is humorous to notice that this is alluringly close to the betting market probability (according to PredictIt) as I write this…
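The arithmetic in (c)–(f) can be checked with a short script (a sketch of the model above, nothing more):

```python
import math

def win_prob(approval):
    """Assumption (c): certain win at approval >= 0.50, certain loss
    at <= 0.45, and 20 * (approval - 0.45) in between."""
    if approval >= 0.50:
        return 1.0
    if approval <= 0.45:
        return 0.0
    return 20 * (approval - 0.45)

# The four states in play, from assumption (d)
ev    = {"AZ": 11, "NV": 6, "NC": 15, "WI": 10}   # electoral votes
p_win = {"AZ": 0.75, "NV": 0.5, "NC": 0.75, "WI": 0.5}

# (e) expected electoral votes gained from these states
mu = sum(ev[s] * p_win[s] for s in ev)            # 27.5

# (f) he needs all four; assuming independence across states:
p_all = math.prod(p_win.values())                 # 0.140625, i.e. ~14%

# Markov's inequality bound, with no independence assumption:
# the number of states won among these four has expectation 2.5,
# so P(win all 4) <= 2.5 / 4
markov_bound = sum(p_win.values()) / 4            # 0.625

print(mu, p_all, markov_bound)
```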

The American Statistical Association (ASA) has a brand new History of Statistics Interest Group (HoSIG). Membership in ASA is not required to join. Anyone interested in the history of statistics is welcome and encouraged to become a member.

The objectives of HoSIG are to:

1. Bring together individuals and groups who have an active interest in the history of statistics.

2. Promote and support research into the history of statistics at all levels.

3. Further the use of the history of statistics in education.

4. Encourage the historical perspective among statisticians and related professionals.

5. Contribute to the program of the annual Joint Statistical Meetings and selected meetings of the ASA and other professional organizations.

Please let Michael know if you have any questions: email mpcohen@juno.com. You will find instructions about how to join at this link:

http://community.amstat.org/historyofstats/aboutus/join

Frank Rudolf Hampel, professor emeritus at ETH Zurich, passed away in Thalwil near Zurich, Switzerland, on October 2, at the age of 77.

Frank Hampel was well known for his fundamental contributions to robust statistics, in particular for the introduction of the basic concepts of influence function and breakdown point. The influence function—“perhaps the most useful heuristic tool of robust statistics,” according to Peter Huber (*Robust Statistics*, Wiley 1981, pp. 13–14)—describes the approximate effect on an estimate when inserting, deleting or modifying a single observation. Moreover, the asymptotic variance of an estimator is given by the expected value of the squared influence function. This connection allowed Frank to formulate and solve a central optimality problem in robust statistics, namely to minimize the asymptotic variance under a bound on the influence of a single observation (“Lemma 5” in his thesis). In contrast to the infinitesimal description provided by the influence function, the breakdown point is a global measure that gives the largest percentage of arbitrarily bad observations an estimator can tolerate without diverging. His book *Robust Statistics: The Approach Based on Influence Functions*, written together with Elvezio Ronchetti, Peter Rousseeuw and Werner Stahel (Wiley, 1986), contains a systematic exposition of the area. It served as a key reference for more than two decades and was highly influential.

In addition to deviations from an assumed marginal distribution, Frank also considered deviations from independence, advocating the use of long-range dependence models as the most relevant type of unsuspected dependence. Another important contribution by Frank is what he called “small sample asymptotics”, a variant of saddle-point approximations for the distribution of estimators, based on a different derivation. These provide excellent agreement with the exact distribution even for very small samples.

In his later years, Frank focused on the philosophical foundations of statistics. He argued for describing epistemic uncertainty by upper and lower probabilities, corresponding to one-sided bets. In his approach, total ignorance about an event means that one refuses to bet on either the event or its complement. It remains to be seen if these ideas will be recognized in the future as a fundamental new approach.

Frank grew up in Germany during World War II; his father died when he was one year old. His mother then moved to the house of his grandfather in Upper Silesia. Because this region became Polish at the end of the war, the family was forced to leave and ended up near Göttingen. After high school, Frank studied physics, mathematics and philosophy in Munich and Göttingen. His professor in Göttingen, Konrad Jacobs, who worked in ergodic theory, showed him the seminal 1964 *Annals of Mathematical Statistics* paper by Peter Huber, and encouraged him to go to Berkeley with a one-year exchange scholarship. He decided to stay there and completed his PhD in 1968. Officially, Erich Lehmann was his advisor, but Erich wrote in *Reminiscences of a Statistician* (Springer, 2008, p. 158) that “…in fact I had essentially no input. My ‘contribution’ consisted of my immediate realization of the importance and maturity of this work … and my task was to encourage, smooth the process and otherwise stay out of the way.” After his PhD, Frank accepted an offer by Volker Strassen (famous for proving an invariance principle for the law of the iterated logarithm) to move with him from Berkeley to the University of Zurich and to take a position as “Oberassistent”, being in charge of the statistical consulting service. In 1970–71, Frank was invited together with Peter Bickel and Peter Huber to join John Tukey during the “Princeton robustness year”, which had a big impact on the further development of robust statistics. In 1974, he was elected as associate professor at ETH Zurich, thus becoming a colleague of Peter Huber. He was soon promoted to full professor and stayed at ETH until his retirement in 2006. In 2007 he received an honorary doctorate from the University of Dortmund for his “scientific achievements in the area of modern statistics and data analysis.”

Besides statistics, Frank had a keen interest in, and profound knowledge of, nature, in particular astronomy, birds, orchids and dragonflies. No road was too far and no search too laborious if he could find and observe a species he had never seen before. He was very happy and patient to share his knowledge and enthusiasm with others. Frank was an independent thinker who had a great influence on many statisticians with his original ideas, and at the same time was a very kind person. He is survived by his wife, Verena.

*—*

*Written by Hans R. Künsch, ETH Zurich*

Data ethics seem to be the flavour of the month. In the UK alone, the establishment of the National Statistician’s Data Ethics Advisory Committee has been quickly followed by the government’s Department for Digital, Culture, Media and Sport launching its Centre for Data Ethics and Innovation, and the Nuffield Foundation launching its Ada Lovelace Institute, aimed at taking “a lead on the interaction between data, ethics, and artificial intelligence in the UK”. And there’s nothing unique about the UK in this—a quick Google search shows a proliferation of such bodies, with, for example, the Council for Big Data, Ethics, and Society being established in the US in 2014, aimed at providing “critical social and cultural perspectives on big data initiatives”. Indeed, it is not even limited to governments: corporations and other bodies are also concerned that their use of data, which is often central to their business model, should be ethically sound, not least to avoid the risk of public backlash and possibly highly restrictive legislation.

Of course, statisticians have long been aware of the ethical dimensions of their work, though usually these were manifest through particular application domains, such as a requirement to include statisticians on medical ethics committees, or the requirement to be able to explain an adverse decision in the context of consumer loans. Professional bodies of statisticians, such as the ASA and RSS, have long had systems of ethical guidelines, as have other organisations for which data are central (e.g. the ACM).

But more recently, recognition of the need for such ethical oversight has grown, mainly because of raised awareness of the potential and pervasiveness of big data, data science, and artificial intelligence. Attention has shifted, from rather specialised concerns for informed consent in clinical trials, the preservation of anonymity in survey work, avoiding prohibited variables in insurance decisions, and so on, to much more “in-your-face” issues. These are matters such as selection bias leading to racist decisions, chatbots being gratuitously offensive, and questions of who is responsible when a driverless car crashes or a data theft leads to fraud.

Incidents like these occur for a variety of reasons. Automatic data collection leads to massive data sets accumulating without human oversight. Adaptive and self-learning algorithms go their own way (that’s the whole point, really). And the line between research and practice is becoming blurred in many contexts. Moreover, there is increasing tension between the data minimisation principle (that only sufficient data should be collected to answer the specific question) and the promise of data mining (that large data sets contain nuggets of great potential interest and value).

Resolutions of such tensions are not easy to arrive at, and solutions are complicated by the nature of public opinion—which is both heterogeneous and volatile. Different sections of the public, having had different experiences and been exposed to different circumstances, will have different views on what is right, legitimate, and proper. Worse still, those views will fluctuate with time—perhaps especially in response to events such as media reports of data losses or thefts, or fraud associated with advanced use of data.

Although sometimes described as *the new oil,* because of the way data, and data science, are revolutionising society just as fossil fuels did earlier, data have unique properties, leading to correspondingly unique ethical challenges. These properties will be very familiar to statisticians: data can be copied (as many times as you like), data can be sold or given away and yet simultaneously retained, data can be used multiple times for many different purposes, data can be of insufficient quality for some uses and yet perfectly adequate for other uses, and so on.

Such diverse applications and properties of data are compounded when data sets are linked, perhaps in unforeseen and indeed unforeseeable ways. A data set might even be linked to new data which did not exist at the time the first data set was collected. There are already plenty of examples where privacy has been breached through sophisticated linking exercises.

Ethical considerations cover the concept of personal data (this lies at the core of the EU’s General Data Protection Regulation); data ownership (is this a meaningful concept? Some regard data they have collected, possibly at great expense, as theirs, while others regard such data as belonging to the person they describe); consent and purpose; privacy and confidentiality; the right to be forgotten; the right to access data; an awareness of new developments in data science technology; the views of the public; and trustworthiness.

Such considerations do not permit simple formulaic answers, since these must be context-dependent and dynamic. Instead, solutions must be principles-based, with higher-level considerations guiding decisions in any particular context. These principles include that the data and their analysis should **serve the public good**, should be **transparent**, must be **non-discriminatory**, should be **trustworthy and honest**, should **protect individual identities**, and should **adhere to legal requirements**. Moreover, the world of data and data science is changing rapidly, as large data sets continue to accumulate, as new analytic tools continue to be developed, and as real-time and online processing becomes increasingly prevalent (for example, with the advent of the Internet of Things). This means that the principles must be regularly reviewed to see that they remain adequate.

In seeking to apply ethical principles, a delicate balance must often be struck. Constraints on data science must not be so great that they stifle innovation and social progress, preventing statistics and data science from benefiting humanity. That would be just as unethical.

—

**Further reading**

European Data Protection Supervisor (2015) *Towards a New Digital Ethics: Data, Dignity, and Technology*, https://edps.europa.eu/sites/edp/files/publication/15-09-11_data_ethics_en.pdf

*Philosophical Transactions of the Royal Society*, Volume 374, Issue 2083, theme issue on *The Ethical Impact of Data Science.*

Hand D.J. (2018) Aspects of data ethics in a changing world: where are we now? *Big Data*, **6**, 176–190.

Metcalf J., Keller E.F., and Boyd D. (2016) *Perspectives on Big Data, Ethics, and Society*. The Council for Big Data, Ethics, and Society.

Zwitter A.Z. (2014) Big data ethics. *Big Data and Society*, July-December, 1–6.

A candidate for the **IMS Fellowship** [see Philip Protter’s article here] shall have demonstrated distinction in research in statistics or probability, by publication of independent work of merit. This qualification may be partly or wholly waived in the case of either a candidate of well-established leadership whose contributions to the field of statistics or probability other than original research shall be judged of equal value; or a candidate of well-established leadership in the application of statistics or probability, whose work has contributed greatly to the utility of and the appreciation of these areas. Candidates for fellowship should be members of IMS when nominated (you can email Elyse Gustafson erg@imstat.org to check this before you start). The nomination deadline is January 31, 2019. For nomination requirements, see https://www.imstat.org/honored-ims-fellows/nominations-for-ims-fellow/.

Nominations are invited for the **Carver Medal**, created by the IMS in honor of Harry C. Carver, for exceptional service specifically to the IMS. All nominations must be received by February 1, 2019. Please visit https://www.imstat.org/ims-awards/harry-c-carver-medal/.

Applications are open for two types of travel awards. The **IMS Hannan Graduate Student Travel Award** funds travel and registration to attend (and possibly present a paper/poster at) an IMS sponsored or co-sponsored meeting. This travel award is available to IMS members who are graduate students (seeking a Masters or PhD degree) studying some area of statistical science or probability.

If you are a New Researcher (awarded your PhD in 2013–18) looking for travel funds, you should apply for the **IMS New Researcher Travel Award** to fund travel, and possibly other expenses, to present a paper or a poster at an IMS sponsored or co-sponsored meeting (apart from the IMS New Researcher’s Conference, which is funded separately).

Applicants for both these travel awards must be members of IMS, though joining at the time of application is allowed (student membership is free, and new graduate membership discounted!). The application deadline for both is February 1, 2019.

See https://www.imstat.org/ims-awards/ims-hannan-graduate-student-travel-award/ and https://www.imstat.org/ims-awards/ims-new-researcher-travel-award/ for details.

Anirban DasGupta says:

*The previous problem, on inference based on the distribution of a nonsufficient statistic, required the use of both Markov chain theory and statistical inference [see solution below]. It was a problem on probability and statistics simultaneously. This month we pose a rather simple problem which should be fun to think about, and it has many possible answers! So, hopefully, many of you will think of one of the correct answers.*

Let $C(\mu, 1)$ denote the Cauchy distribution on the real line with location parameter $\mu$ and scale parameter equal to one. Suppose $\mu$ belongs to $\mathbb{R}$ (the parameter space) and that we wish to estimate it under the squared error loss function. Let $X_1, X_2, \ldots$ be an iid $C(\mu, 1)$ sequence. Assume that $n > 7$. Give, with proof, a sequence of estimators $T_n(X_1, X_2, \ldots, X_n)$ of $\mu$, such that:

(a) For every $n$, $T_n$ is inadmissible;

(b) For no $n$, $T_n$ is minimax;

(c) For every $n$, $T_n$ is unbiased;

(d) The sequence of estimators ${T_n}$ is asymptotically efficient.

(e) Compute the numerical value of the estimator you have proposed for the following data values:

0.1, 2.9, −0.6, 3.1, 3.6, −6.5, 0.2, 1.0, 2.4, −15.9.

Embed the longest run problem into a stationary Markov chain with the following transition matrix. Denote the observed longest head run in $n$ tosses of a $p$-coin by $L_n$, and suppose we wish to find $P(L_n \geq m)$, where $m$ is a general nonnegative integer. From state $i$ you go to state zero with probability $1-p$ and to state $i+1$ with probability $p$, with state $m$ an absorbing state. Denote this $(m+1) \times (m+1)$ matrix by $P_m$ and let $Q[n,m]$ denote its $n$th power.

Then $P(L_n \geq m)$ is the last element in the zero-th row of $Q[n,m]$.

By evaluating $P(L_n \geq 1) - P(L_n \geq 2)$ with $n = 10$, one gets

\[ P(L_n = 1) = 10p-54p^2+128p^3-189p^4+216p^5-205p^6+144p^7-63p^8+14p^9-p^{10}. \]

It is uniquely maximized at $p \approx 0.1616$, which is therefore the value of the MLE of $p$ based on $L_n$ alone when the observed value of $L_{10}$ is 1.

A moment estimate is easily found by inverting the expectation formula

\[ E(L_n) \approx \frac{\log n}{\log \frac{1}{p}} - \frac{\log (1-p)}{\log p}. \]

An approximate solution is

\[ \hat{p} = n^{-1/L_n}. \]

This estimate will have a fairly serious bias problem. However, with work, we can derive a high-order asymptotic expansion for the bias of $\hat{p}$, and hence correct $\hat{p}$ for its bias, at least to first order. These are classic ideas in the large-sample theory of inference.
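Both the Markov chain computation and the moment estimate above can be sketched in pure Python (an illustrative implementation of the construction, not part of the original column):

```python
def longest_run_tail(n, m, p):
    """P(L_n >= m), via the Markov chain described above: states 0..m,
    where state m (absorbing) means a head run of length m has occurred."""
    size = m + 1
    # Transition matrix P_m: from state i < m, go to state 0 with
    # probability 1-p (tails) and to state i+1 with probability p (heads).
    P = [[0.0] * size for _ in range(size)]
    for i in range(m):
        P[i][0] = 1 - p
        P[i][i + 1] = p
    P[m][m] = 1.0  # state m is absorbing

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(size))
                 for j in range(size)] for i in range(size)]

    # Q[n, m] = P_m ** n, computed by repeated squaring
    Q = [[float(i == j) for j in range(size)] for i in range(size)]
    base, e = P, n
    while e:
        if e & 1:
            Q = matmul(Q, base)
        base = matmul(base, base)
        e >>= 1
    # Starting in state 0, we end absorbed in state m iff L_n >= m
    return Q[0][m]

def moment_estimate(n, L_n):
    """The approximate moment estimator p-hat = n ** (-1 / L_n)."""
    return n ** (-1 / L_n)

p = 0.3
# Sanity check against the closed form P(L_10 >= 1) = 1 - (1-p)**10
print(longest_run_tail(10, 1, p), 1 - (1 - p) ** 10)
# P(L_10 = 1), to compare with the polynomial displayed above
print(longest_run_tail(10, 1, p) - longest_run_tail(10, 2, p))
print(moment_estimate(10, 3))  # p-hat if the longest run in 10 tosses is 3
```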

**www.imstat.org/dues-and-journal-subscription-prices-for-members/**

With the bootstrap, scientists were able to learn from limited data in a simple way that enabled them to assess the uncertainty of their findings. In essence, it became possible to simulate a potentially infinite number of datasets from an original dataset, and in looking at the differences, measure the uncertainty of the result from the original data analysis.

Made possible by computing, the bootstrap powered a revolution that placed statistics at the center of scientific progress. It helped to propel statistics beyond techniques that relied on complex mathematical calculations or unreliable approximations, and hence it enabled scientists to assess the uncertainty of their results in more realistic and feasible ways.
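As a minimal illustration of the idea (a toy sketch with invented data, not Efron’s original example), one can resample a dataset with replacement many times and use the spread of a recomputed statistic, here the median, as an estimate of its uncertainty:

```python
import random
import statistics

def bootstrap_se(data, stat, B=2000, seed=0):
    """Estimate the standard error of stat(data) by recomputing the
    statistic on B resamples of the data, drawn with replacement."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    replicates = [stat([rng.choice(data) for _ in range(n)])
                  for _ in range(B)]
    return statistics.stdev(replicates)

# Toy example: uncertainty of a sample median from just ten observations
data = [4.1, 5.6, 3.8, 7.2, 5.0, 6.3, 4.9, 5.8, 6.1, 4.4]
print("median:", statistics.median(data))
print("bootstrap SE of the median:",
      round(bootstrap_se(data, statistics.median), 3))
```

The same function works unchanged for almost any statistic, which is exactly the breadth of applicability described above.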

“Because the bootstrap is easy for a computer to calculate and is applicable in an exceptionally wide range of situations, the method has found use in many fields of science, technology, medicine, and public affairs,” says Sir **David Cox**, inaugural winner of the International Prize in Statistics. Indeed, Cornell University and EPAM Systems Inc. examined research databases worldwide and found that since 1980, the bootstrap (and multiple variations on the term, such as bootstrapping) has been cited in over 200,000 documents in more than 200 journals worldwide. Citations are found in fields like agricultural research, biochemistry, computer science, engineering, immunology, mathematics, medicine, physics and astronomy, and the social sciences.

“While statistics offers no magic pill for quantitative scientific investigations, the bootstrap is the best statistical pain reliever ever produced,” says **Xiao-Li Meng**, Whipple V. N. Jones Professor of Statistics at Harvard University [and IMS President]. “It has saved countless scientists and researchers the headache of finding a way to assess uncertainty in complex problems by providing a simple and practical way to do so in many seemingly hopeless situations.”

“The bootstrap was a quantum leap in statistical methodology that has enabled researchers to improve the lives of people everywhere,” says **Sally Morton**, Dean of the College of Science and Professor of Statistics at Virginia Tech. “Efron is a statistical poet of enormous beauty, applicability and impact.”

Brad Efron will accept the prize next summer at the 2019 World Statistics Congress in Kuala Lumpur.

Efron is Max H. Stein Professor of Humanities and Sciences, Professor of Statistics, and Professor of Biostatistics with the Department of Biomedical Data Science in the School of Medicine; he serves as Co-director of the Mathematical and Computational Sciences Program. He has held visiting faculty appointments at Harvard, UC Berkeley, and Imperial College, London. A recipient of a 2005 National Medal of Science for his contributions to theoretical and applied statistics, especially the bootstrap sampling technique, in 2014 he was awarded the Guy Medal in Gold by the Royal Statistical Society. He served in 1988–89 as President of IMS, and in 2004 as President of the American Statistical Association.

Read his Stanford University profile.

Bradley Efron was born in May 1938 to Russian immigrants, and grew up in St. Paul, Minnesota. He credits his salesman father, Miles, for cultivating a love of math and science, in part through baseball and bowling scoring. “He kept track of these things,” says Efron, “so I grew up with a lot of numbers around me and that was very helpful—I was training to be a statistician without realizing it.” He won a National Merit Scholarship the year it was first introduced and went to Caltech. It was more than an intellectually eye-opening experience. “I’d never seen a mountain or an ocean,” says Efron.[1]

Initially, he thought he was going to become a mathematician, but he realized that abstract mathematics was not where his interests or talents lay. Enrolling in a PhD program at Stanford, he switched to statistics. “I remember when going into statistics that first year I thought ‘this will be pretty easy, I’ve dealt with math and that’s supposed to be hard.’ But statistics was much harder for me at the beginning than any other field. It took years before I felt really comfortable.”[2]

Statistics, by this point (the early 1960s), was deeply immersed in decision theory and the formal mathematical principles underlying inference. It was abstract, highly mathematical, and not especially concerned with applied problems. This was about to change under the synergistic influence of data analysis and computing. Statistics suddenly became deeply relevant to scientific research because it could answer questions previously unanswerable.

Statisticians grappled with the problems of outliers in data sets, limited data, and multiple unknowns, developing computer-based techniques that pushed past the limits of hand calculation. It was in this milieu that Efron was inspired by and built on the work of John Tukey, David Cox, and Rupert Miller to create the bootstrap.

The name was inspired by the 18th-century fictional character Baron Munchausen, and a variation of a story where he pulls himself out of a swamp by his own bootstraps. In a similar vein, the statistician or scientist could now use their own data to assess the uncertainty in their own data.[3] Initially, his paper on the bootstrap was rejected for publication because it didn’t have enough theorems—so he added some at the end.[4]

“The truth is I didn’t think it was anything wonderful when I did it,” says Efron. “But it was one of those lucky ideas that is better than it seems at first view.”[5] It was a tool that many scientists could use, and use easily, especially as personal computing provided the power to do the number crunching. And it worked.


*Footnotes*

[1] ASA interview with Efron, October 2018

[2] Bradley Efron: A Conversation with Good Friends, Susan Holmes, Carl Morris, Rob Tibshirani and Bradley Efron, *Statistical Science* Vol. 18, No. 2, Silver Anniversary of the Bootstrap (May 2003), pp. 268–281

[3] A Life in Statistics: Bradley Efron, Julian Champkin, *Significance*, 18 November 2010

[4] Bradley Efron: A Conversation with Good Friends, Susan P. Holmes et al, *Statistical Science*, May 2003

[5] ASA interview with Efron, October 2018

Suppose you want to know the average household income in your city. You can’t afford a complete census so you randomly sample 100 households, record the 100 incomes, and take their average, say $29,308. That sounds very precise, but you would like some estimate of how accurate it really is. A straightforward, but impractical, approach would be to take several more random samples of 100 households, compute the average each time, and see how much the averages differed from each other.

The bootstrap lets you approximate this impractical approach using only the original sample’s data. A bootstrap data set is a random sample of size 100 drawn from the original 100 incomes. You can imagine writing each of the original incomes on a slip of paper, putting the slips in a hat, and randomly drawing a slip out. Record the number, put the slip back into the hat, and repeat this process 99 more times. The result would be a bootstrap data set, and we can make as many bootstrap data sets as we wish, each time taking their average. Let’s say we do 250 of them, giving 250 bootstrap averages. The variability of the 250 averages is the bootstrap estimate of accuracy for the original estimate $29,308.

The same idea can be applied to find the accuracy of any statistic, say the median income instead of the average or, perhaps, something much more complicated, which makes the bootstrap ideal for the often elaborate statistical methods of modern scientific practice.
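The slips-in-a-hat procedure described above translates directly into a few lines of code. The following is a minimal sketch using only Python’s standard library; the 100 household incomes are simulated here purely for illustration, and the function name `bootstrap_se` is our own label, not standard terminology.

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, n_boot=250, seed=0):
    """Bootstrap estimate of the accuracy (standard error) of `stat` on `data`."""
    rng = random.Random(seed)
    n = len(data)
    boot_stats = []
    for _ in range(n_boot):
        # One bootstrap data set: n draws from the original sample, with
        # replacement (write each value on a slip, draw from the hat, replace).
        resample = [rng.choice(data) for _ in range(n)]
        boot_stats.append(stat(resample))
    # The variability of the bootstrap statistics estimates the accuracy
    # of the original statistic.
    return statistics.stdev(boot_stats)

# Hypothetical sample of 100 household incomes (simulated for illustration).
sample_rng = random.Random(42)
incomes = [max(0.0, sample_rng.gauss(30000, 8000)) for _ in range(100)]

se_mean = bootstrap_se(incomes)                       # accuracy of the average
se_median = bootstrap_se(incomes, statistics.median)  # same recipe, median
print(f"mean = {statistics.mean(incomes):.0f}, bootstrap SE = {se_mean:.0f}")
```

Note that switching from the average to the median requires changing only the `stat` argument; nothing else in the procedure is touched, which is exactly the generality described above.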

The International Prize in Statistics recognizes a major achievement of an individual or team in the field of statistics and promotes understanding of the growing importance and diverse ways statistics, data analysis, probability and the understanding of uncertainty advance society, science, technology and human welfare. With a monetary award of $80,000, it is given every other year by the International Prize in Statistics Foundation, which comprises representatives of the American Statistical Association, International Biometric Society, Institute of Mathematical Statistics, International Statistical Institute and Royal Statistical Society. Recipients are chosen by a selection committee of world-renowned academics and researchers and are officially presented with the award at the World Statistics Congress.


After a six-year hiatus, this year we renewed the Distinguished Statistician Colloquium Series. With generous funding from Pfizer, the American Statistical Association, and the Department of Statistics at UConn, the 24th colloquium in the series was held on September 26–27, 2018 and featured Professor **Grace Wahba** from the University of Wisconsin–Madison.

Prof. Wahba is renowned for her work in statistical theory and the development of efficient numerical and statistical methods for large data sets, and has developed methods with applications in biostatistics, weather prediction, machine learning, climate science, and more. She is a member of the US National Academy of Sciences, and a Fellow of IMS, ASA, SIAM, the American Academy of Arts and Sciences and the American Association for the Advancement of Science.

Grace Wahba was interviewed by Dr. Hao Helen Zhang from the University of Arizona and Dr. Yoonkyung Lee from The Ohio State University.

The first day included a reception, a rehearsal colloquium and interview, and a banquet dinner. It was held at the Alumni Center, and was attended by many faculty members, Pfizer employees, and representatives from the New England Statistics Symposium (NESS). Introductions were given by Dr. Dipak Dey, UConn Board of Trustees Distinguished Professor of Statistics, and Dr. Kannan Natarajan, Head of Global Biometrics and Data Management at Pfizer. Dr. Xiao-Li Meng, Professor of Statistics at Harvard University and Past President of the New England Statistical Society, delivered an entertaining speech and a toast before dinner. The colloquium—*Pairwise Density Distances and Reproducing Kernel Hilbert Spaces, and an approach to treating personal densities as attributes in a Smoothing Spline ANOVA model*—and the interview were filmed on September 27 in UConn’s Dodd Research Center.

The videos will be added to the ASA YouTube channel in the near future.
