Oct 6, 2016

XL-Files. Statistics vs Data Science: a 30-year-old prediction?

Xiao-Li Meng writes:

Writing the last XL-Files on “Peter Hall of Fame” reminded me of a piece that I have wanted to write since attending Chin Long Chiang’s memorial workshop on November 15, 2014. Professor Chiang was a pioneer of biostatistics long before I survived a course on survival analysis. Thus I was honored when I was invited to provide a statistician’s perspective on a debate between Chiang and another pioneer of biostatistics, Marvin Zelen. The debate apparently started with Zelen (1983, Biometrics), in a piece titled “Biostatistical Science as a Discipline: A Look into the Future,” whose abstract begins: “The field of biostatistics is enjoying unparalleled developments. Never before have members of our profession been in such demand. Current applications are significantly influencing the direction of research in statistical methodology. It is not clear whether there is a discipline which can be termed ‘biostatistics,’ but we are part of the emergence of a discipline which is termed ‘biostatistical science’. It refers to the applications of statistics, probability, computing and mathematics to the life sciences, with the goal of advancing our knowledge of a subject-matter field in this area. This paper discusses the role of computing, some aspects of training, and future directions of biostatistical science.”

What strikes me most is the relevance of Zelen’s thoughts on biostatistics vs biostatistical science for today’s discussion of statistics vs data science. His description of biostatistical science could easily serve as one for data science, save for its restriction to the life sciences. His question regarding the disciplinary identity of biostatistics within biostatistical science parallels the current question of whether statistics will survive as a viable discipline, given the emergence of the more encompassing discipline of data science.

Zelen suggested that the term biostatistics or biometrics “refers to a collection of statistical techniques which are primarily used in applications to the biological and biomedical sciences. … However, a discipline is not a collection of techniques.” But what is a discipline?

In his discussion, Bernard Greenberg listed three criteria for being a discipline: there must be a body of knowledge; it must be transmissible via educational methods; and it must undergo constant changes as a result of research performed by persons identified as its members. For Greenberg, if biostatistics was not a discipline, additional criteria would have to be articulated. Although Zelen did not directly respond to Greenberg’s challenge, he was clear that the key difference between biostatistics and biostatistical science was that the latter places far more emphasis and training on computing and substantive scientific knowledge. Biostatistics, then, was implicitly not a viable discipline because its “body of knowledge” was not sufficiently broad.

In his commentary “What is Biostatistics?” (1985, Biometrics), Chiang defined, and defended, biostatistics as “a discipline that is concerned with the development and application of statistical theory and methods for the study of phenomena arising in the life sciences.” Chiang reasoned that biostatistics was well qualified to be a discipline after 1950 because of “the amount and quality of knowledge that has been developed and accumulated in the field,” and because, “Since then graduates with strong backgrounds in mathematical statistics and mathematics have entered the field and treated biostatistical topics with a different attitude.” For Chiang, biostatistics possessed depth; for Zelen, biostatistics lacked breadth.

Perhaps the sharpest difference between Chiang and Zelen lies in their predictions of the future. Chiang predicted that “theoretical development, not statistical software, will be the centerpiece of biostatistics” and that “the future of biostatistics lies in the direction of stochastic processes.” Chiang believed that Zelen had overemphasized the role of computing and statistical software, remarking that, “His misplacement of emphasis made him feel insecure when he realized ‘the computer will become an intelligent data analyst’ in less than 10 years. The ‘computer data analyst’ may come sooner than he thinks. But biostatistics will continue to flourish and biostatisticians will not be out of a job.”

Zelen, however, considered Chiang’s emphasis on theoretical model building to be “totally naive unless one takes a serious interest in the subject matter and the appropriate data.” Zelen went on to conclude that, “Time will tell whether computing or stochastic processes will dominate biostatistics or biostatistical science. However, one need not go too far to verify that nearly all Departments of Biostatistics are currently adding computing courses in their curricula. We have a revolution in our midst. Why should one deny it!”

No one today is denying the revolution in our midst, and nearly all Departments of Statistics are currently adding computing courses in their curricula. Zelen’s prediction is spot on beyond biostatistics, thanks to the two Vs of Big Data: volume and velocity. We need more computing, and we need to compute fast. But Chiang’s prediction captures the third V of Big Data, variety, which demands more sophisticated stochastic temporal-spatial models, network models, etc., as well as newer and deeper theory. Chiang was also correct that as long as we deepen our foundations while expanding our horizons, (bio)statistics will continue to flourish and (bio)statisticians will not be out of a job.

Marvin Zelen passed away on the day of Chiang’s memorial workshop. A sad coincidence, or the reunion of two visionary scholars, whose collective predictions capture the very essence of what we experience today and, likely, for generations to come?
