Dec 19, 2012

Terence’s Stuff: n vs n-1

“Why is the denominator in the sample mean n, but the denominator for the sample variance is n−1?” a reader asked me. My answer needs to be comprehensible to his grand-daughter, who we can safely say is not doing an advanced degree in statistics at an institution of higher learning. All of us have had to answer this question at some time in our careers, either for our students or for ourselves. How do you answer it, and how helpful is your answer? Do you feel obliged to introduce distinctions such as populations vs samples, description vs inference, parameters vs statistics, Greek vs Roman letters? Or more advanced concepts, such as degrees of freedom, dimensions of subspaces, unbiasedness or maximum likelihood? Or do you think we should just use n as the divisor in the sample variance and move on, perhaps with a footnote stating that half the world uses n, and the other half uses n−1, while a couple of people with PhDs in statistics from Berkeley use n+1?

In the old days, when we wanted a variety of approaches to answering a question like this, we’d leaf through a selection of introductory texts, and fix on the answer we liked best. These days we may not need to leave our desk to carry out this task. We can search the web, we can often LOOK INSIDE texts, and find the answer we like, at any desired level. Or can we? I must confess that I have never found an answer I liked to the “n vs n−1” question: no simple, intuitive, but correct explanation that makes sense to students at all levels. There are some good tries out there, but none that I find entirely satisfactory. I encourage you to look.

Following my introduction to statistics over fifty years ago, I noticed that from time to time, my teachers seemed to lose it, and us, and “go off with the fairies”. Those who insist on clarifying the distinction of my title hit this very early on. They want to introduce the familiar $s^2$, and they want to do it right. If the price to pay for this is that we must leave the world of rational thought, so be it, they reason. In her lovely 1940 paper on degrees of freedom (d.f.), cited in the excellent Wikipedia article on the same topic, Helen M. Walker (1891–1983) wrote, “this concept often seems almost mystical, with no practical meaning.” Sadly familiar to so many of us.

Can we look to history for insight on this matter? Readers of Walker’s historical review of d.f. will find little help for their pedagogical task. Gauss clearly understood the notion, but then we probably had to wait until “Student” (1908) and of course R.A. Fisher for further clarification, while Karl Pearson was famously not so clear on the concept. This is not stuff for intro courses. What we can learn from history is that people have been arguing about ways of presenting the n vs n−1 distinction for many decades now. On this point, I’d be happy to offer a small cash prize for the earliest reference in the statistics literature to my title. (Exactly how I will decide who wins, so that I can award the prize, I leave for another time.) Certainly the education and psychology literature has several excellent contributions to our topic, as they should, for they have been inflicting our subject on their students for nearly a century now. There was a valuable burst of activity in the American Educational Research Journal forty years ago, and doubtless there have been many similar exchanges at other times and in other places. Do you think a clear winner has emerged? I don’t.

Can we look to statistical theory to help in our explanation of the use of n−1? If we want to achieve unbiasedness (of our estimate of $\sigma^2$, but not of our estimate of $\sigma$), then we can justify the n−1. That’s not too hard to explain, but is it worth the effort? If we are willing to introduce maximum likelihood estimation (under normality), we can justify the n, but that’s even more effort, and, I think, beyond my reader’s grand-daughter. We can even justify n+1 if we seek a minimum mean square error estimate of $\sigma^2$ (within a certain class). My conclusion is that at best, invoking theory leads to a draw between n and n−1. You pays yer money, and you takes yer choice.
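
For readers who would rather see the three divisors in action than take the theory on trust, here is a minimal simulation sketch. It is an illustration rather than anything from the column itself: the function names, the sample size of 5, the true variance of 4 and the number of replications are all assumptions chosen for convenience, and the data are drawn from a normal distribution, which is the setting in which the n+1 claim holds. It divides the same sum of squared deviations by n−1, n and n+1, and reports the average estimate and the mean square error for each.

```python
import random


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


def compare_divisors(n=5, sigma2=4.0, reps=50_000, seed=0):
    """Simulate normal samples of size n with true variance sigma2.

    Returns {divisor_name: (average_estimate, mean_square_error)}.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    divisors = {"n-1": n - 1, "n": n, "n+1": n + 1}
    totals = {name: 0.0 for name in divisors}
    sq_err = {name: 0.0 for name in divisors}
    for _ in range(reps):
        ss = sum_sq_dev([rng.gauss(0.0, sigma) for _ in range(n)])
        for name, d in divisors.items():
            est = ss / d  # same sum of squares, different divisor
            totals[name] += est
            sq_err[name] += (est - sigma2) ** 2
    return {name: (totals[name] / reps, sq_err[name] / reps)
            for name in divisors}


if __name__ == "__main__":
    for name, (avg, mse) in compare_divisors().items():
        print(f"divisor {name}: average estimate {avg:.3f}, MSE {mse:.3f}")
```

Under these assumptions, the n−1 divisor should average close to the true variance, while the n and n+1 divisors should come out too small on average but with a smaller mean square error, which is the trade-off described above.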

I can’t see any real problem with introductory courses using the divisor of n for the sample variance. My reader wrote, “…the use of n instead of n−1 would make one of my grandchildren happy.” Me too!

5 Comments

  • [...] Professor Terry Speed is head of the Bioinformatics Division of the Walter & Eliza Hall Institute of Medical Research (WEHI). Originally trained in mathematics and statistics, he has had a lifelong interest in genetics. Together with his students and colleagues, Terry has developed methods of analysis now in daily use in research laboratories worldwide, underpinning many of the recent advances in medical research. This work has helped to identify areas of the human genome that contribute to cancer and genes that are vital for embryonic development, and to pinpoint malaria proteins responsible for initiating infection in human red blood cells. He is a Fellow of IMS and the Australian Academy of Science, was awarded the NHMRC Achievement Award for Excellence in Health and Medical Research in 2007 and an Australian Fellowship in 2009. Most recently he was presented with the 2012 Thomson Reuters Citation Award. According to his colleagues, he is a living Australian treasure. Terry’s column ["n vs n-1"] is here. [...]

  • [...] is the denominator in the sample mean n, but the denominator for the sample variance is n−1?” http://bulletin.imstat.org/… via [...]

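  • [...] Why is the denominator of the sample variance n−1? Are you sure you can explain this seemingly simple question clearly? Berkeley guru Terry Speed says he has never come up with an answer that all students can understand. So he is offering a prize for the earliest reference in the statistics literature that discusses this question! [...]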

  • A bit late to this, but I’d bookmarked this post a few weeks ago and am just now getting around to reading it.

    Lior Pachter has an excellent blog post on this topic that I’ve found to make for wonderful reading. You can find the blog post here:

    http://liorpachter.wordpress.com/2014/05/25/bessels-correction-and-the-dangers-of-moocs/

About IMS

The Institute of Mathematical Statistics is an international scholarly society devoted to the development and dissemination of the theory and applications of statistics and probability. We have about 4,500 members around the world. Visit IMS at http://imstat.org