Feb 19, 2016

Letter to the Editor

A Commentary on “The Kids Are Alright: Divide by n when estimating variance,” by Jeffrey S. Rosenthal, IMS Bulletin (December 2015), Vol. 44, No. 8, Page 9

Dear Editor,

Professor Rosenthal’s piece is persuasive and very clearly written. I thank Professor Rosenthal for taking us back to this old concern that never truly goes away. Indeed, the basic issue under consideration appears and reappears whenever one teaches a cohort of new students.

With nearly 40 years of teaching experience now, I have a different, but easy, way to explain why the divisor in the customary sample variance is suddenly $n - 1$ instead of $n$. I expect that some readers may like my simple argument, below, in favor of the traditional divisor $n - 1$.

Suppose that I have a random sample of $n$ observations $X_1, \ldots, X_n$ from a single population with a population mean $\mu$. Customarily, in many elementary courses, I propose that $\mu$ is estimated by the sample mean, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. Here, the divisor is $n$ and no one really objects to that idea.

Then comes the idea of variation around $\mu$. First, I explain why no one considers $E[X - \mu]$ as a quantification of variation. The explanation is simple: $E[X - \mu] = 0$ under the population distribution. In other words, the positive and negative deviations of $X$ from $\mu$ cancel out on average.
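
The zero expectation follows in one line from linearity of expectation, using only the definition $E[X] = \mu$ of the population mean given above:

$E[X - \mu] = E[X] - \mu = \mu - \mu = 0.$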

Thus, many proceed to the next step.

Define a population variation or variance as $\sigma^2$ given by $E[(X - \mu)^2]$, which will be positive unless all observations coincide with $\mu$ (with probability 1). After all, who wants to collect data where every data point is the same, and waste time and money!

So, how should one estimate $\sigma^2$? Well, I begin with $\sum_{i=1}^{n}(X_i - \bar{X})^2$. But I note that $\sum_{i=1}^{n}(X_i - \bar{X})$ is identically zero for any set of $n$ numbers. That is, among the $n$ numbers (residuals) $X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$, we have exactly $n - 1$ free-riding numbers, since all $n$ residuals add up to zero. That is, the remaining $n$th number is fully determined by the other $n - 1$ free-riding numbers. Thus, when one computes the sample variance, one divides $\sum_{i=1}^{n} (X_i - \bar{X})^2$ by $n - 1$ instead of $n$. In this sense, $n - 1$ is customarily called the “degrees of freedom,” that is, an indication of how many among the $n$ residuals are truly free-riding.

In a first-year pre-calculus course that is often mandatory for all (or a large majority of) undergraduate students, the idea of pursuing mean squared error (MSE) considerations never really convinces our first-year undergraduates, since they have never heard of MSE prior to taking Stat 100 or Stat 110.
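
The zero-sum constraint on the residuals described above can be checked numerically. The following is only a minimal sketch in Python: the four data values and the helper name `residuals` are hypothetical choices made for illustration, and exact fractions are used so that the sum is exactly zero rather than zero up to rounding.

```python
from fractions import Fraction


def residuals(xs):
    """Return x_i - mean(xs) for each x_i, using exact rational arithmetic."""
    xs = [Fraction(x) for x in xs]
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


data = [7, 1, 4, 10]  # hypothetical sample (an assumption), n = 4
r = residuals(data)

# All n residuals add up to zero ...
total = sum(r)

# ... so the n-th residual is fixed once the other n - 1 are known.
last_from_others = -sum(r[:-1])

print(total)                      # 0
print(last_from_others == r[-1])  # True
```

Any other set of numbers could be substituted for `data`; the last residual is always recoverable from the first $n - 1$.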

Especially for them, in order to have a painless discourse, I take a very small set of numbers, say, 3, 4, 2, 4, 2 with $n = 5$. Obviously, $\bar{x} = 3$ and

$\sum_{i=1}^{n}(x_i - \bar{x}) = 0 + 1 - 1 + 1 - 1 = 0$

but

$\sum_{i=1}^{n}(x_i - \bar{x})^2 = 0 + 1 + 1 + 1 + 1 = 4.$

Thus, the sample variance should be the customary

$s^2 = \frac{1}{4} \sum_{i=1}^{n} (x_i - \bar{x})^2 = 1.$

The divisor is 4 instead of 5 because 4 is the “degrees of freedom,” as explained.
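
The arithmetic of this small example can be reproduced in a few lines of Python. This is a sketch for illustration only; the variable names are assumptions, and the cross-check relies on the standard library's `statistics.variance` (divisor $n - 1$) and `statistics.pvariance` (divisor $n$).

```python
import math
from statistics import pvariance, variance

data = [3, 4, 2, 4, 2]
n = len(data)

x_bar = sum(data) / n                    # sample mean
deviations = [x - x_bar for x in data]   # 0, 1, -1, 1, -1
ss = sum(d ** 2 for d in deviations)     # sum of squared deviations

s2 = ss / (n - 1)        # customary sample variance, divisor n - 1 = 4
divided_by_n = ss / n    # the alternative with divisor n = 5

# Cross-check against the standard library's two variance functions.
assert math.isclose(s2, variance(data))
assert math.isclose(divided_by_n, pvariance(data))

print(sum(deviations))  # 0.0
print(ss)               # 4.0
print(s2)               # 1.0
print(divided_by_n)     # 0.8
```

With only five observations, the two divisors give visibly different answers, 1 versus 0.8.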

Nitis Mukhopadhyay
Professor of Statistics
University of Connecticut, Storrs, USA
