# Terence’s Stuff: Multiple Linear Regression, 1

It’s time to respond to: *I’m curious about what you tell PhD students about multiple linear regression.* I tend to focus first on regression coefficients: what they are and are not, why we might care, and how we compute them. Almost fifty years ago, I was lucky enough to be introduced to Yule’s *new system of notation*, new in 1907, that is. (Thank you, Dr Geoffrey Jowett.) Given a collection $X_1, X_2, \ldots, X_p$ of random variables, the expression $b_{12\cdot 3\ldots p}$ denotes the (linear least-squares) regression coefficient of $X_1$ on $X_2$ when $X_3, \ldots, X_p$ are also in the regression equation. As Yule put it in his paper, *the first subscript gives the dependent variable, the second the variable of which the given regression is the coefficient, and the subscripts after the period show the remaining independent variables which enter into the equation.*

This avoids having to emphasize that the regression coefficient of $X_1$ on $X_2$ depends on the other variables in the equation: it’s right there in the notation! Mosteller and Tukey say it another way in chapter 13, *Woes of regression coefficients*, of their magnificent 1977 book: “a coefficient in a multiple regression – either in a theory or in a fit – depends on MORE than just: the set of data and the method of fitting [and] the carrier it multiplies. It also depends on: what else is offered as part of the fit.”
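A small simulation makes the point concrete. The sketch below uses Python with NumPy rather than anything the column prescribes; the data-generating coefficients and the helper `ls_coefs` are invented for illustration. The same response and the same carrier $X_2$ receive quite different coefficients depending on whether $X_3$ is offered as part of the fit.

```python
import numpy as np

def ls_coefs(y, *predictors):
    """Least-squares coefficients of y on the predictors (fitted with an intercept, which is then dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

# Hypothetical data: X3 is correlated with X2, and both enter X1
rng = np.random.default_rng(0)
n = 10_000
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + 0.6 * rng.normal(size=n)
x1 = 1.0 * x2 + 2.0 * x3 + rng.normal(size=n)

b12 = ls_coefs(x1, x2)[0]        # b_{12}: X3 not offered; should land near 1 + 2 * 0.8
b12_3 = ls_coefs(x1, x2, x3)[0]  # b_{12.3}: X3 offered as well; should land near 1.0
print(f"b_12 = {b12:.3f}, b_12.3 = {b12_3:.3f}")
```

Nothing about the data or the fitting method changes between the two lines; only "what else is offered" does.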

Having got this point clear, we now need to address the vexed question of how we interpret $b_{12\cdot 3}$, that is, the words we use when we say informally what it means. As we all know, some people call it the regression coefficient of $X_1$ on $X_2$, *controlling for* $X_3$. But we also know that in general the $X$’s in regressions are *not* under any control, so this cannot be a good description. My preference is to say *adjusting for* $X_3$. This is vague, but less likely to mislead, and it definitely conveys the fact that $X_3$ is in the model along with $X_2$. It is also connected to the use of regression for linear adjustment. But what exactly *is* a regression coefficient? Again, we all know the simplistic interpretation of $b_{12\cdot 3}$ as the average change in $X_1$ per unit change in $X_2$ when $X_3$ is *held fixed*. Why simplistic? At times “held fixed” makes no sense, an example being $X_3 = X_2^2$.

What *can* we say? A lengthy, but basically correct, interpretation goes like this: $b_{12\cdot 3}$ tells us how $X_1$ responds, on average, to change in $X_2$, after allowing for simultaneous linear change in $X_3$ in the data at hand.
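One way to see what “allowing for simultaneous linear change in $X_3$” amounts to in the data at hand: remove from $X_2$ whatever is linearly predictable from $X_3$, then regress $X_1$ on what is left. The Python/NumPy sketch below assumes simulated data with made-up coefficients and a hypothetical `slope` helper; the slope it reports agrees with the coefficient of $X_2$ in the full least-squares fit (the result econometricians know as the Frisch–Waugh–Lovell theorem).

```python
import numpy as np

def slope(y, x):
    """Simple least-squares slope of y on x, with an intercept."""
    xc = x - x.mean()
    return np.dot(xc, y - y.mean()) / np.dot(xc, xc)

# Hypothetical data in which X2 moves together with X3
rng = np.random.default_rng(1)
n = 5_000
x3 = rng.normal(size=n)
x2 = 0.7 * x3 + rng.normal(size=n)
x1 = 1.5 * x2 - 1.0 * x3 + rng.normal(size=n)

# The part of X2 not linearly explained by X3 (residual from regressing X2 on X3)
x2_adj = x2 - x2.mean() - slope(x2, x3) * (x3 - x3.mean())

# Coefficient of X2 from the full multiple regression, for comparison
X = np.column_stack([np.ones(n), x2, x3])
b12_3 = np.linalg.lstsq(X, x1, rcond=None)[0][1]

print(slope(x1, x2_adj), b12_3)  # equal up to floating-point rounding
```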

Mosteller and Tukey point out that sometimes $X$’s *can* be held constant, and then the important thing is to recognize just how large the difference can be between (i) $X_2$ changing while $X_3$ is not otherwise disturbed or clamped, and (ii) changing $X_2$ while holding $X_3$ fast. The first corresponds to the interpretation I gave, and the second is what people usually wish for. Complicated? Indeed, but as Oscar Wilde told us, “The truth is rarely pure and *never simple*.”

Yule also introduced the notation $X_{1\cdot 23\ldots p} = X_1 - b_{12\cdot 3\ldots p}X_2 - \cdots - b_{1p\cdot 23\ldots (p-1)}X_p$ for the residual of $X_1$ after its regression on $X_2, \ldots, X_p$. This can be very helpful when we want to show that multiple linear regression may be viewed as a sequence of simple linear regressions, of residuals on residuals. It is closely related to *added variable plots*. I think it’s important for students to know this, and how to derive it using the fact that (least-squares) residuals are orthogonal to all the variables after the period.

For example, one can easily derive the identity

$$b_{12\cdot 3} = b_{12} - b_{13\cdot 2}\,b_{32},$$

which I have found extremely useful over the years. (Write $X_1 = b_{12\cdot 3}X_2 + b_{13\cdot 2}X_3 + X_{1\cdot 23}$ in deviations from means, take the covariance of both sides with $X_2$, use the orthogonality of $X_{1\cdot 23}$ and $X_2$, and divide by the variance of $X_2$.) Here’s one thing you can see from this identity: the regression coefficient of $X_1$ on $X_2$ doesn’t change when $X_3$ is added to the regression equation if either $b_{32} = 0$ (i.e., $X_2$ and $X_3$ are orthogonal) or $b_{13\cdot 2} = 0$. Another is the relation between adjusted and unadjusted means in ANCOVA: with $X_2$ a two-group indicator and $X_3$ the covariate, the adjusted difference in means is the unadjusted difference minus the within-group slope times the difference in covariate means.

These identities are not hard to understand if you learn them when you are doing all your multiple regression computations with a mechanical calculator. Jowett showed us that if we use Jordan’s procedure for matrix inversion, “every intermediate quantity occurring in the calculation is either a partial regression coefficient or a partial covariance, and therefore of potential interest.” Try this step-by-step in *R*.
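For readers who would rather try it in Python, here is a sketch with NumPy. It assumes that the sweep operator, a Gauss–Jordan pivot applied to a symmetric matrix, is a fair stand-in for “Jordan’s procedure”; the `sweep` function and the simulated data are illustrative, not Jowett’s own method. Sweeping the covariance matrix of $(X_1, X_2, X_3)$ on $X_2$ leaves $b_{12}$ and $b_{32}$ in the $X_2$ row and partial covariances given $X_2$ elsewhere. Sweeping next on $X_3$ produces $b_{13\cdot 2}$ and updates the $X_1$ entry of the $X_2$ row to exactly $b_{12} - b_{32}\,b_{13\cdot 2}$, which is the identity $b_{12\cdot 3} = b_{12} - b_{13\cdot 2}\,b_{32}$ appearing as an intermediate step.

```python
import numpy as np

def sweep(A, k):
    """Return a copy of symmetric matrix A swept on pivot k (one Gauss-Jordan step)."""
    A = np.array(A, dtype=float)          # work on a copy
    d = A[k, k]
    A[k, :] /= d                          # row k: regressions of the other variables on X_k
    for i in range(A.shape[0]):
        if i != k:
            b = A[i, k]
            A[i, :] -= b * A[k, :]        # other rows: partial (co)variances adjusting for X_k
            A[i, k] = -b / d
    A[k, k] = 1.0 / d
    return A

# Hypothetical data; the coefficients are arbitrary
rng = np.random.default_rng(2)
n = 2_000
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)
x1 = 2.0 * x2 + 1.0 * x3 + rng.normal(size=n)

S = np.cov(np.vstack([x1, x2, x3]))       # index 0 = X1, 1 = X2, 2 = X3

A = sweep(S, 1)                           # step 1: sweep on X2
b12, b32 = A[1, 0], A[1, 2]               # simple regressions of X1 and X3 on X2
pcov13_2 = A[0, 2]                        # partial covariance of X1 and X3 adjusting for X2

A = sweep(A, 2)                           # step 2: sweep on X3
b13_2 = A[2, 0]                           # b_{13.2}
b12_3 = A[1, 0]                           # computed as b12 - b32 * b13_2, i.e. b_{12.3}

print(b12, b32, pcov13_2)
print(b12_3, b13_2)

# Cross-check against an ordinary least-squares fit with an intercept
X = np.column_stack([np.ones(n), x2, x3])
ols = np.linalg.lstsq(X, x1, rcond=None)[0][1:]   # [b_{12.3}, b_{13.2}]
print(ols)
```

Printing `A` after each call shows Jowett’s point directly: the entries are regression coefficients, partial covariances, or (in the swept positions) pieces of the inverse being built up.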

In a sense, our problems in interpreting regression coefficients are consequences of their simplicity when $(X_1, X_2, \ldots, X_p)$ are jointly normally distributed. In that case, everything works out so beautifully that we are seduced into thinking it applies more generally. But it doesn’t.

Next column: it’s why and how.

## Comments


Terence’s Stuff: Multiple Linear Regression, part 2 « IMS Bulletin (September 6, 2012 at 12:40 pm): [...] does it to my liking? I mentioned Mosteller & Tukey in my last piece on this topic, and once again I’m happy to say that they do a fine job on the different questions that lead us [...]

Understanding regression models and regression coefficients « Statistical Modeling, Causal Inference, and Social Science (January 5, 2013 at 2:43 pm): [...] connection with partial correlation and partial regression, Terry Speed’s column in the August IMS Bulletin (attached) is [...]
