Centering for interactions in multiple regression?

#1
This is an e-mail exchange about our discussion. Can someone please help us? I cannot understand any reason to center the variables other than a "well, it just should be that way" kind of thing. After my 2 questions you will read the latest email from my professor, then my first email. I also forgot to mention that we are predicting interactions.

So specifically my 2 questions are this:
1) When should you center? If you have a link, that would be great. I have found several sources and I am getting conflicting info (do center... no need to center... you must justify centering).
2) How do you plot interactions?


Here is the conversation between a professor and myself. I am making the argument against centering.

Professor,
I have looked into the question of whether or not to center. If I may,
I would like to explain why we should not concern ourselves with
centering the variables. These are not my concepts (they come from a
combination of sources), and I am going at this as just cause to move
forward rather than backwards (you probably know this; I am just
cleaning up and proposing a rationale).

First, the main reason for centering should be a theory-supported
purpose. This became more evident as I read on. The idea of centering
is a correction for excessive multicollinearity. If variables are
thought to have substantial overlap, then the resulting analysis will
not be sensitive to those individual variables' contributions in the
model. Thus, by centering, one can (potentially) reduce some
multicollinearity. However, I have 2 key observations to address any
concern about multicollinearity. The first: do we have any reason to
believe (in theory) that any of these variables are measures of the
same construct? I would say that attachment, ppn, and relat. satis are
not the same, but trust and attachment may be overlapping. Second,
looking purely at our output, all of our tolerance levels are well
above the cutoffs that would indicate high multicollinearity. What is
the cutoff for multicollinearity/tolerance? Well, I am glad you asked.
I have found researchers who use very liberal values of .10 and below
and very conservative values of .40. All of our tolerances are at
least .90+. I personally see no reason at all to center our variables.
It is not that I can't do it (it really wouldn't be that much
trouble); I just don't find the need to do it. Also, I took a look at
the scatterplots and there appears to be great heteroscedasticity.

Her response:
"I think the centering is particularly important when you include an
interaction term in the equation. This is not an issue of
multicollinearity, but an issue of standardizing your variables so you
can reasonably calculate a multiplicative function."

Am I right or am I wrong?
 
#3
I've heard both arguments: centering to avoid multicollinearity, and centering to get a more interpretable beta for variables where the mean is more meaningful than zero. But I do not really understand how the latter works.
 
#4
Hi Danielkeeton,

Okay, here's the deal.

Your advisor is right.

Centering will only help multicollinearity if you have an interaction between continuous IVs. The problem is that if you multiply two continuous variables that are both on a positive scale, the interaction will be highly correlated with the main effects, even if they are not correlated with each other. It's not the high correlation between predictors you have to worry about; it's the high correlation between the predictors and the interaction term.

I suggest just trying it to see for yourself: make up two uncorrelated variables, multiply them, then see how correlated each one is with the product. This is especially true if you're adding in squared terms.

But if you center one of them, so that half the values are negative, this doesn't happen. They may still be correlated, but much less so.
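Here's a quick way to try that suggestion, as a minimal NumPy sketch (the variable names and the uniform(1, 10) scale are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent predictors, both on a strictly positive scale
x1 = rng.uniform(1, 10, size=1000)
x2 = rng.uniform(1, 10, size=1000)

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Raw product: strongly correlated with each main effect,
# even though x1 and x2 are uncorrelated with each other
print(corr(x1, x1 * x2))   # typically around 0.6-0.7

# Center first, then multiply: the correlation largely vanishes
x1c = x1 - x1.mean()
x2c = x2 - x2.mean()
print(corr(x1, x1c * x2c))  # typically close to 0
```

The raw product picks up a large correlation with each predictor purely because everything lives on a positive scale; centering removes that artifact without changing the information in the data.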

The other reason for centering is, as owenpediatrica says, to help interpretability, especially when you have interaction terms. I'll see if I can explain this easily.

Let's say you have two continuous predictors, X1 and X2, plus their interaction.

Your equation is Y = B0 + B1X1 + B2X2 + B3X1*X2.

The effect of X1 on Y is (B1 + B3X2). How on earth do you interpret that? Well, when X2=0, the effect of X1 on Y is B1. When X2=1, the effect of X1 on Y is B1+B3. B3 is the difference in the slope for each 1 unit difference in X2. But what does B1 mean if X2 never equals 0? Nothing.
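To make the conditional-slope idea concrete, here is a tiny sketch (the coefficient values are invented for illustration, not from any real fit):

```python
# Hypothetical fitted coefficients for Y = B0 + B1*X1 + B2*X2 + B3*X1*X2
B0, B1, B2, B3 = 1.0, 0.5, 2.0, 0.25

def slope_of_x1(x2):
    """Conditional effect of X1 on Y at a given value of X2."""
    return B1 + B3 * x2

print(slope_of_x1(0))  # 0.5  -> B1 alone; only meaningful if X2 = 0 can occur
print(slope_of_x1(1))  # 0.75 -> B1 + B3
print(slope_of_x1(2))  # 1.0  -> each unit of X2 shifts the slope by B3
```

If X2 can never equal 0 in your data, centering X2 moves the "X2 = 0" point to the sample mean, so B1 becomes the slope of X1 at an X2 value that actually exists.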

For example, say you're doing a study on when babies begin talking, with babies at 9 months, 12 months, 18 months, etc., and age is X2. If you center age at 12 months, centered age = 0 when Age = 12, so B1 is now the effect of X1 when babies are 12 months old. That's meaningful. If you don't center age, B1 is the effect of X1 at Age = 0, where the model is extrapolating (most babies say no words at 0 months and the model may even predict a negative number of words). It's just not meaningful.

It's not more right to center variables in this case; it just makes the parameter estimates easier to interpret. The model fit, predictions, and the interaction term's estimate and p-value are exactly the same; the lower-order coefficients just get re-expressed.
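You can verify that centering re-expresses, rather than changes, the fitted model with a short simulation (a sketch with made-up data-generating numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
# Made-up true model with an interaction, plus noise
y = 2 + 0.5 * x1 + 1.5 * x2 + 0.3 * x1 * x2 + rng.normal(0, 1, n)

def fit(a, b):
    """OLS fit of y on intercept, main effects, and their interaction."""
    X = np.column_stack([np.ones(n), a, b, a * b])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

beta_raw, resid_raw = fit(x1, x2)
beta_cen, resid_cen = fit(x1 - x1.mean(), x2 - x2.mean())

# The intercept and main-effect coefficients differ between the two fits,
# but the interaction coefficient (B3) is identical...
print(beta_raw)
print(beta_cen)

# ...and the fitted model is the same: identical residuals
print(np.allclose(resid_raw, resid_cen))  # True
```

The centered columns are just linear combinations of the raw columns (plus the intercept), so both design matrices span the same space: same predictions, same residuals, same overall fit.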

But the multicollinearity issue between main effects and interactions can mess things up.

Make sense?
 
#5
thanks

I forgot to thank you for the reply you gave me regarding centering variables. I appreciate the time you took to explain the concepts. It was very helpful and has cleared up the loose ends in my understanding of the whys behind the analyses, in addition to how to conduct them. I am often dumbfounded that stats are so often taught as how-to, with little attention given to why we do what we do (well, at least that is the case in my graduate M.S. in psych program). Anyhow, thanks again! :)