Independent variables correlate with each other: violation of regression assumptions?

OK, so bear with me: total statistical idiot here, in way over his head.

I'm running multiple regressions of plant cover against precipitation, temperature, and year.

Both precipitation and temperature correlate with year.

Does this violate an assumption? Is this likely to invalidate my results? And if so, what do I do to get more valid results?

I'm using SAS, and am minimally familiar with the AUTOREG procedure. I assume that if there's something I need to fix, that's how to do it, yes?
I'm looking at absolute cover (the percentage of total land area, including unvegetated ground, occupied by a given species of plant or group of species) and relative cover (what percent of the total plant cover, not including unvegetated area, a species or group of species represents). I'm a grassland ecologist, so I'm dealing with 196 grassland species that occur on my site.
Hi Charlie,

Though I am a bit late in replying, I think it's not too late for you. Here is your basic question:

" I'm running multiple regressions of plant cover against precipitation, temperature, and year.

Both precipitation and temperature correlate with year.

Does this violate an assumption? Is this likely to invalidate my results? And if so, what do I do to get more valid results?"

I am therefore assuming this:
Your independent variables are: precipitation, temperature, year
Your dependent variable is: plant cover
If this is correct, then yes, it is violating one of the basic assumptions of multiple regression (independent variables should not correlate with each other).

Now, this assumption is not strict, in the sense that if you can argue that this correlation is not affecting your actual regression, then well and good.
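One common way to gauge whether intercorrelation is affecting the fit is the variance inflation factor (VIF): regress each predictor on all the others and compute 1 / (1 - R²). A VIF of 1 means the predictor is uncorrelated with the rest; larger values mean the variance of its coefficient is inflated by that correlation. VIF is not mentioned in this thread, so treat this as a NumPy-only sketch; the yearly values are made-up placeholders, not real data.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS fit of y on X (an intercept column is added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # every predictor except column j
        out.append(1.0 / (1.0 - r_squared(X[:, j], others)))
    return np.array(out)

# Hypothetical yearly predictors (placeholder numbers, not field data).
rng = np.random.default_rng(0)
year = np.arange(1996, 2009, dtype=float)
precip = 300 + 5 * (year - 1996) + rng.normal(0, 20, year.size)
temp = 12 + 0.1 * (year - 1996) + rng.normal(0, 0.5, year.size)

X = np.column_stack([precip, temp, year])
print(np.corrcoef(X, rowvar=False).round(2))  # pairwise correlations
print(vif(X).round(1))                        # one VIF per predictor
```

Cut-offs such as 5 or 10 are often quoted for VIF, but they are rules of thumb rather than formal tests.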
But I want to ask you a question: by year, do you mean just numbers ranging from 1996 to 2008, like 1996, 1997, 1998, ..., 2008? If this is the case, then just remove it as an independent variable, because this is not data; it is just a column similar to the numbers 1, 2, 3, ..., 13 (one per year).
I'm interested, however, in groups of plants that decline or increase in cover over time but not in response to either precipitation or temperature, and also vice versa: plants whose cover correlates with the climate variables but not with time.

Your summary of the setup is correct.
Not quite sure whether I have understood the last comment in this thread.
However, if you want to remove the intercorrelation between precipitation and year, as well as between temperature and year, then create one new variable equal to precipitation divided by year and another equal to temperature divided by year.

This division is popular in economics. I am not sure whether you can apply it in ecology, but I am very sure that this division will eliminate the problem of intercorrelation.

Again, I am assuming time, temperature and precipitation are independent variables.
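Whether dividing by year removes the correlation can be checked directly: build the ratio and correlate it with year. A minimal sketch on a hypothetical, noise-free precipitation trend (placeholder numbers, not real data):

```python
import numpy as np

year = np.arange(1996, 2009, dtype=float)
# Hypothetical precipitation series with a linear upward trend in year.
precip = 300.0 + 5.0 * (year - 1996)

ratio = precip / year  # the "precipitation/year" variable suggested above

r_raw = np.corrcoef(precip, year)[0, 1]
r_ratio = np.corrcoef(ratio, year)[0, 1]
print(f"corr(precip, year)      = {r_raw:.3f}")
print(f"corr(precip/year, year) = {r_ratio:.3f}")
```

On this made-up series the ratio is still almost perfectly correlated with year: year changes by only about 0.6% between 1996 and 2008, so dividing by it leaves the precipitation trend essentially intact. Running the same two correlations on the real precipitation and temperature columns would show whether the transformation helps with the actual data.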
Rather than construct an overall model with all of the predictors, you could evaluate a series of models with subsets of the predictors. By looking for significance, you can get a feel for which predictors have an effect on the outcome.
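The subset approach can be sketched by fitting every non-empty combination of the three predictors and comparing the fits. This is a NumPy-only illustration under assumptions: the data are made-up placeholders, and it reports t-statistics rather than p-values (a p-value would come from a t distribution with n minus the number of fitted coefficients degrees of freedom, for example via SciPy).

```python
import itertools
import numpy as np

def fit_ols(y, X):
    """OLS with intercept; returns coefficients, t-statistics and R^2."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - A.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    r2 = 1.0 - resid @ resid / ((y - y.mean()) ** 2).sum()
    return coef, coef / se, r2

def all_subsets(y, predictors):
    """Fit every non-empty subset of the named predictors."""
    results = {}
    names = list(predictors)
    for k in range(1, len(names) + 1):
        for subset in itertools.combinations(names, k):
            X = np.column_stack([predictors[name] for name in subset])
            results[subset] = fit_ols(y, X)
    return results

# Hypothetical data (placeholders, not field measurements).
rng = np.random.default_rng(1)
year = np.arange(1996, 2009, dtype=float)
precip = 300 + 5 * (year - 1996) + rng.normal(0, 20, year.size)
temp = 12 + 0.1 * (year - 1996) + rng.normal(0, 0.5, year.size)
cover = 40 + 0.05 * precip - 0.8 * (year - 1996) + rng.normal(0, 2, year.size)

preds = {"precip": precip, "temp": temp, "year": year}
for subset, (coef, tstat, r2) in all_subsets(cover, preds).items():
    # tstat[1:] skips the intercept, leaving one t per predictor in the subset
    print(subset, "R2 =", round(r2, 3), "t =", tstat[1:].round(2))
```

R² can never fall when a predictor is added to a nested OLS model, so R² alone will always favour the full model; watching how a predictor's t-statistic changes when its correlated partner enters or leaves is the more informative part.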
I would like to add one point in this context.
In multiple linear regression we have one dependent variable and a set of independent variables. Here, "independence" should not be taken in the statistical sense of zero correlation. If all the pairwise correlations among the X's are zero, the partial regression coefficients are simply the simple regression coefficients (of Y on each X taken alone); on the other hand, if any pairwise correlation among the X's is 1, the coefficients cannot be estimated. These are the two extreme cases. Some degree of correlation always exists among independent variables. If these correlations approach unity (i.e. 1), it becomes a problem of multicollinearity and has to be dealt with differently. If the results of the analysis show reasonable standard errors for the partial regression coefficients and an acceptable R², such models can be initially accepted.
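The two extreme cases can be checked numerically. In this NumPy sketch with made-up numbers, x1 and x2 are constructed to have exactly zero correlation, so the multiple-regression slopes match the two simple-regression slopes; x3 is an exact linear function of x1, so the design matrix loses a rank and the coefficients cannot be separately estimated.

```python
import numpy as np

def ols(y, *xs):
    """OLS slopes of y on the given predictors (intercept fitted, then dropped)."""
    A = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]

# Two predictors with zero correlation (centred and orthogonal by construction).
x1 = np.array([1.0, -1.0, 1.0, -1.0, 2.0, -2.0])
x2 = np.array([1.0, 1.0, -1.0, -1.0, 0.0, 0.0])
y = np.array([3.0, 0.5, 2.0, -1.0, 4.0, -2.5])

partial = ols(y, x1, x2)                            # multiple regression slopes
simple = np.array([ols(y, x1)[0], ols(y, x2)[0]])  # separate simple regressions
print(partial, simple)                              # agree because corr(x1, x2) = 0

# Perfect correlation: x3 is an exact linear function of x1.
x3 = 2.0 * x1 + 5.0
A = np.column_stack([np.ones(len(y)), x1, x3])
print(np.linalg.matrix_rank(A))  # 2, not 3: the slopes are not identifiable
```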

Good Luck

Dr N S Gandhi Prasad
Hyderabad India

Dr. Prasad is correct. It is not an assumption that independent variables are independent; in practice they almost always are correlated to some degree. It only becomes a problem if they are so correlated that the correlation is 1.

It is an assumption that the *responses* are independent (actually, the assumption is that the residuals are independent, but generally if you have one, you have the other).
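Because the data form a yearly series, one standard diagnostic for independence of the residuals is the Durbin-Watson statistic: the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values above 2 suggest negative autocorrelation; serially correlated errors are the situation SAS's AUTOREG procedure is built for. The statistic is not mentioned in the thread, so this is a NumPy sketch with made-up residuals; in practice it would be applied to the fitted model's residuals sorted by year (statsmodels also provides `statsmodels.stats.stattools.durbin_watson`).

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic for residuals ordered in time."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Hypothetical residual patterns, ordered by year.
print(durbin_watson([1, 1, 1, -1, -1, -1]))  # runs of the same sign: well below 2
print(durbin_watson([1, -1, 1, -1, 1, -1]))  # alternating signs: above 2
```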

Correlation among the independent variables DOES, however, affect how you interpret the parameter estimates (the regression coefficients). Each one is interpreted as a partial effect: the effect of that predictor AFTER accounting for the effects of the others.
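The "after accounting for the others" interpretation can be made concrete with the Frisch-Waugh-Lovell result: the coefficient on precipitation in a model that also contains year equals the slope obtained by first removing what year explains from both cover and precipitation, then regressing one set of residuals on the other. A NumPy sketch with made-up placeholder data:

```python
import numpy as np

def residualize(v, Z):
    """Residuals of v after an OLS fit on Z plus an intercept."""
    A = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

# Hypothetical data (placeholders, not field measurements).
rng = np.random.default_rng(2)
year = np.arange(1996, 2009, dtype=float)
precip = 300 + 5 * (year - 1996) + rng.normal(0, 20, year.size)
cover = 40 + 0.05 * precip - 0.8 * (year - 1996) + rng.normal(0, 2, year.size)

# Full model: cover ~ precip + year.
A = np.column_stack([np.ones(year.size), precip, year])
full_coef, *_ = np.linalg.lstsq(A, cover, rcond=None)

# Partial effect of precip: strip out what year explains from both sides first.
cover_r = residualize(cover, year)
precip_r = residualize(precip, year)
partial_slope = (precip_r @ cover_r) / (precip_r @ precip_r)

# Simple slope of cover on precip alone, ignoring year, for comparison.
simple_coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(year.size), precip]), cover, rcond=None)

print(full_coef[1], partial_slope)  # identical up to rounding
print(simple_coef[1])               # generally different when precip and year correlate
```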

And it is totally fine to use your year values (1998, 1999, ...) instead of 1, 2, 3, etc. It will change your intercept, but it's fine. I assume recoding to 1, 2, 3 is what bourne was suggesting: it might make the intercept easier to interpret and more meaningful, especially if you have any interactions in the model.
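A quick way to see that recoding year only moves the intercept: fit the same model with year as 1996, ..., 2008 and as 1, ..., 13. The slopes are identical, and the new intercept equals the old one plus 1995 times the year slope. A NumPy sketch with made-up placeholder data (the shift of 1995 is just the choice that maps 1996 to 1):

```python
import numpy as np

def fit(y, *xs):
    """OLS coefficients [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Hypothetical data (placeholders, not field measurements).
rng = np.random.default_rng(3)
year = np.arange(1996, 2009, dtype=float)
precip = 300 + 5 * (year - 1996) + rng.normal(0, 20, year.size)
cover = 40 + 0.05 * precip - 0.8 * (year - 1996) + rng.normal(0, 2, year.size)

raw = fit(cover, precip, year)             # year coded 1996, ..., 2008
shifted = fit(cover, precip, year - 1995)  # year coded 1, ..., 13
print("intercepts:", raw[0], shifted[0])
print("slopes:    ", raw[1:], shifted[1:])
```

Centring year on a mid-study value is another common choice; only the intercept's meaning changes.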

I don't know what PROC AUTOREG is, but I would use PROC GLM.

I hate to bring up another issue, but my concern, given the way you've described it, is whether you have repeated measurements. Do you have a single area that you've measured over the years, or are there multiple plots, each one measured multiple times? Can you describe the study design?