
So you are saying:

Simple linear regression with one variable (X1) versus multiple linear regression with X1 plus a group of insignificant covariates in the model?

The R^2, if not adjusted, will increase with additional variables in the model regardless of their significance. Adding more terms also reduces the degrees of freedom in the calculations, so with the same term plus extra terms, the numbers going into the formula will not be the same in the two scenarios.
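To see this numerically, here is a minimal sketch with synthetic data (the variable names and the pure-noise regressor are my own illustration, not from the thread): adding a regressor that is unrelated to the outcome still never lowers unadjusted R^2, because OLS can always set its coefficient to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)   # y depends only on x1
x2 = rng.normal(size=n)             # x2 is pure noise, unrelated to y

def r_squared(y, cols):
    """Unadjusted R^2 from an OLS fit with intercept, via least squares."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())

r2_simple = r_squared(y, [x1])        # model: y ~ x1
r2_multi = r_squared(y, [x1, x2])     # model: y ~ x1 + x2

print(r2_simple, r2_multi)
# r2_multi is never below r2_simple, however insignificant x2 is
```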

P.S., You should run a model for fun to see this.

Thanks for your quick answer! You are of course correct; I'm sorry I did not formulate my question precisely enough:

I am talking about:

- simple linear regression with one variable (X1)

versus

- multiple linear regression with X1 plus a group of other independent variables, where only X1 is kept in the model (backward elimination), because the others are not significant

Then the R^2 values should be identical, right?

I am asking because this always held in the data sets I analyzed previously, and now one data set gives me different R^2 values...

No, typically it shouldn't be the same. In general, if you add another variable to the model (even an insignificant one), R^2 will either stay the same or increase; it almost always increases at least slightly. So if you remove those variables, the R^2 will decrease.

In the case of the multiple linear regression with e.g. X1 + X2 + X3, the R^2 value increased at first compared to the simple linear regression with X1. But with the backwards method, X2 and X3 are removed from the model until only significant predictors remain, in my case only X1. Now the R^2 value should be the same as in the simple linear regression with X1, but that is not true. It is actually lower than in the simple linear regression, and I don't understand why.

That is the solution!!! Thank you very much, I would never have found out about this myself!

Because X3 had missing values, those cases were excluded, and the exclusion persisted in the model even after X3 was removed.

If I exclude the same cases in the simple regression with X1, I also get the lower value, which resolves the whole mystery.

Thanks again, I was starting to lose it over this...

Hi,

I have fallen into a very similar trap before, and it took me ages to figure out the problem!

To avoid this trap, I now always look at the "case processing summary" that is generated in SPSS at the start (the bit that I used to just scroll through and ignore).

If you do this, you will avoid that pitfall!

Yes, and now I will be a bit more careful with the "missing listwise" option in multiple linear regression, especially when applying the backwards method, where variables can be removed from the model.

Missing data in any of the entered variables also "destroys" the valid data of other variables for the same case.

Instead "mean substitution" would circumvent this problem, but of course it has other critical aspects.

Thanks again to all of you, including the robots!