# Thread: Why a simple linear regression (OLS)

1. ## Why a simple linear regression (OLS)

Hi!

This forum is so helpful! I just can't seem to find an answer to my question, though.

I am planning to run a multiple regression (with five X variables and one Y). But before I do that, someone recommended that I first run a simple linear regression (OLS) of Y on each of the independent variables separately.

But what is the use of this, other than determining the sign of each coefficient?

And if I do the simple linear regressions, is there any point in a correlation matrix of the five independent variables X and the dependent variable Y?

But I could still check for correlations among the independent variables themselves (so without Y)?

2. ## Re: Why a simple linear regression (OLS)

It usually all comes down to the purpose of the analysis. The simple-regression approach is usually seen more in exploratory analysis. If you have a well-designed protocol and hypotheses you planned a priori to test, this initial step may bias your study's original intentions.

Well, yeah, a correlation matrix could be considered redundant, but it may be easy to read.

A pro of fitting the initial simple models is that you can see each variable's relationship with the outcome while not controlling for the other variables. Then you can see how the coefficients change in the saturated model; this may help unmask collinearity as well as possible confounding relationships. It is typically considered a model-building step, though it seems like you already have an idea of what you want your model to be.
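To make that concrete, here's a minimal sketch with simulated, made-up data showing how a simple-regression slope can shift once a correlated predictor is controlled for in the multiple model (all variable names and numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # deliberately correlated with x1
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def slope(x, y):
    """OLS slope of y on a single x (intercept included)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Simple regressions: each slope absorbs part of the omitted, correlated predictor's effect
print("SLR slope for x1:", slope(x1, y))
print("SLR slope for x2:", slope(x2, y))

# Multiple regression: slopes are adjusted for the other predictor
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print("MLR slopes:", b[1], b[2])
```

Comparing the two sets of slopes is exactly the "see how they change in the saturated model" step described above.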

3. ## The Following User Says Thank You to hlsmith For This Useful Post:

Jazz3 (03-07-2017)

4. ## Re: Why a simple linear regression (OLS)

If you're going to use a bunch of SLRs to determine what to add to the MLR, you can use a higher alpha than you normally would, since you're just screening potential variables (you'd rather falsely include something than wrongly exclude something; the former is generally considered the lesser problem).
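A toy sketch of that screening idea, with simulated data and a deliberately liberal alpha (all numbers here are illustrative assumptions; the p-value uses a normal approximation, which is fine for this sample size):

```python
import math
import numpy as np

def norm_sf(z):
    """Upper-tail probability of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def slr_pvalue(x, y):
    """Approximate two-sided p-value for the slope in a simple linear regression."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)                      # residual variance
    se = math.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1]) # standard error of the slope
    return 2 * norm_sf(abs(beta[1] / se))

rng = np.random.default_rng(1)
n = 150
X = rng.normal(size=(n, 5))
y = 0.6 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(size=n)  # only x0 and x2 matter

ALPHA_SCREEN = 0.25  # deliberately liberal: this is screening, not final inference
kept = [j for j in range(5) if slr_pvalue(X[:, j], y) < ALPHA_SCREEN]
print("candidates for the MLR:", kept)
```

The liberal cutoff means some noise variables may slip through, which is the intended trade-off: they can be dropped later in the multiple model.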

5. ## The Following User Says Thank You to ondansetron For This Useful Post:

Jazz3 (03-07-2017)

6. ## Re: Why a simple linear regression (OLS)

Originally Posted by hlsmith
It usually all comes down to the purpose of the analysis. The simple-regression approach is usually seen more in exploratory analysis. If you have a well-designed protocol and hypotheses you planned a priori to test, this initial step may bias your study's original intentions.

Well, yeah, a correlation matrix could be considered redundant, but it may be easy to read.

A pro of fitting the initial simple models is that you can see each variable's relationship with the outcome while not controlling for the other variables. Then you can see how the coefficients change in the saturated model; this may help unmask collinearity as well as possible confounding relationships. It is typically considered a model-building step, though it seems like you already have an idea of what you want your model to be.

Ohh, I see.

Thank you

7. ## Re: Why a simple linear regression (OLS)

So doing SLRs prior to the MLR is useful for determining what to add to the MLR, but not necessary if you already know it / have set prior hypotheses.
But it could still be useful to run pairwise correlations on the independent variables as a check, to avoid "problems" with the MLR?

8. ## Re: Why a simple linear regression (OLS)

Originally Posted by Jazz3
So doing SLRs prior to the MLR is useful for determining what to add to the MLR, but not necessary if you already know it / have set prior hypotheses.
But it could still be useful to run pairwise correlations on the independent variables as a check, to avoid "problems" with the MLR?
I would use the Variance Inflation Factors in the multiple regression model to do this. They are more general than pairwise correlation coefficients. Frankly, doing separate regression models for each X, Y pair seems to me to be a waste of time and effort that could be much better spent investigating and refining the multiple regression model.

regards
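For anyone curious, a minimal VIF computation can be done by hand on simulated data (the data and the rule-of-thumb cutoffs of 5 or 10 below are illustrative conventions, not laws):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    of X on the remaining columns (X holds predictors only, no intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
        r2 = 1 - np.sum((xj - fitted) ** 2) / np.sum((xj - xj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.3, size=n)  # nearly a linear combination of x1, x2
X = np.column_stack([x1, x2, x3])
v = vif(X)
print(v)  # x3 shows heavy inflation; a common rule of thumb flags VIF > 5 or 10
```

Because each VIF regresses a predictor on *all* the others, it catches near-linear combinations of several variables that no single pairwise correlation would reveal, which is the sense in which it is "more general."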

9. ## The Following User Says Thank You to rogojel For This Useful Post:

Jazz3 (03-07-2017)

10. ## Re: Why a simple linear regression (OLS)

Originally Posted by Jazz3
So doing SLRs prior to the MLR is useful for determining what to add to the MLR, but not necessary if you already know it / have set prior hypotheses.
But it could still be useful to run pairwise correlations on the independent variables as a check, to avoid "problems" with the MLR?
Good guidance is to include variables you know to be important without testing them. If theory or much prior research suggests something is important, it doesn't make a whole lot of sense to test it mechanically. Individual SLRs can be used to determine which variables to include, at least initially, in the MLR. You could also create an MLR with the variables known to be important, then create n models for the n predictors we are unsure about. In other words, if we know x1 and x2 are important but are unsure about x3 and x4, we could create a model of Y that always includes x1 and x2, then another model adding only x3 (and another adding only x4) to see what the marginal value of those variables is. Then you would include the significant additions in a "complete" initial model before continuing the model-building process.
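That add-one-at-a-time idea can be sketched with partial F statistics on simulated data (x1 through x4 and every number here are invented for illustration; 3.84 is roughly the 5% cutoff for F with 1 numerator df and a large denominator df):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares and parameter count for OLS of y on X (plus intercept)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    r = y - Xd @ beta
    return r @ r, Xd.shape[1]

rng = np.random.default_rng(3)
n = 250
x1, x2, x3, x4 = rng.normal(size=(4, n))
y = 1.0 * x1 + 0.7 * x2 + 0.4 * x3 + rng.normal(size=n)  # x4 is pure noise

base = np.column_stack([x1, x2])   # x1, x2: "known important", always included
rss0, _ = rss(base, y)

results = {}
for name, extra in [("x3", x3), ("x4", x4)]:
    rss1, p1 = rss(np.column_stack([base, extra]), y)
    # Partial F statistic for adding exactly one variable to the base model
    results[name] = (rss0 - rss1) / (rss1 / (n - p1))
    print(f"adding {name}: F = {results[name]:.2f}")  # compare with ~3.84
```

Each candidate is judged on its marginal contribution over the fixed base model, rather than on its unadjusted simple-regression fit.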

There are many ways to do it. I would look at "purposeful selection" outlined in Hosmer and Lemeshow's Applied Logistic Regression. Irrespective of the actual modeling type (logistic, OLS, Cox regression, etc.), you can use the basis of their methodology in many situations. They noted that it has been shown to perform at least as well as, and often better than, other methods of model building and variable screening.

11. ## The Following User Says Thank You to ondansetron For This Useful Post:

Jazz3 (03-07-2017)

12. ## Re: Why a simple linear regression (OLS)

Originally Posted by rogojel
I would use the Variance Inflation Factors in the multiple regression model to do this. They are more general than pairwise correlation coefficients. Frankly, doing separate regression models for each X, Y pair seems to me to be a waste of time and effort that could be much better spent investigating and refining the multiple regression model.

regards
Hosmer and Lemeshow (and many others) would disagree in many cases ... but there are many ways to go about it, and of course, each approach really depends on the project.

13. ## The Following User Says Thank You to ondansetron For This Useful Post:

Jazz3 (03-07-2017)

14. ## Re: Why a simple linear regression (OLS)

This is interesting... what would a separate model give me that cannot be better tested in a multiple model? I never saw this recommendation before.

regards

15. ## Re: Why a simple linear regression (OLS)

Originally Posted by Jazz3
...".someone else recommended me to do a simple linear regression (OLS) first, with each of the independent and dependent variable separately. But what is the use of this, other than determining the sign of the coefficient..."
It is perhaps not well known, but any multiple regression model can be decomposed into a series of Simple Regression models.
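This is the Frisch-Waugh-Lovell result. A quick numerical check on simulated (made-up) data: the multiple-regression coefficient on x1 equals the simple-regression slope of y on the residuals of x1 after regressing x1 on the other predictor(s):

```python
import numpy as np

def ols(X, y):
    """OLS coefficients (intercept first)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full multiple regression: coefficient on x1
b_full = ols(np.column_stack([x1, x2]), y)[1]

# FWL: residualize x1 on x2, then simple-regress y on that residual
e1 = x1 - np.column_stack([np.ones(n), x2]) @ ols(x2, x1)
b_fwl = ols(e1, y)[1]

print(b_full, b_fwl)  # equal up to floating-point error
```

In other words, each multiple-regression coefficient *is* a simple-regression slope, just on a suitably purged version of the predictor.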

16. ## The Following User Says Thank You to Dragan For This Useful Post:

rogojel (03-07-2017)

17. ## Re: Why a simple linear regression (OLS)

Originally Posted by rogojel
I would use the Variance Inflation Factors in the multiple regression model to do this. They are more general than pairwise correlation coefficients. Frankly, doing separate regression models for each X, Y pair seems to me to be a waste of time and effort that could be much better spent investigating and refining the multiple regression model.

regards
Yes, I was actually planning on using the VIF, so the suggestion to do pairwise correlations first had me wondering...

18. ## Re: Why a simple linear regression (OLS)

Originally Posted by ondansetron
Good guidance is to include variables you know to be important without testing them. If theory or much prior research suggests something is important, it doesn't make a whole lot of sense to test it mechanically. Individual SLRs can be used to determine which variables to include, at least initially, in the MLR. You could also create an MLR with the variables known to be important, then create n models for the n predictors we are unsure about. In other words, if we know x1 and x2 are important but are unsure about x3 and x4, we could create a model of Y that always includes x1 and x2, then another model adding only x3 (and another adding only x4) to see what the marginal value of those variables is. Then you would include the significant additions in a "complete" initial model before continuing the model-building process.

There are many ways to do it. I would look at "purposeful selection" outlined in Hosmer and Lemeshow's Applied Logistic Regression. Irrespective of the actual modeling type (logistic, OLS, Cox regression, etc.), you can use the basis of their methodology in many situations. They noted that it has been shown to perform at least as well as, and often better than, other methods of model building and variable screening.
Thank you

And thanks for all the replies! I learned a lot.

19. ## Re: Why a simple linear regression (OLS)

Originally Posted by ondansetron
Hosmer and Lemeshow (and many others) would disagree in many cases ... but there are many ways to go about it, and of course, each approach really depends on the project.

I just looked around Google for this, and it seems to be a different way of performing model selection: https://scfbm.biomedcentral.com/arti...1751-0473-3-17

As such, it is still pretty unclear to me why one would recommend the first step without recommending the whole model-selection procedure. And why would one recommend a procedure a priori, especially one that is not really verified in practice and is recommended for logistic regression?

regards

20. ## Re: Why a simple linear regression (OLS)

I have said this before: I believe model-building processes based on SLR are a relic of the days before big computing. If you had a slow machine, or had to do all of the matrix algebra for the MLR by hand, why not test variables one at a time first?

I think doing SLRs may also help you understand confounding and topics like Simpson's paradox, but it's not really needed so much in this day and age.
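A toy simulation of the kind of sign flip being described (all numbers are invented for illustration): the simple-regression slope of y on x is positive, while the slope after controlling for a confounding group variable is negative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
group = np.repeat([0.0, 1.0], n // 2)            # a confounding group indicator
x = 3.0 * group + rng.normal(scale=0.5, size=n)  # group drives x up...
y = 5.0 * group - 1.0 * x + rng.normal(scale=0.5, size=n)  # ...and y up, but x lowers y

def slopes(X, y):
    """Non-intercept OLS coefficients of y on X."""
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0][1:]

slr = slopes(x, y)[0]                                # marginal slope: positive
mlr = slopes(np.column_stack([x, group]), y)[0]      # adjusted slope: about -1
print("SLR slope of y on x:", slr)
print("MLR slope of x, controlling for group:", mlr)
```

Running the SLR and MLR side by side makes the confounding visible, which is the pedagogical value mentioned above, even if the SLRs add little to the final model.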

21. ## The Following User Says Thank You to hlsmith For This Useful Post:

rogojel (03-07-2017)

22. ## Re: Why a simple linear regression (OLS)

Originally Posted by rogojel

I just looked around Google for this, and it seems to be a different way of performing model selection: https://scfbm.biomedcentral.com/arti...1751-0473-3-17

As such, it is still pretty unclear to me why one would recommend the first step without recommending the whole model-selection procedure. And why would one recommend a procedure a priori, especially one that is not really verified in practice and is recommended for logistic regression?

regards
Well, to be fair, I did mention that the OP should check out the entire section on it (and I noted building a preliminary model before continuing the process), but I also used it to point out that it is part of a valid procedure (per a well-regarded expert opinion). In the same chapter, H&L discuss that its performance is often better than commonly used methods.

I don't think the method is specific to logistic regression, as they mention it in their book on survival analysis (Cox regression, for example), and they reference principles of model building in OLS (though this could just be to make connections for the reader). I would be surprised if the methodology were valid for those regressions but somehow invalid for OLS. The principles tend to remain the same: for example, you still find a way to use residuals to determine whether model fit and specification are reasonable, you still use subset/nested tests to examine the joint significance of a group of variables, you still investigate multicollinearity, and so on. Their process of purposeful selection seems to me to be more of a general, methodical approach than something specific to the type of regression (although some steps, by necessity, might change in how you carry them out).

And again, I did say it will depend on the project, and I offered some other suggestions as well. The important thing is having a logical process with rules, and I don't think it can hurt to compare the models you arrive at via the different available methods.

I will say that I'm also a fan of build-down and build-up approaches. I think that subject-matter knowledge trumps any sort of inclusion/exclusion rules, which are best reserved for "exploratory" variables where we have no clue (as I said earlier, don't test something that theory dictates; just include it). I know there's far more out there about this stuff than I know, so I'm open to whatever discussion people have about it.

23. ## The Following User Says Thank You to ondansetron For This Useful Post:

Jazz3 (03-08-2017)