# Thread: Linear Regression - Beginner Need Help

1. ## Linear Regression - Beginner Need Help

Hello everyone, I have a simple exercise to do on linear regression models, but I have a few doubts.

The exercise gives you 2 vectors (let's call them x and y).
The exercise asks you to test whether y depends on x.
To do this I simply did:

model <- lm(y~x)

and then:
summary(model)

From my understanding, if the p-value in the summary is less than 0.05, then we can assume that there is a linear relationship between x and y, and the equation describing the best fit is simply written using the coefficients found in the summary.
Is this correct?

The next exercise instead gives you 3 vectors of variables (x, z, w) and a vector y.
It asks which of x, z and w are involved in determining y.

To solve this problem I repeated what I did for the first exercise: I created 3 models, y against x, y against z and y against w. I then looked at the p-values, and if the p-value of a model was less than 0.05 I assumed that that vector is involved in determining y.

In my exercise all 3 p-values are very low, so I think this implies that all 3 vectors are involved in determining y. Is this correct?
The final part of the exercise asks for an equation describing the dependence of y on the vectors.
As all 3 vectors are involved in determining y, I did:
model <- lm(y ~ x + z + w)
and wrote the equation using the coefficients in the summary.

Is this correct, or am I missing any major point?

So basically, if you have 2 vectors and you want to know whether one depends on the other, you just fit an lm model and check the p-value in the summary?

I hope I have been clear enough.

Diesel

2. ## Re: Linear Regression - Beginner Need Help

Originally Posted by dieselonthecouch
Hello everyone, I have a simple exercise to do on linear regression models, but I have a few doubts.

The exercise gives you 2 vectors (let's call them x and y).
The exercise asks you to test whether y depends on x.
To do this I simply did:

model <- lm(y~x)

and then:
summary(model)

From my understanding, if the p-value in the summary is less than 0.05, then we can assume that there is a linear relationship between x and y, and the equation describing the best fit is simply written using the coefficients found in the summary.
Is this correct?
You are correct in the essentials.
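
For reference, here is a minimal sketch of how the slope p-value and the fitted equation can be pulled out of the summary. The data are simulated purely for illustration (the true intercept 2 and slope 3 are assumptions, not values from the exercise), so substitute your own x and y.

```r
# Simulated data (hypothetical): y depends linearly on x plus noise.
set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)

model <- lm(y ~ x)

# Coefficient table: columns are Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(model)$coefficients

slope_p   <- coefs["x", "Pr(>|t|)"]        # p-value for the slope of x
intercept <- coef(model)[["(Intercept)"]]  # estimated intercept
slope     <- coef(model)[["x"]]            # estimated slope

# Fitted equation: y = intercept + slope * x
cat(sprintf("y = %.3f + %.3f * x (slope p-value = %.3g)\n",
            intercept, slope, slope_p))
```

With a single predictor, the p-value on the slope row and the overall F-test p-value at the bottom of the summary test the same hypothesis, so either one can be read off.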

Originally Posted by dieselonthecouch
The next exercise instead gives you 3 vectors of variables (x, z, w) and a vector y.
It asks which of x, z and w are involved in determining y.

To solve this problem I repeated what I did for the first exercise: I created 3 models, y against x, y against z and y against w. I then looked at the p-values, and if the p-value of a model was less than 0.05 I assumed that that vector is involved in determining y.

In my exercise all 3 p-values are very low, so I think this implies that all 3 vectors are involved in determining y. Is this correct?
The final part of the exercise asks for an equation describing the dependence of y on the vectors.
As all 3 vectors are involved in determining y, I did:
model <- lm(y ~ x + z + w)
and wrote the equation using the coefficients in the summary.

Is this correct, or am I missing any major point?

So basically, if you have 2 vectors and you want to know whether one depends on the other, you just fit an lm model and check the p-value in the summary?

I hope I have been clear enough.

Diesel
This is a more complicated scenario because some of the independent variables may be correlated with each other. This is called multicollinearity. Create a correlation matrix of ALL the variables and see whether any of the IVs are correlated with each other. You could also run a multiple regression and check the VIFs (Variance Inflation Factors). As a common rule of thumb, VIFs > 5, and especially > 10, indicate a high degree of multicollinearity.
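
A rough sketch of both checks in base R follows. The data are simulated (z and w are deliberately built from x, which is an assumption made only for illustration), and the VIF is computed by hand from its definition, 1 / (1 - R²), where R² comes from regressing each predictor on the other predictors.

```r
# Simulated data (hypothetical): z and w are built from x, so the
# predictors are strongly correlated with each other.
set.seed(2)
n <- 100
x <- rnorm(n)
z <- x + rnorm(n, sd = 0.3)
w <- x + rnorm(n, sd = 0.3)
y <- 1 + 2 * x + rnorm(n)

# Correlation matrix of all the variables
print(round(cor(cbind(y, x, z, w)), 2))

# VIF for a predictor = 1 / (1 - R^2), with R^2 taken from the
# regression of that predictor on all the other predictors.
predictors <- data.frame(x, z, w)
vif <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  fit <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif, 2))
```

Large off-diagonal correlations among x, z and w, or VIFs above the rule-of-thumb thresholds, suggest that the predictors carry overlapping information.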

3. ## The Following User Says Thank You to Miner For This Useful Post:

dieselonthecouch (12-14-2016)

4. ## Re: Linear Regression - Beginner Need Help

Originally Posted by Miner
You are correct in the essentials.

This is a more complicated scenario because some of the independent variables may be correlated with each other. This is called multicollinearity. Create a correlation matrix of ALL the variables and see whether any of the IVs are correlated with each other. You could also run a multiple regression and check the VIFs (Variance Inflation Factors). As a common rule of thumb, VIFs > 5, and especially > 10, indicate a high degree of multicollinearity.
I don't think we have covered multicollinearity, and my exercise is meant to be easy, covering only basic multiple linear regression. Sorry if it seems that I am asking the same question again; I am just trying to understand whether what I did makes sense.

But, coming back to my exercise, for the second part, I have another doubt:

So after creating the 3 models (y~x), (y~z), (y~w), all 3 p-values were less than 0.05, and I concluded that y depends on x, z and w.

To verify this I ran the multiple regression:
lm(y ~ x + z + w)

Does this approach make sense?
When I check the summary of the multiple regression, the intercept and the slope for x are significant (p-values less than 0.05), while the slopes for z and w are not significant (p-values are higher).
Does this mean that the slopes for z and w should be assumed to be 0 and therefore left out of the equation? (The equation would then be: y = intercept + slope_x * x)

My doubt is: why did I get very low p-values when I tested z and w against y on their own (lm(y~w) and lm(y~z))? When I tested each vector individually, I had a significant linear relationship between each vector and y.
So, as all 3 vectors seemed to influence y, I ran the multiple regression.
But as I said, the multiple regression lm(y ~ x + z + w) gives non-significant p-values for z and w.
Is this normal?
I was expecting all 3 slopes to be included in the multiple regression equation.

5. ## Re: Linear Regression - Beginner Need Help

Using multiple regression is better than running individual regressions. Once the strongest relationship (x - y) is accounted for, there is little left for z and w to explain.
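
A small simulation can illustrate this. In the sketch below the data are made up: y is generated from x only, and z and w are just noisy copies of x (all of this is an assumption for illustration, not the exercise data). Each predictor looks significant on its own, but in the joint model the slopes for z and w will usually, though not in every random sample, stop being significant.

```r
# Simulated data (hypothetical): y truly depends on x only;
# z and w are noisy copies of x.
set.seed(3)
n <- 100
x <- rnorm(n)
z <- x + rnorm(n, sd = 0.5)
w <- x + rnorm(n, sd = 0.5)
y <- 1 + 2 * x + rnorm(n)

# Slope p-value from each single-predictor model
p_single <- c(
  x = summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"],
  z = summary(lm(y ~ z))$coefficients["z", "Pr(>|t|)"],
  w = summary(lm(y ~ w))$coefficients["w", "Pr(>|t|)"]
)

# Slope p-values from the joint model
joint   <- summary(lm(y ~ x + z + w))$coefficients
p_joint <- joint[c("x", "z", "w"), "Pr(>|t|)"]

# Side-by-side comparison
print(signif(rbind(single = p_single, joint = p_joint), 3))
```

In the single-predictor models, z and w pick up the effect of x because they are correlated with it. The joint model asks what each predictor adds once the others are already in the model.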

6. ## The Following User Says Thank You to Miner For This Useful Post:

dieselonthecouch (12-14-2016)

7. ## Re: Linear Regression - Beginner Need Help

Probably not needed for the assignment, but you also face the bias-variance tradeoff. The more variables you include that are at least marginally associated with Y, the better the accuracy (lower bias), but also the greater the variance of the estimates. So you have to strike a balance between these two things. The usual choice is the simpler model, which may also generalize better to other samples.
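
One way to see this in practice is sketched below, again on simulated data (an assumption for illustration only). It compares the standard error of the x slope in the simple and the full model, and then uses a nested-model F-test via `anova()` to ask whether z and w add anything beyond x. The F-test is just one possible comparison and is not something the exercise necessarily requires.

```r
# Simulated data (hypothetical): y depends on x only;
# z and w are correlated with x.
set.seed(4)
n <- 100
x <- rnorm(n)
z <- x + rnorm(n, sd = 0.5)
w <- x + rnorm(n, sd = 0.5)
y <- 1 + 2 * x + rnorm(n)

small <- lm(y ~ x)
full  <- lm(y ~ x + z + w)

# Standard error of the x slope in each model: adding correlated
# predictors makes the estimate for x less precise.
se_small <- summary(small)$coefficients["x", "Std. Error"]
se_full  <- summary(full)$coefficients["x", "Std. Error"]
print(c(small = se_small, full = se_full))

# Nested-model F-test: do z and w together improve on the x-only model?
cmp <- anova(small, full)
print(cmp)
```

If the F-test p-value is large and the standard error of the x slope grows in the full model, the simpler model y ~ x is the more defensible choice.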
