Help with Multiple Regression

#1
Hi all, I'm struggling with what to do with this multiple regression question; any help, tips, or pointers would be greatly appreciated!

[Attachment: regression output table with coefficient estimates and standard errors]

Am I right in thinking that for part (a), sign is just whether the coefficient is positive or negative, and magnitude is just its size? But I'm not sure about significance.

I need some pointers on where to start and how to work through it.
 

Dragan

Super Moderator
#2
I'll give you some hints on parts (b) and (c). As for part (a), I think you can get this; however, I don't see any p-values(?), only standard errors. So you're going to have to compute the t-statistic for each regression weight, i.e.

ti = bi / sbi, for i = 1, 2, 3. Compare each t-statistic to the critical t-value on 47 degrees of freedom (51 - 3 - 1); get this value from a textbook table (or use a software package).
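If you have Python handy, here is a minimal sketch of that check (assuming scipy is installed). Only the b1 = 3.2889, sb1 = 0.32 pair comes from this thread (it's worked further down); the other coefficient/standard-error pairs are made-up placeholders, so substitute the ones from your output:

```python
from scipy import stats

# t-statistic for each regression weight: t_i = b_i / s_bi
b  = [3.2889, 1.50, -0.25]   # b1 from the thread; b2, b3 are made up
se = [0.32,   0.80,  0.40]   # s_b1 from the thread; s_b2, s_b3 made up

df = 51 - 3 - 1                          # N - (# of IVs) - 1 = 47
t_crit = stats.t.ppf(1 - 0.05 / 2, df)   # two-sided critical value, alpha = 0.05

for i, (bi, sbi) in enumerate(zip(b, se), start=1):
    t = bi / sbi
    print(f"b{i}: t = {t:.4f}, |t| > t_crit? {abs(t) > t_crit}")
print(f"critical t on {df} df = {t_crit:.5f}")   # about 2.01174
```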

If you had 3 dummy vectors then you would fall into the so-called dummy variable trap. That is, you would have perfect collinearity (this is bad). Look up what this means.
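A quick way to see the trap, if you have numpy: with an intercept, one dummy per group makes the dummy columns sum to the intercept column, so the design matrix loses rank. The data below are made up purely for illustration.

```python
import numpy as np

# Three groups, one dummy per group, plus an intercept column.
labels = np.array([0, 1, 2, 0, 1, 2])                 # made-up group labels
X = np.column_stack([np.ones(len(labels)), np.eye(3)[labels]])

print(np.linalg.matrix_rank(X), "of", X.shape[1])     # 3 of 4: perfect collinearity
print(np.linalg.matrix_rank(X[:, :-1]), "of", 3)      # drop one dummy: full rank
```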

In terms of part (c), think of your first regression model as the "full" model and the second model as the restricted (or reduced) model.

In short, your restricted model is testing the hypothesis that:

H[0]: Beta2=Beta3=0

Your sampling distribution under H[0] is F-distributed with 2 and 51 - 3 - 1 degrees of freedom. Your statistic is computed as:

F = [(R^2full - R^2restricted) / (3 - 1)] / [(1 - R^2full) / (N - 3 - 1)]

Compare your computed F to the critical F(2, 47) (say, at alpha = 0.05) to make the decision to reject or fail to reject H[0].
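If you want to check your hand computation, here is a sketch in Python with scipy; the R^2 values of 0.73 and 0.72 are the ones quoted later in this thread, so swap in your own if they differ:

```python
from scipy import stats

r2_full, r2_restr = 0.73, 0.72   # R^2 from the full and restricted models
N = 51
df1 = 3 - 1                      # (# IVs full) - (# IVs restricted) = 2
df2 = N - 3 - 1                  # 47

F = ((r2_full - r2_restr) / df1) / ((1 - r2_full) / df2)
F_crit = stats.f.ppf(1 - 0.05, df1, df2)
print(f"F = {F:.3f}, critical F({df1}, {df2}) = {F_crit:.3f}")
```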
 
#3
This is a really good question.

What makes something statistically significant? Usually (but not always) we start from the assumption that the effect (the coefficient) is zero, but we always observe departures from zero in our estimates, even when the true value is zero (sampling variability).

So we ask ourselves: how do we test that something is zero? We test by finding an observable value that has a known distribution under that assumption. If the p-value is under some significance level, like alpha = .05, we say "these are statistically significant effects with a p-value of....". If it isn't, and we aren't satisfied that the result is extreme enough to rule out the possibility that the unobserved quantity is zero, we say the effects are not statistically significant. Typically my language there is "this is insufficient evidence...".
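A sketch of getting that p-value in Python with scipy, using placeholder numbers of the kind you've been given:

```python
from scipy import stats

# Two-sided p-value for a coefficient's t-statistic (placeholder numbers).
b, se, df = 3.2889, 0.32, 51 - 3 - 1
t = b / se
p = 2 * stats.t.sf(abs(t), df)   # two-sided tail probability on 47 df
print(t, p)                      # p far below alpha = 0.05 here
```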


So these coefficients are tested, under the assumption that each is zero, using t-statistics. You have been given a point estimate, a standard error, and N.

Your job is to track down the t-statistic formula in your book and pencil in what you know.


Now there is also 'practical significance', which, depending on your teacher, may deserve a comment. Usually we eye the R-squared (or adjusted R-squared, where available) to address that. It is possible for something to be statistically significant while not being practically significant (meaning there is so little gain in explanatory ability that you may not care to bother).

For example, there is an F-test for nested models associated with the part (c) question, and you might do the legwork and discover that the extra variables are statistically significant. But then we look over at the R-squared, which only goes up by a hundredth, and have to ask ourselves: what's the point?

Good luck. The heart of this question is tracking down the formulas in the book and calculating these things by hand from the aggregate values.
 
#4
I've been trying to look at the t-statistic, but the formula is confusing me and I'm not sure what values go where; some explanation of that would be great.

But for the dummy variables bit, I guess what you're saying is that if you have more than 2 dummy variables you get multicollinearity, where the independent variables become strongly correlated, and so the coefficient estimates for the alpha and beta values will change very strongly with only small changes in the equation?

As for (c), I think I understand that Beta2 = Beta3 = 0;
are you saying there that the 2 dummy variables that were removed equal 0 under the null hypothesis H[0]?

But after that, it's using the t-statistic that I'm still a bit stuck on.
 

Dragan

Super Moderator
#5
Let me provide an example and you will see how to do this.

Let's take b1 = 3.2889 and sb1 = 0.32; then

t = b1 / sb1 = 10.2778.

Next, compare t with the critical value tcrit = ±2.01174 (alpha = 0.05). I got tcrit using Minitab (tcrit is associated with 51 - 3 - 1 = 47 df).

I think you can do the rest - tcrit is the same. Thus, we reject H[0]: Beta1 = 0, because |10.2778| is greater than 2.01174.
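As a quick sanity check on those numbers in Python (assuming scipy):

```python
from scipy import stats

t = 3.2889 / 0.32                   # 10.2778
t_crit = stats.t.ppf(0.975, 47)     # 2.01174, the same value Minitab gives
print(t, t_crit, abs(t) > t_crit)   # True -> reject H0: Beta1 = 0
```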


(2) You will have perfect collinearity if you have three regions and you use three dummy-coded vectors. Your regression model will "blow up" and won't run at all. Rule of thumb: the number of dummy vectors to use is the number of groups (or regions, in this example) minus 1.
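A sketch of that k - 1 rule in Python, if you use pandas (the region names here are made up):

```python
import pandas as pd

regions = pd.Series(["North", "South", "West", "North", "West"])
# drop_first=True keeps k - 1 dummy columns; the dropped region
# becomes the reference category.
print(pd.get_dummies(regions, drop_first=True))   # 2 columns for 3 regions
```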


(3) For the last part you have to use an F ratio (not a t-statistic), because you have 2 degrees of freedom in the numerator, as I showed above. Just apply the formula for F that I provided above.

A side note: t^2 = F, but ONLY when you have 1 degree of freedom in the numerator of the F ratio.
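You can verify that identity numerically (a scipy sketch, on the 47 df from this problem):

```python
from scipy import stats

t_crit = stats.t.ppf(0.975, 47)    # two-sided t critical value, alpha = 0.05
f_crit = stats.f.ppf(0.95, 1, 47)  # F critical value with 1 numerator df
print(t_crit**2, f_crit)           # both about 4.0471
```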
 
#6
Ah right, for (a) then, I see now how you compute the t-statistic, thanks.

Were you using that confidence interval of 95%? We have n = 51, so if we just use the table we can use 50 df, which gives 2.009. I assume this is why it's a bit smaller than the actual value you had a program work out; am I correct here?

And then, using the t-statistic, because the answer (10.2778) is bigger than the critical value of 2.009 (you have 2.0117), does that mean the beta value is insignificant, as we rejected the null hypothesis?

As for (2), we've discussed that and I get it now, thanks :)

For (3), if the formula is F = [(R^2full - R^2restricted) / (3 - 1)] / [(1 - R^2full) / (N - 3 - 1)],

would that make it

[(0.73-0.72)] / (3-1) / [(1-0.73)/ 51-3-1)] ?

I'm still a bit unsure as to what this 3 - 1 thing is, but thank you again so far.
 

Dragan

Super Moderator
#7
(1) am I correct here?

(2) then the beta value is insignificant, as we rejected the null hypothesis?

(3) [ (0.73 - 0.72) / (3-1) ] / [ (1 - 0.73) / (51-3-1) ] ?

(4) I'm still a bit unsure as to what this 3 - 1 thing is, but thank you again so far.

As for (1), yes, you are essentially correct. Note that this has nothing to do with a confidence interval; I (via Minitab) computed that critical value based on 47 degrees of freedom - that's it.

As for (2), no: you would say that the beta value is significant, because we rejected the null hypothesis.

As for (3), yes, the equation is correct once the parentheses are fixed - compare my version in (3) above with what you wrote.

As for (4), the 3 - 1 comes from the fact that you have 3 independent variables (IVs) in the full model and only 1 independent variable in the restricted model.

The general equation is as follows:

F = [(R^2full - R^2restricted) / (# of IVs full - # of IVs restricted)] / [(1 - R^2full) / (N - # of IVs full - 1)]
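That general equation drops straight into code; a minimal sketch (assuming scipy, and the function name nested_f is just for illustration), with the numbers from this thread as the usage example:

```python
from scipy import stats

def nested_f(r2_full, r2_restr, n, k_full, k_restr, alpha=0.05):
    """F ratio and critical value for comparing nested regression models."""
    df1 = k_full - k_restr           # numerator df
    df2 = n - k_full - 1             # denominator df
    f = ((r2_full - r2_restr) / df1) / ((1 - r2_full) / df2)
    return f, stats.f.ppf(1 - alpha, df1, df2)

# R^2 = 0.73 (full, 3 IVs) vs 0.72 (restricted, 1 IV), N = 51:
print(nested_f(0.73, 0.72, 51, 3, 1))
```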
 