"Suppose that in a MULTIPLE linear regression analysis, it is of interest to compare a model with 3 independent variables to a model with the same response variable and these same 3 independent variables plus 2 additional independent variables.
As more predictors are added to the model, the coefficient of multiple determination (R^2) will increase, so the model with 5 predictor variables will have a higher R^2.
The partial F-test for the coefficients of the 2 additional predictor variables (H_o: β_4=β_5=0) is equivalent to testing that the increase in R^2 is statistically significant."
I don't understand the bolded sentence. Why are they equivalent?
Thanks for explaining!
[also under discussion in Math Help forum]
Why?
According to my notes:
F = (extra SS/extra df) / MSE_full
where extra SS = SSE_reduced - SSE_full
The statement claims that the test of H_o: β_4 = β_5 = 0 is equivalent to testing that the increase in R^2 is statistically significant. What would be the equivalent null and alternative hypotheses in terms of R^2?
Thanks!
But how can we see that the test H_o: β_4=β_5=0 is equivalent to testing that the increase in R^2 is statistically significant?
Thanks!
Here's a general way to consider this.
If we have k independent variables we can state the following null hypothesis:
Ho: B2 = B3 = ... = Bk = 0
It follows that (assuming that the error terms are normally distributed):
F = [ESS/(k-1)] / [RSS/(N-k)] follows the F distribution with k-1 and N-k df.
Note that the total number of parameters to be estimated is k, of which one is the intercept term.
Note also that ESS is the "Explained Sum of Squares", RSS is the "Residual Sum of Squares" and TSS is the "Total Sum of Squares", so TSS = ESS + RSS and R^2 = ESS/TSS.
Now watch what I do:
F = [(N-k)/(k-1)] * (ESS / RSS)
= [(N-k)/(k-1)] * [ESS / (TSS - ESS)]
= [(N-k)/(k-1)] * [(ESS / TSS) / (1-(ESS/TSS))]
= [(N-k)/(k-1)] * [R^2 / (1 - R^2)]
= [R^2 / (k-1)] / [(1-R^2) / (N-k)].
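If you want to check the algebra numerically, here is a minimal sketch in Python. The sums of squares, N and k below are made-up numbers chosen only for illustration, and the function names are hypothetical; nothing here comes from a real data set.

```python
import math

def overall_f_from_ss(ess, rss, n, k):
    """Overall F from sums of squares: [ESS/(k-1)] / [RSS/(N-k)].

    k counts all estimated parameters, including the intercept.
    """
    return (ess / (k - 1)) / (rss / (n - k))

def overall_f_from_r2(r2, n, k):
    """The same statistic written in terms of R^2."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

# Hypothetical values, for illustration only.
tss, ess = 200.0, 120.0
rss = tss - ess          # TSS = ESS + RSS
n, k = 30, 4
r2 = ess / tss           # R^2 = ESS / TSS

f_ss = overall_f_from_ss(ess, rss, n, k)
f_r2 = overall_f_from_r2(r2, n, k)

# Both forms agree up to floating-point rounding.
print(f_ss, f_r2, math.isclose(f_ss, f_r2))
```

With these numbers both expressions come out to approximately 13, which is just (40 * 26) / 80 worked by hand.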
So, now we can see how F and R^2 are related.
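This works not only for the overall test (above) but also for comparing a full model with a reduced model.
F = [(RSS_reduced - RSS_full) / (# of additional variables)] / [RSS_full / (N - k)]
which, after dividing the numerator and the denominator by TSS (the same for both models, since the response is the same) and using R^2 = 1 - RSS/TSS, is equal to:
F = [(R^2_full - R^2_reduced) / (# of additional variables)] / [(1 - R^2_full) / (N - k)]
where k is the number of parameters in the full model, including the intercept.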
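To see that the RSS form and the R^2 form of the partial F are one and the same number, here is a small Python sketch. Again, the sums of squares and sample size are hypothetical values picked for illustration (3 predictors in the reduced model, 5 in the full one, so 2 additional variables and k = 6 parameters with the intercept), and the function names are my own.

```python
import math

def partial_f_from_rss(rss_reduced, rss_full, q, n, k):
    """Partial F from residual sums of squares.

    q is the number of additional variables in the full model;
    k is the number of parameters in the full model, intercept included.
    """
    return ((rss_reduced - rss_full) / q) / (rss_full / (n - k))

def partial_f_from_r2(r2_reduced, r2_full, q, n, k):
    """The same partial F written in terms of the increase in R^2."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / (n - k))

# Hypothetical values, for illustration only.
tss = 500.0              # same response, so the same TSS for both models
rss_reduced = 200.0      # model with 3 predictors
rss_full = 150.0         # model with 5 predictors
n, k, q = 40, 6, 2

r2_reduced = 1.0 - rss_reduced / tss
r2_full = 1.0 - rss_full / tss

f_rss = partial_f_from_rss(rss_reduced, rss_full, q, n, k)
f_r2 = partial_f_from_r2(r2_reduced, r2_full, q, n, k)

# Both forms agree up to floating-point rounding.
print(f_rss, f_r2, math.isclose(f_rss, f_r2))
```

Since the two expressions are the same statistic, rejecting H_o: β_4=β_5=0 with the partial F-test and concluding that the increase in R^2 is statistically significant are the same decision.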
Mkay.