Regression Analysis

I'm trying to better understand linear regression by working through the following questions.

(1) Linear regression assumes:

a. The relationship between X and Y is a straight line.

b. The residuals are normally distributed.

c. The residuals are homoscedastic.

d. Both homoscedastic and normally distributed residuals.

(2) Oftentimes, residual plots and other plots of the data will suggest difficulties or abnormalities in the data. Which of the following statements is not considered a difficulty?

a. A nonlinear relationship between X and Y is appropriate.

b. The variance of the error term (and of Y) is constant.

c. The error term does not have a normal distribution.

d. The selected model fits the data well except for very few discrepant or outlying data values, which may have greatly influenced the choice of the regression line.

(3) The Analysis of Variance (ANOVA) table in linear regression can be used to compute:

a. R-Squared

b. Adjusted R-Squared

c. The Overall F statistic

d. R-Squared, Adjusted R-Squared, and the Overall F statistic

(4) Consider a linear regression model with the predictor variables X1, X2, and X3. If we regress X1 on the other two predictor variables X2 and X3 and get an R-Squared value of 0.25, then the corresponding Variance Inflation Factor (VIF) for X1 is:

a. 0.25

b. 0.50

c. 0.66

d. 1.33

Can someone help me with these? For the first one I got c, for the second one d, and for the fourth one d (1.33).
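
For the VIF question, the usual definition is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing X_j on the other predictors. A minimal sketch of that arithmetic follows; the function name `vif_from_r2` is made up for illustration and does not come from any library.

```python
def vif_from_r2(r_squared: float) -> float:
    """Variance Inflation Factor from the R-squared of the auxiliary
    regression of one predictor on the remaining predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# R-squared of 0.25 from regressing X1 on X2 and X3, as in question (4).
print(round(vif_from_r2(0.25), 2))  # 1.33
```

With R-squared = 0.25 the denominator is 0.75, so the VIF is 1/0.75, or about 1.33.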



Less is more. Stay pure. Stay poor.
This smells of homework or an exam, so I will be sparse in my reply. You got the last one correct and may want to re-evaluate the others. In particular, what is the difference between homoscedasticity and heteroscedasticity, and which one do you want?
These are practice questions for an exam. For the first one, I do not understand why, in linear regression, the residuals are NOT homoscedastic. I thought this was a consequence of Gauss-Markov in relation to linear regression.

For the second one, I would say it has to do with the error variance being constant. Usually we want E(error) = 0 and Var(error) = sigma^2 (a constant). So the answer is b?
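
To make the constant-variance idea concrete, here is a small sketch that fits a straight line by least squares and compares the mean squared residual in the upper half of the X range with that in the lower half. The data are synthetic and deterministic (alternating-sign errors), and the helper names are invented for illustration. A ratio near 1 is what constant variance looks like, while a large ratio points to heteroscedasticity. This is an informal check, not a formal test.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual_spread_ratio(x, y):
    """Mean squared residual of the larger-x half divided by that of
    the smaller-x half, after fitting a single line to all the data."""
    intercept, slope = fit_line(x, y)
    pairs = sorted(zip(x, y))  # order observations by x
    residuals = [yi - (intercept + slope * xi) for xi, yi in pairs]
    half = len(residuals) // 2
    low, high = residuals[:half], residuals[half:]

    def mean_square(values):
        return sum(v * v for v in values) / len(values)

    return mean_square(high) / mean_square(low)


# Synthetic data (assumed for illustration): true line y = 2 + 3x.
x = list(range(1, 21))
signs = [1 if xi % 2 == 0 else -1 for xi in x]

# Errors of constant size 1 (homoscedastic).
y_constant = [2 + 3 * xi + s * 1.0 for xi, s in zip(x, signs)]
# Errors whose size grows with x (heteroscedastic).
y_growing = [2 + 3 * xi + s * 0.2 * xi for xi, s in zip(x, signs)]

ratio_constant = residual_spread_ratio(x, y_constant)
ratio_growing = residual_spread_ratio(x, y_growing)
print(f"constant error size: ratio = {ratio_constant:.2f}")
print(f"growing error size:  ratio = {ratio_growing:.2f}")
```

In a residual plot, the second data set would show the familiar funnel shape that widens as X increases.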

For the third one, I know the ANOVA table gives us the F-statistic, so wouldn't the answer be c, then? I am not sure how the other ones fit in (they were calculated for the table, but not as outputs).
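
On the ANOVA question, the table lists the regression, error, and total sums of squares together with their degrees of freedom, and it may help to see which summary statistics are functions of those entries alone. A rough sketch, assuming a model with an intercept, p predictors, and n observations; the function name and the example numbers are hypothetical, not taken from the thread.

```python
def anova_summaries(ss_regression, ss_error, n, p):
    """Return (R-squared, adjusted R-squared, overall F) computed only
    from ANOVA-table entries for a model with an intercept."""
    ss_total = ss_regression + ss_error
    df_regression = p          # one degree of freedom per predictor
    df_error = n - p - 1       # observations minus estimated coefficients
    df_total = n - 1

    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    ms_total = ss_total / df_total

    r_squared = ss_regression / ss_total
    adj_r_squared = 1.0 - ms_error / ms_total
    f_statistic = ms_regression / ms_error
    return r_squared, adj_r_squared, f_statistic


# Hypothetical table: SSR = 80, SSE = 20, n = 23 observations, p = 2 predictors.
r2, adj_r2, f_stat = anova_summaries(80.0, 20.0, n=23, p=2)
print(f"R-squared = {r2:.2f}, adjusted R-squared = {adj_r2:.2f}, F = {f_stat:.1f}")
```

All three quantities come out of the sums of squares and degrees of freedom, with no need to go back to the raw data.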
I don't understand why the answer to the first one is d as opposed to c. Also, could you explain the other three? I gave my explanations but need help. Thanks.


TS Contributor
The assumptions should be about the errors of prediction, not the residuals. Recall that the residuals are sample estimates of the unobservable errors. Sloppy question.