Regression Analysis

I was working through the following question set and would like some clarity on it. I attempted the questions (my answers are below), but I need help seeing which ones might be incorrect and, if so, why.


(1) Linear regression assumes:

a. The relationship between X and Y is a straight line.

b. The residuals are normally distributed.

c. The residuals are homoscedastic.

d. Both homoscedastic and normally distributed residuals.

(2) Oftentimes, residual plots as well as other plots of the data will suggest some difficulties or abnormalities in the data. Which of the following statements is not considered a difficulty?

a. A nonlinear relationship between X and Y is appropriate.

b. The variance of the error term (and of Y) is constant.

c. The error term does not have a normal distribution.

d. The selected model fits the data well except for very few discrepant or outlying data values, which may have greatly influenced the choice of the regression line.

(3) The Analysis of Variance (ANOVA) table in linear regression can be used to compute:

a. R-Squared

b. Adjusted R-Squared

c. The Overall F statistic

d. R-Squared, Adjusted R-Squared, and the Overall F statistic

(4) The hat matrix is given by:

a. X

b. X’X, where ’ denotes the matrix transpose

c. inv(X’X)*X’, where we let inv() denote the matrix inverse

d. X*inv(X’X)*X’

(5) Consider a linear regression model with the predictor variables X1, X2, and X3. If we regress X1 on the other two predictor variables X2 and X3 and get an R-Squared value of 0.25, then the corresponding Variance Inflation Factor (VIF) for X1 is:

a. 0.25

b. 0.50

c. 0.66

d. 1.33

(6) Multicollinearity can be detected by:

a. the Overall F-test

b. a t-test

c. a variance inflation factor

d. a leverage value

(7) Diagnostics for assessing the Goodness-of-Fit for a linear regression model include:

a. Plotting Y-hat versus Y.

b. Plotting a Quantile-Quantile plot of the residuals.

c. Plotting Y against each continuous predictor variable.

d. Plots of Y-hat versus Y, a Quantile-Quantile plot of the residuals, and Y against each continuous predictor variable.

(8) Heteroscedasticity can be detected graphically by plotting the residuals against the in-sample predicted values Y-hat and looking for these shapes:

a. a tube

b. a funnel

c. a double bow

d. a nonlinear pattern

e. a funnel, a double bow, or any nonlinear pattern

(9) The specification of a predictor effect can be validated using:

a. a histogram of the residuals

b. a scatterplot of the residuals against the predictor variable of interest

c. a scatterplot of the residuals against the predicted values Y-hat

d. a Quantile-Quantile plot of the residuals

(10) Models need to be validated:

a. In-sample

b. Out-of-sample

c. Both in-sample and out-of-sample

I got the following:

1) d
2) b
3) d
4) d
5) d
6) c
7) d
8) e
9) c
10) c


My answers:

1.) D. Linear regression is all about the residuals: they are assumed to be IID (independent and identically distributed) with a normal distribution, which covers both homoscedasticity and normality.
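2.) B. Constant variance (homoscedasticity) is one of the assumptions from #1, so it is the one statement that is not a difficulty. C would contradict your answer to #1, since non-normal errors are a problem.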
3.) D. Yup, you can compute all of them from the sums of squares and their degrees of freedom.
4.) D. Yup. As for why it is called the hat matrix: it puts a hat on the y's (Y-hat = H*Y), to mark them as estimates.
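5.) D. Yup, the formula is easy enough to look up: VIF = 1/(1 - R-Squared) = 1/(1 - 0.25) = 1.33. It holds for any number of predictors, as long as the R-Squared comes from regressing that predictor on all of the others.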
6.) C, as well as the tolerance statistic, which is just 1/VIF.
7.) D. Some people also use a P-P plot.
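8.) E. Yup. The tube represents constant error variance, while in a funnel the spread either increases or decreases with Y-hat (not constant). A double bow is spread that is widest in the middle and narrower at both ends. A curved pattern, where the model over- then under- then over-predicts, says more about a misspecified mean than about the variance, but E still looks like the choice the question is after.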
9.) B would be my answer, since you want to see how that particular variable fits the data. C would blur this because Y-hat combines all predictors at once. I am not 100% confident in B, but that is what I would put.
10.) This depends on the purpose of the model, but the general answer is that validation happens with new or hold-out data. Given that these are generic questions, C is a reasonable selection; A or B could also be right if more details were provided.
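
If you want to check #3, #4, and #5 numerically, here is a minimal numpy sketch. The data are simulated and the function names (hat_matrix, anova_summary, vif_from_r2) are made up for illustration, not taken from any library. It assumes the design matrix X already includes an intercept column.

```python
import numpy as np


def hat_matrix(X):
    # H = X (X'X)^-1 X'; X must already include the intercept column.
    return X @ np.linalg.inv(X.T @ X) @ X.T


def anova_summary(y, y_hat, p):
    # p = number of predictors, not counting the intercept.
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)       # error sum of squares
    sst = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    ssr = sst - sse                      # regression sum of squares
    r2 = ssr / sst
    adj_r2 = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
    f_stat = (ssr / p) / (sse / (n - p - 1))
    return r2, adj_r2, f_stat


def vif_from_r2(r2):
    # r2 comes from regressing one predictor on all the other predictors.
    return 1.0 / (1.0 - r2)


# Simulated data, purely for illustration.
rng = np.random.default_rng(0)
n, p = 50, 3
predictors = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), predictors])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

H = hat_matrix(X)
y_hat = H @ y  # the hat matrix "puts the hat on" y
r2, adj_r2, f_stat = anova_summary(y, y_hat, p)

print(round(vif_from_r2(0.25), 2))  # 1.33, matching answer d in #5
```

The three ANOVA quantities only need SSE, SST, n, and p, which is why D works for #3. In practice you would let a stats package do this, but the hand computation makes the formulas easy to verify.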