Regression MC

#1
Can someone confirm the answers below and advise which ones are incorrect? Thanks!

(1) True/False: Models selected by automated variable selection techniques do not need to be validated since they are ‘optimal’ models.

(2) Compute the Akaike Information Criterion (AIC) value for the linear regression model
Y = b0 + b1*X1 + b2*X2 + b3*X3.
The regression model was fitted on a sample of 250 observations and yielded a likelihood value of 0.18.



(a) 9.49

(b) 11.43

(c) 25.52

(d) 15.55


(3) Compute the Bayesian Information Criterion (BIC) value for the linear regression model
Y = b0 + b1*X1 + b2*X2 + b3*X3.
The regression model was fitted on a sample of 250 observations and yielded a likelihood value of 0.18.

(a) 9.49

(b) 11.43

(c) 25.52

(d) 15.55


(4) True/False: Consider a categorical predictor variable that has three levels denoted by 1, 2, and 3. We can include this categorical predictor variable in a regression model using this specification, where X1 is a dummy variable for level 1, X2 is a dummy variable for level 2, and X3 is a dummy variable for level 3.
Y = b0 + b1*X1 + b2*X2 + b3*X3

True

False


(5) True/False: The model Y = b0 + exp(b1*X1) + e can be transformed to a linear model.

True

False


(6) True/False: A variable transformation can be used as a remedial measure for heteroscedasticity.

True

False


(7) When comparing models of different sizes (i.e., different numbers of predictor variables), which metrics can we use?

a. R-Squared and Adjusted R-Squared

b. R-Squared and Mallows' Cp

c. AIC and R-Squared

d. AIC and BIC


(8) True/False: When using Mallows' Cp for model selection, we should choose the model with the largest Cp value.

True

False


(9) True/False: Consider the case where the response variable Y is constrained to the interval [0,1]. In this case one can fit a linear regression model to Y without any transformation to Y.

True

False


(10) True/False: Consider the case where the response variable Y takes only two values: 0 and 1. A linear regression model can be fit to this data.

True

False




1) False
2) b
3) c
4) T
5) T
6) F
7) D
8) F
9) F
10) F
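
For (2) and (3), here is a quick check of the arithmetic, assuming the usual formulas AIC = 2k - 2*ln(L) and BIC = k*ln(n) - 2*ln(L), and counting k = 4 estimated coefficients (intercept plus three slopes), which is what the answer choices appear to assume:

```python
import math

n = 250   # sample size
L = 0.18  # reported likelihood of the fitted model
k = 4     # estimated coefficients: intercept + three slopes
          # (the error variance is not counted here, matching the choices)

aic = 2 * k - 2 * math.log(L)            # ~= 11.43 -> choice (b)
bic = k * math.log(n) - 2 * math.log(L)  # ~= 25.52 -> choice (c)
print(round(aic, 2), round(bic, 2))
```

Both match the key above: (2) is (b) and (3) is (c).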
 

hlsmith

Omega Contributor
#2
Well, I skipped the questions that required a calculator and answered the following off the top of my head, without looking anything up.


1. f
4. T or F: you can do this, but when dummy coding you typically use k-1 terms in the model, where k = # of categories. So I would say false, but you can do it the other way, though the program may yell at you.
5. T
6. T
7. d
8. F
9. T, if the majority of values land around 0.5 with minimal dispersion; otherwise you would want to transform the variable or use beta regression.
10. T or F: you can do it, but ideally you would use logistic regression, so I would prefer False.
 
#3
I don't understand #10. This is what I think:

I think it is true because we will predict a value between -1 and 1; and then equate <=0 to 0 and >0 to 1. That is what even logistic regression would do.

So, given this, which answer would you say #10 is?
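
One way to see what is at stake in #10 is a small made-up simulation (nothing here comes from the exam question itself): ordinary least squares will happily fit a 0/1 response, but the fitted values are not confined to [0, 1] and the errors are heteroscedastic, which is why logistic regression is usually preferred.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: one predictor and a binary (0/1) response
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # true success probability
y = rng.binomial(1, p)                  # observed 0/1 outcomes

# Ordinary least squares of y on x (the "linear probability model")
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# The fit goes through, but the fitted "probabilities" can escape [0, 1]
print("fitted range:", round(float(fitted.min()), 3), round(float(fitted.max()), 3))
print("share outside [0, 1]:", round(float(np.mean((fitted < 0) | (fitted > 1))), 3))
```

So the fit itself is possible; the question is whether it is a sensible model for the data, which is what the disagreement here is really about.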
 
#6
4. T or F: you can do this, but when dummy coding you typically use k-1 terms in the model, where k = # of categories. So I would say false, but you can do it the other way, though the program may yell at you.
Four is false. If you fit that model with an intercept, using k dummies for k groups, you will have perfect collinearity and the design matrix will be singular (not invertible). As you noted, the program will probably give you an error message or will sometimes just drop one of the dummies to allow a fit. Search "dummy variable trap" for more background. You would need to fit the model without an intercept to use k dummies for k groups.
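
A quick numerical illustration of the trap, using made-up group labels and plain numpy: with an intercept plus one dummy per level the design matrix is rank deficient; dropping either one dummy or the intercept restores full column rank.

```python
import numpy as np

# Made-up categorical predictor with three levels
groups = np.array([1, 2, 3, 1, 2, 3, 1, 2])

# One dummy per level
d1 = (groups == 1).astype(float)
d2 = (groups == 2).astype(float)
d3 = (groups == 3).astype(float)
intercept = np.ones_like(d1)

# Intercept + all three dummies: d1 + d2 + d3 equals the intercept
# column, so the columns are linearly dependent and X'X is singular.
X_full = np.column_stack([intercept, d1, d2, d3])
print(np.linalg.matrix_rank(X_full), "independent columns out of", X_full.shape[1])

# Drop one dummy (reference coding) ...
X_ref = np.column_stack([intercept, d1, d2])
print(np.linalg.matrix_rank(X_ref), "independent columns out of", X_ref.shape[1])

# ... or drop the intercept (cell-means coding): both are full rank.
X_cells = np.column_stack([d1, d2, d3])
print(np.linalg.matrix_rank(X_cells), "independent columns out of", X_cells.shape[1])
```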
 

Dason

Ambassador to the humans
#7
Four is false. If you fit that model with an intercept, using k dummies for k groups, you will have perfect collinearity and the design matrix will be singular (not invertible). As you noted, the program will probably give you an error message or will sometimes just drop one of the dummies to allow a fit. Search "dummy variable trap" for more background. You would need to fit the model without an intercept to use k dummies for k groups.
While I don't disagree that the answer they are probably looking for is "false", I do disagree that that is the answer. We can easily fit that model, but the solutions won't be unique and we won't be able to do any sort of inference on the parameter estimates directly. At that point we'd need to make sure that anything we want to talk about is a so-called "estimable function".
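
To make the non-uniqueness concrete, here is a rough sketch with made-up data: a minimum-norm least-squares solution to the overparameterized model exists (numpy's lstsq returns one), its individual coefficients differ from the reference-coded fit, but an estimable quantity such as a fitted group mean comes out identical under both parameterizations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: three groups with different means
groups = np.repeat([1, 2, 3], 20)
true_means = np.where(groups == 1, 1.0, np.where(groups == 2, 2.0, 4.0))
y = true_means + rng.normal(scale=0.5, size=groups.size)

d = np.column_stack([(groups == g).astype(float) for g in (1, 2, 3)])
ones = np.ones((groups.size, 1))

# Overparameterized: intercept + all three dummies (rank deficient)
X_over = np.hstack([ones, d])
b_over, *_ = np.linalg.lstsq(X_over, y, rcond=None)  # one of infinitely many solutions

# Reference coding: intercept + dummies for levels 2 and 3
X_ref = np.hstack([ones, d[:, 1:]])
b_ref, *_ = np.linalg.lstsq(X_ref, y, rcond=None)

print("overparameterized coefficients:", np.round(b_over, 3))
print("reference-coded coefficients:  ", np.round(b_ref, 3))

# Fitted group means (estimable functions) agree between the two fits
means_over = [float((X_over[groups == g] @ b_over)[0]) for g in (1, 2, 3)]
means_ref = [float((X_ref[groups == g] @ b_ref)[0]) for g in (1, 2, 3)]
print("fitted group means (overparameterized):", np.round(means_over, 3))
print("fitted group means (reference coded):  ", np.round(means_ref, 3))
```

So the fit itself goes through either way; it is the interpretation of the individual coefficients that needs care.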
 
#8
While I don't disagree that the answer they are probably looking for is "false", I do disagree that that is the answer. We can easily fit that model, but the solutions won't be unique and we won't be able to do any sort of inference on the parameter estimates directly. At that point we'd need to make sure that anything we want to talk about is a so-called "estimable function".
True, there is the distinction that you won't get unique estimates. Is there any benefit to the estimation in that case? I'm not sure, so I have to ask.