Regression Analysis Practice

#1
I'm working through these questions as self-study for actuarial science. I would appreciate it if someone could verify my answers and help explain them. I've included my attempts below. Thanks!

(1) True/False: Models selected by automated variable selection techniques do not need to be validated since they are ‘optimal’ models.

(2) Compute the Akaike Information Criterion (AIC) value for the linear regression model
Y = b0 + b1*X1 + b2*X2 + b3*X3.
The regression model was fitted on a sample of 250 observations and yielded a likelihood value of 0.18.



(a) 9.49

(b) 11.43

(c) 25.52

(d) 15.55


(3) Compute the Bayesian Information Criterion (BIC) value for the linear regression model
Y = b0 + b1*X1 + b2*X2 + b3*X3.
The regression model was fitted on a sample of 250 observations and yielded a likelihood value of 0.18.

(a) 9.49

(b) 11.43

(c) 25.52

(d) 15.55


(4) True/False: Consider a categorical predictor variable that has three levels denoted by 1, 2, and 3. We can include this categorical predictor variable in a regression model using this specification, where X1 is a dummy variable for level 1, X2 is a dummy variable for level 2, and X3 is a dummy variable for level 3.
Y = b0 + b1*X1 + b2*X2 + b3*X3

True

False


(5) True/False: The model Y = b0 + exp(b1*X1) + e can be transformed to a linear model.

True

False


(6) True/False: A variable transformation can be used as a remedial measure for heteroscedasticity.

True

False


(7) When comparing models of different sizes (i.e. a different number of predictor variables), we can use which metrics?

a. R-Squared and Adjusted R-Squared

b. R-Squared and Mallows’ Cp

c. AIC and R-Squared

d. AIC and BIC


(8) True/False: When using Mallows’ Cp for model selection, we should choose the model with the largest Cp value.

True

False


(9) True/False: Consider the case where the response variable Y is constrained to the interval [0,1]. In this case one can fit a linear regression model to Y without any transformation to Y.

True

False


(10) True/False: Consider the case where the response variable Y takes only two values: 0 and 1. A linear regression model can be fit to this data.

True

False





What I tried:


1) False
2) I tried the formula n*log(RSS/n) + 2k, but couldn't get it to work
3) I tried -2*ln(likelihood) + ln(n)*k, but couldn't get it to work
4) True
5) True
6) False
7) D
8) False - we need the smallest
9) False - not sure why
10) False - logistic, not linear
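
For 2) and 3), a quick numeric check may help. This is only a sketch under two assumptions that the question does not state outright: that the "likelihood value of 0.18" is the raw likelihood L (not the log-likelihood), and that k = 4 counts just the coefficients b0 to b3. The function names are made up for illustration.

```python
import math

def aic(likelihood, k):
    """AIC = -2*ln(L) + 2k, with L the maximized likelihood."""
    return -2.0 * math.log(likelihood) + 2.0 * k

def bic(likelihood, k, n):
    """BIC = -2*ln(L) + k*ln(n), with n the sample size."""
    return -2.0 * math.log(likelihood) + k * math.log(n)

L = 0.18   # assumed to be the raw likelihood, not ln(L)
n = 250    # sample size from the question
k = 4      # assumed: b0, b1, b2, b3 (error variance not counted)

print(round(aic(L, k), 2))     # 11.43
print(round(bic(L, k, n), 2))  # 25.52
```

Under those assumptions the values line up with choices (b) and (c). Some texts also count the error variance as a parameter (k = 5), which would give an AIC of 13.43 and match none of the choices, so the convention matters. Note that math.log is the natural log; using log base 10 is an easy way to get a number that matches nothing.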
 

rogojel

TS Contributor
#2
Hi,
some questions are not formulated in the best way, imo:

e.g. 4

True, though dummy variables would be preferred, so the expected answer is probably False

5. Which transformation would you use?

6. Why false?

9. Econometricians use such models, so technically the answer is True. The expected answer is probably False, since for some X the fitted Y will surely fall outside the [0, 1] range.
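
To illustrate the range problem in 9 (and 10), here is a small sketch that fits an ordinary least-squares line to a 0/1 response. The data are made up for illustration and the helper name is hypothetical; it only shows that fitted values can leave [0, 1], not that the model is unusable.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up data: binary response that becomes more likely as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_simple_ols(x, y)
fitted = [b0 + b1 * xi for xi in x]

# The smallest fitted value is below 0 and the largest is above 1,
# even though every observed y is 0 or 1.
print(min(fitted), max(fitted))
```

The fit itself runs without complaint, which is why "can be fit" is technically True; the trouble is in interpreting fitted values near the edges as probabilities or proportions.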
 
#3
The answer for 4 would be False, because that model has perfect collinearity: an intercept is fit along with 3 dummies for 3 groups. If an intercept is fit, you want k-1 dummies for a k-level categorical variable. The OP should look up the "dummy variable trap" for more on this.
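
A small sketch of why this happens, using made-up level data and hypothetical helper names: with an intercept plus all three dummies, the dummy columns sum to the intercept column, so the design matrix loses a rank. Dropping one dummy restores full column rank.

```python
from fractions import Fraction

def rank(matrix):
    """Matrix rank via Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_count = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Find a pivot row at or below the current rank position.
        pivot = next((r for r in range(rank_count, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        pivot_row = rows[rank_count]
        # Clear this column in every other row.
        for r in range(len(rows)):
            if r != rank_count and rows[r][col] != 0:
                factor = rows[r][col] / pivot_row[col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], pivot_row)]
        rank_count += 1
    return rank_count

def design(levels, dummy_levels):
    """Rows of [intercept, dummy for each level in dummy_levels]."""
    return [[1] + [1 if lv == d else 0 for d in dummy_levels] for lv in levels]

levels = [1, 2, 3, 1, 2, 3, 1, 3]   # made-up observations of the 3-level factor

X_full = design(levels, [1, 2, 3])  # intercept + 3 dummies (question 4)
X_ref = design(levels, [2, 3])      # intercept + 2 dummies, level 1 as baseline

print(rank(X_full), len(X_full[0]))  # 3 4 -> rank deficient
print(rank(X_ref), len(X_ref[0]))    # 3 3 -> full column rank
```

With a rank-deficient design the coefficients b0 to b3 are not uniquely determined, which is the practical meaning of the trap. Dropping the intercept instead of a dummy would also work, but it changes how the coefficients are interpreted.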

Also agree with you on 9 that the answer is True, since econometricians use it, especially with heteroscedasticity-robust SEs and when they limit their inferences to avoid extrapolation. Whether it is as appropriate as a logistic regression is a different story.