Post Estimation Shrinkage Regression Coefficients

ledzep

Hey all,

I am reading about shrinkage estimators. I am sure most of you are familiar with the concept. In any case, I will start with a brief introduction, summarising the documentation of the R package "shrink".

When a model is built by variable selection, typically from many candidate variables, some of the regression coefficients in the final multivariable model can be inflated. Post-estimation shrinkage is used to correct for this overestimation of regression coefficients caused by variable selection. Shrinkage can be either:
a) Global shrinkage: this modifies all regression coefficients by the same factor.
b) Parameterwise shrinkage: this modifies different coefficients by different factors.
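
To make the distinction concrete, here is a minimal sketch in base R. The coefficient values and the shrinkage factors are made up for illustration (they are not output from "shrink"), and only the slopes are multiplied; how the intercept is handled is a separate matter.

```r
# Hypothetical unshrunken slope estimates (illustrative values only)
beta_hat <- c(x1 = 0.80, x2 = 1.90)

# Global shrinkage: one common factor for every slope (assumed value)
c_global <- 0.90
beta_global <- c_global * beta_hat

# Parameterwise shrinkage: one factor per slope (assumed values)
c_param <- c(x1 = 0.95, x2 = 0.85)
beta_param <- c_param * beta_hat

# Side-by-side comparison of the three coefficient vectors
rbind(unshrunken = beta_hat, global = beta_global, parameterwise = beta_param)
```

With a global factor the ratio between the two slopes is unchanged, whereas parameterwise factors can change it.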

Here is a worked example, again from the "shrink" vignette:

Code:
# install.packages("shrink")  # run once if the package is not installed
library(shrink)

## Simulate data with a binary response (y) and two covariates (x1 and x2).
set.seed(888) # for replication
intercept <- 1
beta <- c(0.5, 1.2)
n <- 200
x1 <- rnorm(n, mean = 1, sd = 1)
x2 <- rbinom(n, size = 1, prob = 0.3)
linpred <- intercept + x1 * beta[1] + x2 * beta[2]
prob <- exp(linpred) / (1 + exp(linpred))
runis <- runif(n, min = 0, max = 1)
y <- ifelse(test = runis < prob, yes = 1, no = 0)  # Bernoulli draw with P(y = 1) = prob
simdat <- data.frame(y, x1, x2)


## Run logistic regression
fit <- glm(y ~ x1 + x2, family = binomial, data = simdat, x = TRUE)
summary(fit)

j1<-coef(fit) # store the coefficients


## Assess the shrinkage factors
j2<-shrink(fit, type = "global", method = "dfbeta")  # global
j3<-shrink(fit, type = "parameterwise", method = "dfbeta") # parameter wise

And here is a comparison of the model coefficients from the different methods:

Code:
k<-rbind(j1,j2[[3]],j3[[3]])
row.names(k)<-c("Regression Coefficients (Uncorrected)","Shrunken Coefficients (Global)","Shrunken Coefficients (Parameter Wise)")


## Comparing Coeffs
>k
                                       (Intercept)        x1       x2
Regression Coefficients (Uncorrected)    0.6411413 0.7774600 1.867610
Shrunken Coefficients (Global)           0.6934747 0.7150984 1.717806
Shrunken Coefficients (Parameter Wise)   0.6907555 0.7262782 1.661713

Here you can see from the results that the uncorrected estimates for x1 and x2 are overestimates (the true values are 0.5 and 1.2), and the shrunken coefficients are slightly smaller.
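
As a quick check of the table, the ratio of each shrunken coefficient to its unshrunken counterpart can be computed directly. This is only a sketch: the numbers are typed in from the output above, and the ratio is a back-of-the-envelope reading of the shrinkage, not something returned by "shrink".

```r
# Coefficients copied by hand from the comparison table above
unshrunken    <- c("(Intercept)" = 0.6411413, x1 = 0.7774600, x2 = 1.867610)
global        <- c("(Intercept)" = 0.6934747, x1 = 0.7150984, x2 = 1.717806)
parameterwise <- c("(Intercept)" = 0.6907555, x1 = 0.7262782, x2 = 1.661713)

# Ratio of shrunken to unshrunken coefficient, one row per method
ratios <- rbind(global        = global / unshrunken,
                parameterwise = parameterwise / unshrunken)
round(ratios, 3)
```

In the global row the two slopes share one ratio (about 0.92), in the parameterwise row the ratios differ, and in both rows the intercept ratio is above 1.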

I am fine with the theory but have a few questions.

1) With the shrinkage estimators, the coefficient for the intercept gets inflated, while those for x1 and x2 get shrunken. Why?

2) As the text says, "Post-estimation shrinkage is used to correct for the overestimation of regression coefficients caused by variable selection". The simulated example has only 2 covariates, so there is not much variable selection going on here. What is the shrinkage estimator doing here, then? Can you use shrinkage estimators for any regression model (not only when performing variable selection)?

I have probably read thousands of medical papers that use variable selection methods to come up with a risk factor model. Yet I haven't seen any of them examine the shrinkage factors. What is the practical applicability of post-estimation shrinkage?

Many thanks