LASSO (binomial) p-value and confidence intervals (selectiveInference) error

hlsmith

Omega Contributor
#1
I was just trying to quickly check out the "selectiveInference" package to put confidence intervals on the beta coefficients from a glmnet (LASSO, binomial) model. The package is supposed to provide more robust CIs that account for having first used LASSO to select a subset of features, so the inference is conditional on that selection process. I get the following error. I am now going to try their example (simulated) data to see if that runs for me.


Thanks!


Error in if (any(abs(g) > 1 + tol.kkt)) warning(paste("Solution beta does not satisfy the KKT conditions", : missing value where TRUE/FALSE needed

P.S. My initial glmnet LASSO model runs fine without issues. I then take the lambda chosen in my CV model-building process, along with that lambda's coefficients, and pass them to the selectiveInference package to get the CIs. That is when I get the error.
 

hlsmith

Omega Contributor
#2
Well, last night I could not get the toy example (based on a simulation) to run. It seemed like a simple matrix algebra step in constructing the toy set was throwing an error in my R session. I will upload that code later.

Here is a link to the program code (selectiveInference) where you can partially see the rule associated with my error:

https://rdrr.io/cran/selectiveInference/src/R/funs.fixed.R

Thanks
 

hlsmith

Omega Contributor
#3
Well, the code example with the simulated toy set didn't run because I had been fiddling with it, trying to rename variables, since I had similarly named variables in another open program. It wasn't running because I messed it up.


For anyone interested, below is the example that runs fine and is what I am trying to replicate with my own real-life data, ...still.


Code:
# logistic model
library(glmnet)
library(selectiveInference)
set.seed(43)
n = 50 
p  =  10 
sigma  = 1
x = matrix(rnorm(n*p),n,p)
x=scale(x,TRUE,TRUE)

beta = c(3,2,rep(0,p-2))
y = x%*%beta  + sigma*rnorm(n)
y=1*(y>mean(y))



# first run glmnet
gfit = glmnet(x,y,standardize=FALSE,family="binomial")

# extract coef for a given lambda; note the 1/n factor!
# (and here we DO include the intercept term)
# recent glmnet versions need x and y passed again when exact=TRUE
lambda = .8
beta_hat = coef(gfit, x=x, y=y, s=lambda/n, exact=TRUE)

 # compute  fixed lambda p-values  and selection intervals
out  = fixedLassoInf(x,y,beta_hat,lambda,family="binomial")
out
This is used to control type I error after variable selection. Thanks!
 

hlsmith

Omega Contributor
#4
I still wonder if some of my issue may be related to missing data in the original set. Planning to hit this hard in the morning. It seems like I should be able to work through some of the following by hand.


tol.kkt = Tolerance for determining if an entry of the subgradient is zero


# Check the KKT conditions
g = t(x)%*%(y-x%*%beta) / lambda
if (any(abs(g) > 1+tol.kkt * sqrt(sum(y^2))))
  warning(paste("Solution beta does not satisfy the KKT conditions",
                "(to within specified tolerances)"))

vars = which(abs(beta) > tol.beta / sqrt(colSums(x^2)))
if(length(vars)==0){
  cat("Empty model",fill=T)
  return()
}
if (any(sign(g[vars]) != sign(beta[vars])))
  warning(paste("Solution beta does not satisfy the KKT conditions",
                "(to within specified tolerances). You might try rerunning",
                "glmnet with a lower setting of the",
                "'thresh' parameter, for a more accurate convergence."))
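
Here is one way missing data could produce the original error. If x or y contains an NA, the matrix product makes the entries of g NA, any() then returns NA when no entry is TRUE, and if (NA) stops with "missing value where TRUE/FALSE needed". This is a minimal base-R sketch on made-up data. The toy x, y, beta and lambda, the tol.kkt value, and the helper name kkt_violated are all assumptions for illustration, not the package's defaults or its internal function.

```r
# Toy data, invented for illustration only
set.seed(1)
n <- 20; p <- 3
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
beta <- rep(0, p)      # all-zero solution
lambda <- 1000         # large enough that every |g| is well below 1
tol.kkt <- 0.1         # assumed value, for illustration

# Hypothetical helper: same form as the first check in the excerpt above
kkt_violated <- function(x, y, beta, lambda, tol.kkt) {
  g <- t(x) %*% (y - x %*% beta) / lambda
  any(abs(g) > 1 + tol.kkt * sqrt(sum(y^2)))
}

clean <- kkt_violated(x, y, beta, lambda, tol.kkt)       # FALSE

x_na <- x
x_na[5, 2] <- NA       # a single missing predictor value
with_na <- kkt_violated(x_na, y, beta, lambda, tol.kkt)  # NA

# if (NA) is an error in R; capture its message instead of stopping
msg <- tryCatch(
  { if (with_na) warning("KKT violated"); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)
```

A single NA is enough here because it propagates through x %*% beta into the residual, and from there into every entry of g.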
 

hlsmith

Omega Contributor
#5
Update for those who may be interested: I can now get the code for the selectiveInference package to run. Many of my problems were likely related to my wonky dataset. I removed observations with missingness, which I had previously deemed MCAR. I also surmised from an error message that two independent variables had exactly the same values and were therefore redundant, so I dropped one of them. They addressed a comparable construct.
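
For anyone hitting the same wall, the two clean-up steps described above can be sketched in base R roughly as follows. The matrix x, the outcome y, and the column names are hypothetical stand-ins, not my real data, and dropping incomplete rows is only reasonable if the missingness is plausibly MCAR.

```r
# Hypothetical predictor matrix and outcome, invented for illustration
x <- cbind(a = c(1, 0, 1, NA, 0, 1),
           b = c(2, 5, 3, 4, 1, 6),
           c = c(1, 0, 1, NA, 0, 1))   # exact copy of column a
y <- c(1, 0, 1, 0, NA, 1)

# 1. Keep only rows with no missing values in x or y
keep <- complete.cases(x, y)
x_cc <- x[keep, , drop = FALSE]
y_cc <- y[keep]

# 2. Drop columns that exactly duplicate an earlier column
dup <- duplicated(t(x_cc))
x_clean <- x_cc[, !dup, drop = FALSE]

print(dim(x_clean))
print(colnames(x_clean))
```

Running checks like these before glmnet means that any remaining error in the selectiveInference step is less likely to come from the data itself.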


The code runs and generates output, though due to sparseness in the data, the confidence intervals for some variables are wide and some go to infinity. I will provide more updates if I glean any other insights.
 

hlsmith

Omega Contributor
#6
I just ran a LASSO model building process with lambda based on CV for a new project (not so wonky data this time) followed by the use of selectiveInference without any issues.

It seems that I have the code all figured out. Now I need to better understand the intricacies of how the conditional procedure functions.