LASSO regression for variable selection

#1
Hello!
I'm new to the community, so sorry for any mistakes I might make in writing my question. My objective is to run a multinomial logistic regression, since I have a 4-level outcome variable. I have around 1200 observations and 40 potential categorical predictors, with the number of levels ranging from 2 to 4. Even though n>>p, I would like to reduce the number of regressors to the most relevant ones. To this end, I thought that LASSO would be a good solution. My idea would be to use it just for variable selection (not prediction), and so to run the LASSO regression on the entire dataset. Then, I would run a multinomial logistic regression without penalization using only the regressors whose coefficients were not shrunk to 0, in order to have interpretable coefficients with S.E.s. Do you believe this would be a feasible strategy? Would you advise LASSO for variable selection?
Thanks in advance!
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Using your content knowledge is better than LASSO, since LASSO does not understand mediators, proxies, or confounders, or whether effects of the outcome are in the model. It helps with collinearity, but does not know the source. Side note: it is best practice not to get estimates from the same sample you build the model on. Getting final estimates on a holdout set better helps with finding 'regular', generalizable variables.
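
To make the holdout idea concrete, here is a minimal sketch in plain Python, assuming you split the row indices once at random: one part is used only to choose variables, the other only to fit the final unpenalized model. The function name `split_selection_estimation`, the 50/50 fraction, and the seed are illustrative choices of mine, not something from a particular package.

```python
import random


def split_selection_estimation(n_rows, selection_fraction=0.5, seed=0):
    """Randomly partition row indices into a selection part and an estimation part.

    Hypothetical helper: the name, default fraction, and seed are assumptions.
    """
    if not 0 < selection_fraction < 1:
        raise ValueError("selection_fraction must be strictly between 0 and 1")
    indices = list(range(n_rows))
    rng = random.Random(seed)  # local generator, so the split is reproducible
    rng.shuffle(indices)
    cut = int(round(n_rows * selection_fraction))
    selection_rows = sorted(indices[:cut])   # run the penalized fit on these rows only
    estimation_rows = sorted(indices[cut:])  # fit the final unpenalized model on these
    return selection_rows, estimation_rows


# Roughly the sample size mentioned in the question.
selection_rows, estimation_rows = split_selection_estimation(1200)
print(len(selection_rows), len(estimation_rows))
```

With a 4-level outcome you might prefer to do the split within each outcome level so that rare levels appear in both parts; the sketch above does not do that.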

Welcome to the forum! PS: I have never used LASSO for multinomial data. I imagine there are additional considerations, since you have one reference category vs. other groups.
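
One such consideration, as a hedged sketch: a multinomial fit gives a separate set of coefficients for each non-reference outcome category, and a categorical predictor with several levels is spread over several dummy columns. So "not shrunk to 0" has to be decided per original variable, not per coefficient. The code below assumes one possible rule, where a variable is kept if any of its dummy coefficients is nonzero for any outcome category. The function name, the variable names, and the coefficient values are all made up for illustration.

```python
def selected_variables(coefs, column_to_variable, tol=1e-8):
    """Return the original variables that have at least one nonzero coefficient.

    coefs: dict mapping outcome category -> dict of dummy column -> coefficient
    column_to_variable: dict mapping dummy column -> original variable name
    Hypothetical helper; the "any nonzero coefficient" rule is an assumption.
    """
    kept = set()
    for per_column in coefs.values():
        for column, value in per_column.items():
            if abs(value) > tol:
                kept.add(column_to_variable[column])
    return sorted(kept)


# Hypothetical dummy coding: "smoke" has 2 levels (1 dummy), "region" has 3 (2 dummies).
column_to_variable = {
    "smoke_yes": "smoke",
    "region_north": "region",
    "region_south": "region",
    "sex_male": "sex",
}

# Made-up penalized coefficients for three non-reference outcome categories.
coefs = {
    "B": {"smoke_yes": 0.0, "region_north": 0.0, "region_south": 0.0, "sex_male": 0.0},
    "C": {"smoke_yes": 0.4, "region_north": 0.0, "region_south": 0.0, "sex_male": 0.0},
    "D": {"smoke_yes": 0.0, "region_north": 0.0, "region_south": -0.2, "sex_male": 0.0},
}

print(selected_variables(coefs, column_to_variable))
```

Here "region" is kept because of a single dummy in a single outcome category. A grouped penalty would shrink all columns of a variable together, but that is a different technique from the plain LASSO.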