I am looking at a data set of 45 continuous IVs and 1 DV, trying to find an acceptable regression model. I tried the lasso with cross-validation and ended up with about 10 non-zero coefficients in the best model. However, when I ran a sanity check by fitting an OLS model with those 10 IVs, half of them had very high p-values (about 0.7-0.8). I took those IVs out of the model and ended up with a quite reasonable final set.
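
To make the procedure concrete, here is a minimal sketch of the two steps in Python, run on simulated data rather than my real data set. Everything specific in it is an assumption made for illustration: the use of scikit-learn's `LassoCV`, the 10-fold setting, the standardization step, the simulated coefficients, and the hypothetical helper `ols_p_values`.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def ols_p_values(X, y):
    """Two-sided t-test p-values for the OLS slopes (hypothetical helper).

    An intercept is fitted but its p-value is not returned.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = beta / np.sqrt(np.diag(cov))
    p = 2.0 * stats.t.sf(np.abs(t_stats), dof)
    return p[1:]


# Simulated stand-in data (assumption): 45 continuous IVs,
# of which only the first 5 actually affect the DV.
rng = np.random.default_rng(0)
n_obs, n_ivs = 200, 45
X = rng.normal(size=(n_obs, n_ivs))
true_beta = np.zeros(n_ivs)
true_beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_beta + rng.normal(size=n_obs)

# Step 1: lasso with the penalty chosen by 10-fold cross-validation.
X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=10, random_state=0).fit(X_std, y)
selected = np.flatnonzero(lasso.coef_)

# Step 2: plain OLS on the lasso-selected IVs only, then inspect p-values.
p_values = ols_p_values(X_std[:, selected], y)

for idx, p in zip(selected, p_values):
    print(f"IV {idx:2d}  lasso coef = {lasso.coef_[idx]: .3f}  OLS p = {p:.3f}")
```

In this sketch the OLS is fitted on the same observations that were used to select the IVs, which is exactly what I did with my real data.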

My questions would be:

0. Does this make sense or is this approach completely stupid?

1. Are the lasso model and the OLS model even comparable? I.e., is it reasonable to expect low p-values in the OLS regression for the IVs that have non-zero coefficients in the best lasso model?

1.a If not, how can I trust a model that has non-significant IVs in it?

2. Are there any arguments for not looking at the p-values at all and just going with the best lasso model?

2.a If my main interest is in finding parameters that might physically affect the outcome, should I trust the lasso or the OLS (especially given the non-significant parameters in the lasso)?

Many thanks for any help. I am quite a novice at lasso regression, but I find it very interesting.

regards

rogojel