Hi,
I am looking at a data set with 45 continuous IVs and 1 DV, trying to find an acceptable regression model. I tried the lasso with cross-validation and ended up with about 10 non-zero coefficients in the best model. However, when I ran a sanity check by fitting an OLS model with those 10 IVs, half of them had very high p-values (about 0.7-0.8). I took those IVs out of the model and ended up with a quite reasonable final set.
My questions would be:
0. Does this make sense or is this approach completely stupid?
1. Are the lasso model and the OLS even comparable? I.e. is it reasonable to expect low p-values in the OLS regression for the IVs that have non-zero coefficients in the best lasso model?
1.a If not, how can I trust a model that has non-significant IVs in it?
2. Are there any arguments for not looking at the p-values at all and just going with the best lasso model?
2.a If my main interest is in finding parameters that might physically affect the outcome, should I trust the lasso or the OLS (especially given the non-significant parameters in the lasso)?
Many thanks for any help. I am quite a novice at lasso regression but find it very interesting.
regards
rogojel