Model selection for logistic regression: comparing the p-value approach, BIC, and cross-validation.


I need to write a discussion of three approaches to model selection: p-values, Bayes factors (using BIC), and leave-one-out cross-validation.

The p-value approach is not preferred: statistical significance depends on sample size, the p-value is based on imaginary data, and it depends on the intentions of the researcher. I have good literature about all of that, but after reading it I am confused: what is the real interpretation of the p-value if we find Pr(<Chi) .986 when comparing model 1 (H0) and model 2 (HA), and what is the correct definition of the hypotheses?

H0 : model 1 is adequate (enough)?
HA : model 2 is (more) adequate?
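For concreteness, the comparison behind a Pr(<Chi) value like this is a likelihood-ratio test between two nested logistic models. Here is a minimal sketch using only numpy/scipy on simulated data; the helper names (`neg_loglik`, `fit`) and the simulated coefficients are my own, not from any particular textbook.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

# simulate data where only x1 matters, so x2 is pure noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.0 * x1)))
y = rng.binomial(1, p)

def neg_loglik(beta, X, y):
    # negative Bernoulli log-likelihood for a logistic model
    eta = X @ beta
    return -np.sum(y * eta - np.log1p(np.exp(eta)))

def fit(X, y):
    # return the maximized log-likelihood
    res = minimize(neg_loglik, np.zeros(X.shape[1]), args=(X, y), method="BFGS")
    return -res.fun

X1 = np.column_stack([np.ones(n), x1])       # model 1 (H0: the smaller model is adequate)
X2 = np.column_stack([np.ones(n), x1, x2])   # model 2 (HA: the extra term is needed)
ll1, ll2 = fit(X1, y), fit(X2, y)

lr_stat = 2 * (ll2 - ll1)         # likelihood-ratio statistic
p_value = chi2.sf(lr_stat, df=1)  # df = difference in number of parameters
```

A large p-value here (like .986) just means the observed improvement in fit from adding the extra term is entirely unremarkable under H0, i.e. there is no evidence the larger model is needed.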

The Bayes factor overcomes the problems with the p-value, but does that mean nothing is wrong with this measure?
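One standard connection between BIC and Bayes factors is the Schwarz approximation, BF ≈ exp(ΔBIC/2). A minimal sketch, again on simulated data with my own helper names (`max_loglik`, `bic`):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 150
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.4 + 0.8 * x)))
y = rng.binomial(1, p)

def max_loglik(X, y):
    # maximized Bernoulli log-likelihood of a logistic model
    def nll(beta):
        eta = X @ beta
        return -np.sum(y * eta - np.log1p(np.exp(eta)))
    return -minimize(nll, np.zeros(X.shape[1]), method="BFGS").fun

def bic(X, y):
    # BIC = k * ln(n) - 2 * log-likelihood (lower is better)
    k = X.shape[1]
    return k * np.log(len(y)) - 2 * max_loglik(X, y)

X0 = np.ones((n, 1))                   # intercept-only model
X1 = np.column_stack([np.ones(n), x])  # model with the predictor
bic0, bic1 = bic(X0, y), bic(X1, y)

# Schwarz approximation to the Bayes factor of model 1 over model 0
bf_10 = np.exp((bic0 - bic1) / 2)
```

Note this is only an approximation to the Bayes factor (it implicitly uses a unit-information prior), which is one of the things people criticize about the BIC route.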

I think I prefer leave-one-out cross-validation over both methods because it stays much closer to the data. But what I don't get is what the real advantages of CV are compared to BF. Does anybody have good literature about that?
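To make the LOO comparison concrete: one common scoring rule is the sum of held-out log predictive densities, where the model with the higher LOO log score is preferred. A sketch under the same simulated-data setup as above (the function `loo_log_score` is my own name, not a library call):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 60
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.3 + 1.2 * x)))
y = rng.binomial(1, p)

def fit(X, y):
    # maximum-likelihood coefficients for a logistic model
    def nll(beta):
        eta = X @ beta
        return -np.sum(y * eta - np.log1p(np.exp(eta)))
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

def loo_log_score(X, y):
    # sum over i of log p(y_i | model fit without observation i); higher is better
    total = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        beta = fit(X[mask], y[mask])
        eta = X[i] @ beta
        total += y[i] * eta - np.log1p(np.exp(eta))
    return total

X0 = np.ones((n, 1))                   # intercept-only model
X1 = np.column_stack([np.ones(n), x])  # model with the predictor
score0, score1 = loo_log_score(X0, y), loo_log_score(X1, y)
```

Unlike BIC, this directly measures out-of-sample predictive performance, which is one way to phrase the advantage you are asking about.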



No cake for spunky
The p-value is not based on "imaginary data" (whatever that is). It is based on how likely results at least as extreme as yours are, given the distribution assumed under the null hypothesis. There are many p-values in tests, and most do not pertain to the model itself, so you need to clarify which p-value you are talking about. The p-value tells you whether to reject the null or not. It is rarely if ever used to compare models, which is not its purpose. An exception would be something like a chi-square difference test, where the null is specifically that the simpler model explains the data as well as the more complex one.

I have never seen a statistical test with the hypotheses you mention.

BIC does not have a p-value at all. It is generally considered the best way to tell which of two non-nested models is better, but unlike a chi-square difference test, there is no formal test of whether the difference in BIC between models is significant.

I am not familiar with a cross-validation test. Generally, BIC is best for non-nested models and chi-square difference tests are best for nested models, IMHO, because the latter offers a formal test of whether the difference between the models is significant. But there are many ways to compare models, including R-squared, goodness of fit, etc.