Hello!
I am once again working with logistic regression (LR), in particular binary LR, in a study that tries to assess the impact of some topographic and environmental variables on a binary DV (optimal/non-optimal land quality).

I am trying to deepen my knowledge of LR, and I am looking into cross-validation (CV).

I have read some material (both on the web and in the scientific literature), and I now have a general (hopefully correct) idea of what it is and what it is meant to do. What I do not fully grasp is how the output of k-fold CV should be interpreted in relation to the model being cross-validated.

So, the question is:

1) The cv.glm() function of the 'boot' package returns a statistic (delta) which I would like to know more about (for a summary of the CV methods in R, I found this document). Do you have any material or references to point me to, so that I can understand what that statistic is actually saying about the model?
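
To make the question concrete, here is a minimal sketch of the kind of call I mean. It runs on simulated data, and the variable names (elevation, slope, optimal) are made up for illustration; they are not my real data. As far as I understand the 'boot' documentation, delta holds two numbers, a raw CV estimate of prediction error and a bias-adjusted one, and with the default cost this error is the average squared difference between the observed 0/1 response and the predicted probability.

```r
library(boot)

set.seed(1)
n <- 200

# Simulated stand-ins for the real predictors (hypothetical names)
d <- data.frame(elevation = rnorm(n), slope = rnorm(n))
d$optimal <- rbinom(n, 1, plogis(0.5 + 1.2 * d$elevation - 0.8 * d$slope))

# Binary logistic regression fitted on the full dataset
fit <- glm(optimal ~ elevation + slope, data = d, family = binomial)

# 10-fold CV with the default cost function
cv <- cv.glm(d, fit, K = 10)

# Vector of length 2: raw CV estimate, then the adjusted estimate
cv$delta
```

What I am unsure about is how to read these two values, for example whether they can be compared across competing models fitted to the same data.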


Extra (related) questions:

2) Sample size and number of folds: is there a rule for choosing the number of folds in relation to the size of the whole dataset?

3) Am I correct in understanding that the difference between accuracy (e.g., ROC curve) and validation (e.g., k-fold CV) can be summarized as follows (in layman's terms):
accuracy: assessing the discriminatory ability of the model in relation to the data you have fed into it
validation: assessing how well the model behaves in relation to (future) new data
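
To show what I mean by this distinction, here is a rough sketch on simulated data (again with made-up variable names). To keep it short it uses the misclassification rate at a 0.5 threshold rather than a ROC curve, which is my own simplification. The first number is computed on the same data the model was fitted to, the second on held-out folds.

```r
library(boot)

set.seed(2)
n <- 200

# Simulated data; names are hypothetical placeholders
d <- data.frame(elevation = rnorm(n), slope = rnorm(n))
d$optimal <- rbinom(n, 1, plogis(0.5 + 1.2 * d$elevation - 0.8 * d$slope))

fit <- glm(optimal ~ elevation + slope, data = d, family = binomial)

# Misclassification rate: observed 0/1 vs. predicted probability,
# counted as an error when they differ by more than 0.5
cost <- function(y, p = 0) mean(abs(y - p) > 0.5)

# "Accuracy" in my sense: error on the data used to fit the model
apparent_err <- cost(d$optimal, fitted(fit))

# "Validation" in my sense: 10-fold cross-validated error (raw estimate)
cv_err <- cv.glm(d, fit, cost = cost, K = 10)$delta[1]

c(apparent = apparent_err, cross_validated = cv_err)
```

If my understanding is right, the cross-validated figure is the one that speaks to new data, while the apparent figure tends to be optimistic.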


Sorry for the rather long post. Thanks for any hints you can provide.

Cheers
Gm