cross-validated AUC

gianmarco

TS Contributor
#1
Hello All,
I hope the thread's title makes sense to you. I need to perform internal cross-validation using k-fold CV (needless to say, to assess how well a model performs on 'unknown' data).

What I am after is the distribution of AUC values across the different folds. So far, I have not found a viable option. There are some packages that perform different sorts of CV, but none of them (to the best of my understanding) returns what I want.
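A minimal base-R sketch of what is being asked for, under stated assumptions: a logistic regression fitted with glm(), random (unstratified) folds, and AUC computed with the Mann-Whitney rank identity instead of a ROC package. The helper names auc_rank() and cv_fold_auc() and the simulated data are hypothetical, for illustration only.

```r
# AUC via the Mann-Whitney (rank-sum) identity: the probability that a random
# positive case scores higher than a random negative one (ties count 1/2,
# which rank()'s default "average" tie method handles).
auc_rank <- function(y, score) {
  r <- rank(score)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: fit `formula` on k-1 folds, score the held-out fold,
# and return the vector of k per-fold AUCs.
cv_fold_auc <- function(formula, data, k = 10, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold ids
  y_name <- all.vars(formula)[1]                             # response column
  sapply(seq_len(k), function(i) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    fit <- glm(formula, data = train, family = binomial)
    p <- predict(fit, newdata = test, type = "response")
    auc_rank(test[[y_name]], p)
  })
}

# Simulated data (an assumption, for illustration only).
set.seed(42)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.8 * sim$x1 - 0.5 * sim$x2))

fold_auc <- cv_fold_auc(y ~ x1 + x2, sim, k = 10)
print(round(fold_auc, 3))  # distribution of AUC across the 10 folds
print(summary(fold_auc))
mean(fold_auc)             # averaged (cross-validated) AUC
```

With small folds or rare outcomes, a fold can end up with no positive (or no negative) cases; its AUC is then undefined (NaN here). Assigning folds separately within each outcome class (stratified folds) avoids that.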

One that I found quite easy to use is the DAAG package, whose CVbinary() function performs k-fold CV and returns the cross-validation estimate of accuracy. The latter, as far as I understand, is the average of the accuracy across the k folds (using 0.5 as the cutoff point on the predicted probabilities).
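For comparison, a sketch of a k-fold cross-validated accuracy at a 0.5 cutoff. It is written independently and is not DAAG's CVbinary() code; the simulated data and the 10 random folds are assumptions. It also checks a point relevant to the "average across folds" reading: pooling all out-of-fold predictions into a single accuracy gives exactly the fold-size-weighted mean of the per-fold accuracies.

```r
# Simulated data (an assumption, for illustration only).
set.seed(42)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.8 * sim$x1 - 0.5 * sim$x2))

k <- 10
set.seed(1)
folds <- sample(rep(seq_len(k), length.out = n))  # random fold ids
cv_prob <- numeric(n)                             # out-of-fold probabilities
for (i in seq_len(k)) {
  fit <- glm(y ~ x1 + x2, data = sim[folds != i, ], family = binomial)
  cv_prob[folds == i] <- predict(fit, newdata = sim[folds == i, ], type = "response")
}

correct <- (cv_prob > 0.5) == (sim$y == 1)  # correctly classified at cutoff 0.5?
acc_pooled <- mean(correct)                  # one accuracy over all n cases
fold_acc <- tapply(correct, folds, mean)     # accuracy within each fold
fold_n <- tabulate(folds, nbins = k)         # number of cases in each fold
acc_weighted <- sum(fold_acc * fold_n) / n   # size-weighted mean of fold accuracies
all.equal(acc_pooled, acc_weighted)          # the two coincide
```

Here every fold has exactly 30 cases, so the weighted and unweighted means of the fold accuracies are also equal; with unequal folds they can differ slightly.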

What I would like to have is something similar, but with the averaged AUCs instead of the averaged accuracy values.

Long story short: do you know of any package that does something like that, or can you help me write some code to implement it from scratch?

Thank you for any guidance you can provide.

Best
Gm
 

bugman

Super Moderator
#2
Have you tried the function:

Code:
cv.glm()
in the boot package?

It does k-fold cross-validation and may have some of the arguments that you need.
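A sketch of how cv.glm() might be pointed at AUC, assuming the behaviour described in the boot documentation: `cost` is a function of the observed responses and the predicted values, `delta[1]` is the raw cross-validation estimate and `delta[2]` a bias-adjusted version. As I read the function's source, `delta[1]` is the fold-size-weighted mean of the per-fold costs, so with an AUC cost it yields an averaged AUC but not the individual fold values. The auc_cost() helper and the simulated data are hypothetical; the misclassification cost mirrors the binary-response example in ?cv.glm.

```r
library(boot)

# Hypothetical cost function: AUC via the Mann-Whitney rank identity.
# cv.glm() calls cost(observed, predicted) on each held-out fold.
auc_cost <- function(y, yhat) {
  r <- rank(yhat)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated data (an assumption, for illustration only).
set.seed(42)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.8 * sim$x1 - 0.5 * sim$x2))
fit <- glm(y ~ x1 + x2, data = sim, family = binomial)

# Misclassification rate at a 0.5 cutoff.
set.seed(1)
cv_misclass <- cv.glm(sim, fit, cost = function(y, yhat) mean(abs(y - yhat) > 0.5), K = 10)

# Same folds (same seed), AUC as the "cost".
set.seed(1)
cv_auc <- cv.glm(sim, fit, cost = auc_cost, K = 10)

cv_misclass$delta  # raw and adjusted CV misclassification rate
cv_auc$delta[1]    # averaged AUC over the 10 folds
```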

Failing that, have you tried the package "cvAUC"? (I have not used that one).
 

gianmarco

TS Contributor
#4
Hello!
@Bugman:
thanks for pointing out cv.glm from the boot package. I was wondering how the returned delta values should be interpreted.
As for cvAUC, I did not manage to get it to work properly: I can't get the AUC for each of the (say, 10) folds. I keep getting the AUC for just one fold :-(

@hlsmith
I did not understand your question, sorry.
 

hlsmith

Omega Contributor
#5
You said averaged accuracy and averaged AUC, but those terms are usually interchangeable, so your statement confused me.
 

gianmarco

TS Contributor
#6
@hlsmith:
when I used 'accuracy' I was referring to the output of the DAAG package (command: CVbinary): it returns the accuracy, which is the percentage of correctly classified cases out of all cases. This can easily be calculated from a confusion matrix. In this case, the accuracy depends on the cutoff threshold on the probability. As far as I understand, AUC does not depend on a specific cutoff value; indeed, in the dataset I was playing with, accuracy (50% cutoff point) was 85% while AUC was 0.917.
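A toy illustration of this point, with made-up scores and hypothetical helpers (accuracy_at(), auc_rank()): accuracy moves as the cutoff changes, while AUC depends only on how the scores rank positives against negatives, so even a monotone transformation of the scores leaves it unchanged.

```r
# Made-up outcomes and predicted probabilities, for illustration only.
y     <- c(0, 0, 0, 1, 0, 1, 1, 1)
score <- c(0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90)

# Hypothetical helper: accuracy when classifying score > cutoff as positive.
accuracy_at <- function(cutoff) mean((score > cutoff) == (y == 1))

# Hypothetical helper: AUC via the Mann-Whitney rank identity.
auc_rank <- function(y, s) {
  r <- rank(s)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

acc <- sapply(c(0.38, 0.5, 0.8), accuracy_at)
acc                   # 0.875 0.750 0.625: accuracy changes with the cutoff
auc_rank(y, score)    # 0.9375
auc_rank(y, score^3)  # 0.9375: same ranking of cases, same AUC
```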
 

FR4K

New Member
#9
Hi, I'm not sure I can help, but the topic interests me and I'm also trying to understand more about it.

If I understood correctly, you have split the dataset into 10 groups and would like the mean AUC across the groups, which I think is then the AUC of the cross-validated model. Is that right?
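The mean of the per-fold AUCs and the AUC of all out-of-fold predictions pooled together are not the same quantity, although they are usually close. The sketch below (simulated data, hypothetical auc_rank() helper, base R only) computes both; as I understand its documentation, the cvAUC package reports the former.

```r
# Hypothetical helper: AUC via the Mann-Whitney rank identity.
auc_rank <- function(y, s) {
  r <- rank(s)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated data (an assumption, for illustration only).
set.seed(42)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.8 * sim$x1 - 0.5 * sim$x2))

k <- 10
set.seed(1)
folds <- sample(rep(seq_len(k), length.out = n))  # random fold ids
cv_prob <- numeric(n)                             # out-of-fold probabilities
for (i in seq_len(k)) {
  fit <- glm(y ~ x1 + x2, data = sim[folds != i, ], family = binomial)
  cv_prob[folds == i] <- predict(fit, newdata = sim[folds == i, ], type = "response")
}

fold_auc <- sapply(seq_len(k), function(i) auc_rank(sim$y[folds == i], cv_prob[folds == i]))
mean_of_folds <- mean(fold_auc)     # average of the 10 fold AUCs
pooled <- auc_rank(sim$y, cv_prob)  # one AUC over all out-of-fold predictions
c(mean_of_folds = mean_of_folds, pooled = pooled)
```

Pooling compares scores produced by different fold models against each other, which is why the two numbers can differ.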
 

gianmarco

TS Contributor
#11
Yes Greta, that would be a possibility. The package cvAUC provides the option to calculate the AUC in the context of k-fold CV (i.e., getting 10 AUCs and their average), but I did not manage to get it to work.
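A sketch of how cvAUC might be called to get one AUC per fold, assuming (from its documentation) that cvAUC() accepts single vectors of predictions and labels plus a `folds` vector of fold ids (or, alternatively, lists with one element per fold), and returns `fold.AUC` and `cvAUC`. The out-of-fold predictions are built by hand in base R on simulated data, and the cvAUC call runs only if the package is installed. If only one AUC comes back, a plausible cause is passing a single prediction vector without `folds`, which is then treated as one fold.

```r
# Simulated data (an assumption, for illustration only).
set.seed(42)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.8 * sim$x1 - 0.5 * sim$x2))

# Out-of-fold predicted probabilities from 10 random folds.
k <- 10
set.seed(1)
folds <- sample(rep(seq_len(k), length.out = n))
cv_prob <- numeric(n)
for (i in seq_len(k)) {
  fit <- glm(y ~ x1 + x2, data = sim[folds != i, ], family = binomial)
  cv_prob[folds == i] <- predict(fit, newdata = sim[folds == i, ], type = "response")
}

if (requireNamespace("cvAUC", quietly = TRUE)) {
  # Supplying the fold ids is what splits the AUC by fold.
  out <- cvAUC::cvAUC(predictions = cv_prob, labels = sim$y, folds = folds)
  print(out$fold.AUC)  # one AUC per fold
  print(out$cvAUC)     # their mean
}
```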