Area under the curve after 10-fold cross validation

#1
Hi everyone,

After performing a 10-fold cross-validation, I now have the area under the curve (AUC) for each of the 10 test subsets. I calculated the average AUC (0.63).

I wanted the 95% CI, so I calculated the SE of the mean = 0.63 / square root of 10 = 0.19.

The 95% CI for the AUC = 0.63 ± 1.96 × 0.19 = 0.23 to 1.02.

Is this approach correct, and how should I interpret an upper limit for the AUC that is greater than 1?

Thanks
 

hlsmith

Omega Contributor
#2
The AUC is bounded by 0 and 1, so you wouldn't report a limit above or below the feasible range. Variables bounded between 0 and 1 can often be described by a beta distribution.


Given the process, I am not sure whether the SE would be std / sqrt(n), where n = 10, or whether, as with the bootstrap, the std of the estimates is itself the SE. If guessing, I would favor the former. I have seen the formula somewhere online before.
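To make the std / sqrt(n) version concrete, here is a minimal sketch. The per-fold AUCs are hypothetical (only the mean of 0.63 was posted), and `mean_ci` is just an illustrative helper name. Fold estimates share most of their training data, so treat the resulting interval as a rough guide rather than an exact 95% CI.

```python
import math
import statistics

# Hypothetical per-fold AUCs, chosen only so that their mean is 0.63.
fold_aucs = [0.58, 0.66, 0.61, 0.70, 0.59, 0.64, 0.62, 0.67, 0.60, 0.63]


def mean_ci(aucs, z=1.96):
    """Return (mean, SE, lower, upper) using SE = SD / sqrt(n)."""
    n = len(aucs)
    mean_auc = statistics.mean(aucs)
    sd_auc = statistics.stdev(aucs)   # sample SD across folds (n - 1 denominator)
    se_auc = sd_auc / math.sqrt(n)    # SE uses the SD, not the mean
    # With n = 10 a t critical value (about 2.26 for 9 df) would be more
    # conservative than z = 1.96; z is kept to match the original post.
    lower = max(0.0, mean_auc - z * se_auc)  # clip to the feasible AUC range
    upper = min(1.0, mean_auc + z * se_auc)
    return mean_auc, se_auc, lower, upper


mean_auc, se_auc, lower, upper = mean_ci(fold_aucs)
print(f"mean={mean_auc:.3f} SE={se_auc:.3f} CI=({lower:.3f}, {upper:.3f})")
```

Because the SE depends on the spread of the fold AUCs rather than on their mean, the interval is usually much narrower than one built from mean / sqrt(n).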


So 10% was used for model building and then applied to 90% to get AUC, for all 10 data partitions, correct?


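If you have random samples from a population, the average of the estimates approximates the expected value of the parameter (strictly that is the law of large numbers; the Central Limit Theorem describes how that average is distributed), and you seem to have done that for the first part of the problem.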
 
#3
Hi, thank you for taking the time to respond.
90% of the sample was used for training, and the prediction model was tested on the remaining 10%; this was repeated 10 times. Every observation appeared in a test set exactly once.
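In case it helps to pin down the procedure, the splitting can be sketched as below. This is only an illustration using the standard library; `k_fold_indices` is a hypothetical helper, not the code I actually ran, and the sample size of 100 is made up.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle before partitioning
    folds = [indices[i::k] for i in range(k)]   # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 100 observations give 10 splits of 90 training / 10 test indices.
splits = list(k_fold_indices(100, k=10))
for train, test in splits:
    # Fit the model on `train` and compute the AUC on `test` here.
    pass
```

Each of the 10 test folds then gives one AUC, which is where the 10 values I averaged come from.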

Could you please clarify your point about whether the SE would be std / sqrt(n) with n = 10, as opposed to the bootstrap case where the std is itself the SE?

I calculated the SE of the mean as mean / sqrt(10).

Are you suggesting that the standard deviation should be used instead of the mean?

Many thanks