## Area under the curve after 10-fold cross validation

Hi everyone,

After performing a 10-fold cross validation, I now have an area under the curve (AUC) for each of the 10 test subsets. I calculated the average AUC (0.63).

I wanted the 95% CI, so I calculated the SE of the mean as 0.63 / sqrt(10) = 0.20.

The 95% CI for the AUC = 0.63 ± 1.96 × 0.20 = 0.24 to 1.02.

Is this approach correct, and how should I interpret an upper limit greater than 1?

Thanks

The AUC is bounded by 0 and 1, so you wouldn't report a limit outside that range. Variables bounded between 0 and 1 can often be modeled with a beta distribution.

Given the process, I am not sure whether the SE would be std / sqrt(n), where n = 10, or whether, as with the bootstrap, the std of the estimates is itself the SE. I would favor the former if guessing. I have seen the formula somewhere online before.
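To make the std / sqrt(n) option concrete, here is a minimal sketch. The ten fold AUCs are made-up values chosen only so that their mean is 0.63; they are not the poster's data. The helper name `mean_ci` is hypothetical. Note that the SE comes from the spread of the fold AUCs, not from dividing the mean by sqrt(n), and that fold estimates share training data, so an interval like this is only a rough guide.

```python
import math
import statistics

# Hypothetical per-fold AUCs (invented for illustration; mean is 0.63).
fold_aucs = [0.58, 0.66, 0.61, 0.70, 0.59, 0.64, 0.67, 0.60, 0.62, 0.63]


def mean_ci(values, crit=1.96):
    """Return (mean, se, lower, upper) with SE = sample SD / sqrt(n)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)
    return mean, se, mean - crit * se, mean + crit * se


mean, se, lower, upper = mean_ci(fold_aucs)
print(f"mean={mean:.3f}  SE={se:.3f}  95% CI=({lower:.3f}, {upper:.3f})")

# With only 10 folds, a t critical value (about 2.262 for 9 degrees of
# freedom) is arguably more appropriate and gives a wider interval.
_, _, t_lower, t_upper = mean_ci(fold_aucs, crit=2.262)
print(f"t-based 95% CI=({t_lower:.3f}, {t_upper:.3f})")
```

With values like these the interval stays well inside 0 to 1, because the SE reflects how much the fold AUCs vary rather than how large their mean is.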

So 10% was used for model building and then applied to 90% to get AUC, for all 10 data partitions, correct?

If you have random samples from a population, the average of the estimates approximates the expected value of the parameter, and by the Central Limit Theorem that average is approximately normally distributed. You seem to have done that for the first part of the problem.

Hi, thank you for taking the time to respond.
90% of the sample was used for training, and the prediction model was tested on the remaining 10%; this was repeated 10 times. Every observation appeared in the test set exactly once.
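The partitioning described here can be sketched as follows. This is only an illustration in plain Python: the function name `k_fold_indices`, the sample size of 200, and the seed are assumptions, not details from the actual analysis.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k splits the shuffled indices into k disjoint folds.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training set = every fold except the one held out for testing.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical sample size of 200: each split trains on 180 and tests on 20.
splits = list(k_fold_indices(n_samples=200, k=10))
all_test = sorted(j for _, test_idx in splits for j in test_idx)
print(len(splits), all_test == list(range(200)))
```

Each of the 10 splits would produce one AUC on its held-out 10%, which gives the 10 values being averaged.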

Could you please clarify " I am not sure given the process if the SE would be std / sqrt(n-value) , where n-value = 10, as in the case of say bootstrap the std is the SE, I would favor the former if guessing. I have seen the formula somewhere before online"

I calculated the SE of the mean as mean / sqrt(10).

Are you suggesting that the standard deviation should be used instead?

Many thanks

Many thanks, will go through this.

Just curious, what's the goal of getting 10 AUCs? Model validation?

