1. ## Repeated k-fold Cross-Validation

I saw this:

"It depends on the data, but it's common to find examples using 10-fold CV plus repetition: 10-fold CV, repeated 5 times. Another example: 5-fold CV, repeated 3 times."

I guess they run the k-fold CV, then change the seed and run it again. I could see doing this during the model-building process to ensure you are getting a variety of fold assignments. Has anyone done the repeating part before? Is this done regularly in practice?
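For concreteness, here is a minimal sketch of the "change the seed and run it again" idea in plain Python. The toy "model" (predict the training mean, scored by MSE) and the synthetic Gaussian data are assumptions purely for illustration:

```python
import random
import statistics

def kfold_cv(data, k, model_error, seed):
    """One run of k-fold CV: shuffle with the given seed, split into k folds,
    and average the k held-out fold errors."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        errors.append(model_error(train, test))
    return statistics.mean(errors)

def mean_model_error(train, test):
    """Toy model: predict the training mean; score by mean squared error."""
    pred = statistics.mean(train)
    return statistics.mean((x - pred) ** 2 for x in test)

random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)]

# Repeated 10-fold CV: same data, same model, a different seed
# (and hence a different fold assignment) on each repetition.
repeats = [kfold_cv(data, k=10, model_error=mean_model_error, seed=s) for s in range(5)]
print(statistics.mean(repeats))  # the repeated-CV estimate: the average over the 5 runs
```

Each element of `repeats` is one complete 10-fold CV estimate; the repeated-CV estimate is just their average.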

2. ## Re: Repeated k-fold Cross-Validation

I've never heard of doing that before. I can see it being useful when researching a new approach, maybe, but the whole point of k-fold is to average anyway, so repeating it shouldn't change the average significantly. On that reasoning, why would you do 5-fold 3 times? Fewer folds would seem to be more variable and to need more repetition!

3. ## Re: Repeated k-fold Cross-Validation

No, I agree, and was caught by their use of "common". I believe they were maybe thinking of graphing all of the repetitions to check that there wasn't too much wobble between them. At some point it just seems to be getting closer to LOOCV. But perhaps this is a better approach in the face of sparse data.

4. ## Re: Repeated k-fold Cross-Validation

I happened to use a variant of this approach: repeating (say, 1000 times) a sort of 2-fold data split (1 training, 1 testing) in order to perform an internal validation of the model. This is described in the literature, and I implemented it in R:
http://cainarchaeology.weebly.com/r-...alidation.html

5. ## Re: Repeated k-fold Cross-Validation

I randomly came across the process here as well, where they do a split and then run repeated CV on the test set. So they split first and then use repeated k-fold CV.

https://www.r-bloggers.com/handling-...roduction/amp/

6. ## Re: Repeated k-fold Cross-Validation

I am in the process of learning about regularized logistic regression and reading about the lasso. I have come across the use of repeated k-fold CV a few times in the selection of an optimal (minimum) error value. It has been mentioned that k-fold-based values can vary (I'm guessing by seed), so when trying to find the lowest error value plotted against the penalization value, repeated k-fold CV can be used.

7. ## Re: Repeated k-fold Cross-Validation

My understanding is that repeated CV is pretty common. I've seen multiple texts mention it, and I know it's built into the `caret` package in R. The theory, IIRC, is that it's supposed to yield a slightly lower-variance estimate of the test error.

8. ## Re: Repeated k-fold Cross-Validation

Originally Posted by hlsmith
I randomly came across the process here as well, where they do a split then run repeated CV on the test set. So they split and then use repeated fold CV.

https://www.r-bloggers.com/handling-...roduction/amp/
If I understand what they're doing, the "repeated" k-fold cross-validation is for hyperparameter tuning. That's pretty standard. The k-fold part takes care of measuring out-of-sample predictive success, and the repetition handles the hyperparameter tuning. In the original post it sounded more like repeating on the same data/model, and that would just be odd. Repeating in order to search over a set of hyperparameter values makes sense.

9. ## Re: Repeated k-fold Cross-Validation

Originally Posted by bryangoodrich
The original post it sounded more like repeating on the same data/model and that would just be odd.
That is what repeated CV is. You do, say, 10-fold CV multiple times with the same data and the same model. The only thing that differs is that you randomly divide the data into 10 folds a different way each time. This is an attempt to reduce the (small amount of) variance arising from the fact that a particular set of 10-fold CV results is based on just 1 out of a very large number of ways one could have divided the data into 10 folds. What they are saying in the blog post is that they evaluate every proposed hyperparameter value using repeated 10-fold CV. So, for example, if they are searching for the best value of a hyperparameter alpha, they do 10-fold CV 10 times at alpha = .1, 10 more times at alpha = .2, and so on. That's what that call to trainControl() in the caret package does.
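The grid-search loop described here can be sketched in plain Python. The one-parameter shrinkage "model" (predict alpha times the training mean) and the alpha grid are toy assumptions, standing in for whatever model caret would actually be fitting:

```python
import random
import statistics

def repeated_kfold_error(data, k, repeats, model_error):
    """Average the k-fold CV error over several repetitions,
    each with a different random fold assignment."""
    run_means = []
    for seed in range(repeats):
        rng = random.Random(seed)
        shuffled = data[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        fold_errors = []
        for i in range(k):
            test = folds[i]
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            fold_errors.append(model_error(train, test))
        run_means.append(statistics.mean(fold_errors))
    return statistics.mean(run_means)

def shrinkage_error(alpha):
    """Toy model with one hyperparameter: predict alpha * training mean."""
    def err(train, test):
        pred = alpha * statistics.mean(train)
        return statistics.mean((x - pred) ** 2 for x in test)
    return err

random.seed(1)
data = [random.gauss(5, 1) for _ in range(100)]

# Evaluate each candidate alpha with repeated 10-fold CV, then keep the best.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
scores = {a: repeated_kfold_error(data, k=10, repeats=10, model_error=shrinkage_error(a))
          for a in grid}
best = min(scores, key=scores.get)
print(best)  # alpha = 1.0 should win here: the data mean is far from zero
```

Every alpha in the grid gets the same repeated-10-fold treatment, which is exactly the pattern `trainControl(method = "repeatedcv", ...)` sets up for `train()` in caret.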

10. ## Re: Repeated k-fold Cross-Validation

Right, but the OP said "it's common to find examples using 10-fold CV plus repetition: 10-fold CV, repeated 5 times. Another example: 5-fold CV, repeated 3 times." This "repeated n times" thing comes across as if you repeat your k-fold CV for the sake of repetition. That's odd. If you're not tuning parameters, then you're just testing your model using k-fold CV to capture out-of-sample prediction estimates. If you're searching a parameter space, then clearly every parameter value is being separately tested (repetition for the sake of hyperparameters). That missing context is what left me going "huh?"

11. ## Re: Repeated k-fold Cross-Validation

Originally Posted by bryangoodrich
This "repeated n times" thing comes across as if you repeat your k-fold CV for the sake of repetition. That's odd.
YOU DO. You keep asserting that it's odd, but I just explained why it's not: it reduces the variance of the estimate of the test error (not by a lot, but by a little). It is common practice, whether you're tuning hyperparameters or not.
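The variance-reduction claim is easy to check with a quick simulation. On one fixed dataset, the spread of single 10-fold estimates (across random fold assignments) should exceed the spread of 5-repeat averages. The mean-predicting toy model and the synthetic data are assumptions for illustration:

```python
import random
import statistics

def kfold_estimate(data, k, seed):
    """One k-fold CV error estimate for a mean-predicting toy model."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        pred = statistics.mean(train)
        errs.append(statistics.mean((x - pred) ** 2 for x in test))
    return statistics.mean(errs)

random.seed(2)
data = [random.gauss(0, 1) for _ in range(60)]  # small n makes fold-assignment noise visible

# 200 single-run estimates vs. 200 five-repeat averages, all on the same data.
singles = [kfold_estimate(data, 10, seed=s) for s in range(200)]
repeated = [statistics.mean(kfold_estimate(data, 10, seed=1000 + 5 * s + r) for r in range(5))
            for s in range(200)]

print(statistics.stdev(singles), statistics.stdev(repeated))
# The repeated-CV estimates vary less from one fold assignment to the next.
```

Since each repetition is an independent draw over fold assignments, averaging 5 of them shrinks that component of the variance by roughly a factor of sqrt(5). Note this only addresses fold-assignment noise, not the variance that comes from having one particular dataset.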

12. ## Re: Repeated k-fold Cross-Validation

Fair enough. My experience with repeated k-fold CV is in parameter tuning, and the reason it seemed odd is that the quote lacks context, even granting that repetition reduces variance. The quote says, "10 fold CV, repeated 5 times ... 5 fold CV, repeated 3 times." There is no context about reducing variance, which isn't something I focus on, because I don't use resampling methods or worry about unstable averages from k-fold CV estimates. If I were, blanket statements like "repeat x times for 10-fold CV" would be nonsense; the number of repetitions is highly dependent on the problem setup.

13. ## Re: Repeated k-fold Cross-Validation

I also wonder whether the repeated component would make study-specific results more reproducible, given that someone else would not know the seed number!

