
Thread: Repeated k-fold Cross-Validation

  1. #1 hlsmith (Omega Contributor)

    Repeated k-fold Cross-Validation
    I saw this:


    "Any advice on practice?

    It depends on the data, but it's common to find example cases using 10-fold CV plus repetition: 10-fold CV, repeated 5 times. Another: 5-fold CV, repeated 3 times."


    I guess they run the k-fold CV, then change the seed and run it again. I could see doing this during the model-building process to ensure you are getting a variety of sets. Has anyone done the repeating part before? Is this done regularly in practice?
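    As I understand it, the procedure is just: reshuffle the fold assignment with a new seed, rerun the k-fold CV, and average. A minimal pure-Python sketch of that idea, using a toy "model" that predicts the training mean (all names and the error function here are illustrative, not from any package):

    ```python
    import random
    import statistics

    def kfold_indices(n, k, seed):
        """Shuffle indices 0..n-1 with the given seed and deal them into k folds."""
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        return [idx[i::k] for i in range(k)]

    def repeated_kfold_cv(data, k, repeats, error_fn):
        """Run k-fold CV `repeats` times, reshuffling the folds each time
        (new seed), and return the per-repetition mean errors and their overall mean."""
        per_rep = []
        for seed in range(repeats):
            fold_errors = []
            for test_fold in kfold_indices(len(data), k, seed):
                test_set = set(test_fold)
                train = [data[i] for i in range(len(data)) if i not in test_set]
                test = [data[i] for i in test_fold]
                fold_errors.append(error_fn(train, test))
            per_rep.append(statistics.mean(fold_errors))
        return per_rep, statistics.mean(per_rep)

    # Toy "model": predict the training mean; error is mean squared deviation.
    def mse_of_mean_predictor(train, test):
        pred = statistics.mean(train)
        return statistics.mean((x - pred) ** 2 for x in test)

    rng = random.Random(0)
    data = [rng.gauss(0, 1) for _ in range(100)]
    per_rep, overall = repeated_kfold_cv(data, k=10, repeats=5,
                                         error_fn=mse_of_mean_predictor)
    # per_rep holds 5 estimates that differ only in how the folds were drawn;
    # their spread is the variation between repetitions.
    ```

    The 5 per-repetition estimates should sit close together, since each is already an average over 10 folds of the same data.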

  2. #2 bryangoodrich (Probably A Mammal)

    Re: Repeated k-fold Cross-Validation

    I've never heard of doing that before. I can see it being useful when researching a new approach, maybe, but the whole point of k folds is to average anyway, so repeating it shouldn't shift the average significantly. On that reasoning, why would you do 5-fold 3x? Fewer folds would seem to be more variable and need more repetition!

  3. #3 hlsmith (Omega Contributor)

    Re: Repeated k-fold Cross-Validation

    No, I agree, and was caught by their use of "common". I believe they were maybe thinking of graphing all of these repetitions to check that there wasn't too much wiggling going on between them. At some point it just seems to be getting closer to LOOCV. But perhaps this is a better approach in the face of sparse data.

  4. #4 gianmarco (TS Contributor)

    Re: Repeated k-fold Cross-Validation

    I happened to use a variant of this approach: repeating (say, 1000 times) a sort of 2-fold data split (one part for training, one for testing) in order to perform an internal validation of the model. This was described in the literature, and I implemented it in R:
    http://cainarchaeology.weebly.com/r-...alidation.html
    Last edited by gianmarco; 12-11-2016 at 12:37 PM. Reason: Link added

  5. #5 hlsmith (Omega Contributor)

    Re: Repeated k-fold Cross-Validation

    I randomly came across the process here as well, where they do a train/test split and then run repeated k-fold CV.

    https://www.r-bloggers.com/handling-...roduction/amp/

  6. #6 hlsmith (Omega Contributor)

    Re: Repeated k-fold Cross-Validation

    I am in the process of learning about regularization for logistic regression and reading about the lasso. I have come across the use of repeated k-fold CV a few times in the selection of an optimal (minimum) error value. It has been mentioned that k-fold-based values can vary (presumably with the seed), so when trying to find the lowest error value plotted against the penalization value, repeated k-fold CV can be used.

  7. #7 Jake (Cookie Scientist)

    Re: Repeated k-fold Cross-Validation

    My understanding is that repeated CV is pretty common. I've seen multiple texts mention it, and I know it's built into the `caret` package in R. The theory, IIRC, is that it's supposed to yield a slightly lower-variance estimate of the test error.

  8. #8 bryangoodrich (Probably A Mammal)

    Re: Repeated k-fold Cross-Validation

    Quote Originally Posted by hlsmith View Post
    I randomly came across the process here as well, where they do a train/test split and then run repeated k-fold CV.

    https://www.r-bloggers.com/handling-...roduction/amp/
    If I understand what they're doing, the "repeated" k-fold cross-validation is for the hyperparameter tuning. That's pretty standard. The k-fold takes care of identifying out-of-sample predictive success, and the repetition handles the hyperparameter tuning. In the original post it sounded more like repeating on the same data/model, and that would just be odd. Repeating while searching over a set of hyperparameter values makes sense.

  9. #9 Jake (Cookie Scientist)

    Re: Repeated k-fold Cross-Validation

    Quote Originally Posted by bryangoodrich View Post
    The original post it sounded more like repeating on the same data/model and that would just be odd.
    That is what repeated CV is. You do, say, 10-fold CV multiple times with the same data and same model. The only thing that differs is that you randomly divide the data into 10 folds a different way each time. This is an attempt to reduce the (small amount of) variance arising from the fact that a particular set of 10-fold CV results is based on just 1 out of a very large number of ways that one could have divided the data into 10 folds.

    What they are saying in the blog post is that they evaluate every proposed hyperparameter value using repeated 10-fold CV. So, for example, if they are searching for the best value of a hyperparameter alpha, they do 10-fold CV 10 times at alpha = .1, 10 more times at alpha = .2, and so on. That's what that call to trainControl() in the caret package does.
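    That loop can be sketched in plain Python rather than caret, with a toy one-parameter model where `alpha` shrinks the prediction toward zero (the model, the grid, and every name here are illustrative assumptions, not from the blog post):

    ```python
    import random
    import statistics

    def repeated_cv_error(data, k, repeats, error_fn):
        """Mean error over `repeats` independent k-fold partitions of the data."""
        errors = []
        for seed in range(repeats):
            idx = list(range(len(data)))
            random.Random(seed).shuffle(idx)
            for fold in [idx[i::k] for i in range(k)]:
                test_set = set(fold)
                train = [data[i] for i in range(len(data)) if i not in test_set]
                errors.append(error_fn(train, [data[i] for i in fold]))
        return statistics.mean(errors)

    # Toy model with one hyperparameter: predict alpha * (training mean),
    # i.e. alpha < 1 shrinks the prediction toward zero.
    def make_error_fn(alpha):
        def err(train, test):
            pred = alpha * statistics.mean(train)
            return statistics.mean((x - pred) ** 2 for x in test)
        return err

    rng = random.Random(1)
    data = [rng.gauss(2, 1) for _ in range(80)]  # true mean is 2

    # Repeated 10-fold CV at *every* candidate alpha: each value on the grid
    # gets its own full set of 10 repetitions, and we keep the minimizer.
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]
    cv = {a: repeated_cv_error(data, k=10, repeats=10, error_fn=make_error_fn(a))
          for a in grid}
    best = min(cv, key=cv.get)
    ```

    Since the true mean is far from zero, the unshrunk estimate (alpha = 1) should win the grid search here; the point is only the structure: the repetition happens once per candidate value, not once overall.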

  10. #10 bryangoodrich (Probably A Mammal)

    Re: Repeated k-fold Cross-Validation

    Right, but the OP said "it's common to find example cases using 10-fold CV plus repetition: 10-fold CV, repeated 5 times. Another: 5-fold CV, repeated 3 times." This "repeated n times" thing comes across as if you repeat your k-fold CV for the sake of repetition. That's odd. If you're not tuning parameters, then you're just testing your model using k-fold CV to capture out-of-sample prediction estimates. If you're searching a parameter space, then clearly every parameter value is being separately tested (repetition for the sake of hyperparameters). That missing context is what left me going "huh?" at the quote.

  11. #11 Jake (Cookie Scientist)

    Re: Repeated k-fold Cross-Validation

    Quote Originally Posted by bryangoodrich View Post
    This "repeated n times" thing comes across as if you repeat your k-fold CV for the sake of repetition. That's odd.
    YOU DO. You keep asserting that it's odd, but I literally just explained why it's not. It reduces the variance of the estimate of the test error (not by a lot, but by a little). It is a common practice, whether you're tuning hyper-parameters or not.

  12. #12 bryangoodrich (Probably A Mammal)

    Re: Repeated k-fold Cross-Validation

    Fair enough. My experience with repeated k-fold CV is parameter tuning. The reason it seems odd is that it's presented without context, even if you want to say it reduces variance. The quote says, "10 fold CV, repeated 5 times ... 5 fold CV, repeated 3 times." That doesn't even give the variance-reduction context, which is not something I focus on, because I don't use resampling methods or worry about unstable averages from k-fold CV estimates. If I were, blanket statements like "repeat x times for 10-fold CV" would be nonsense: the number of repetitions is highly dependent on the problem setup.

  13. #13 hlsmith (Omega Contributor)

    Re: Repeated k-fold Cross-Validation


    I also wonder if the repeated component would make the study-specific results more reproducible, given that someone else would not know the seed number!
