
Thread: testing the bootstrap

  1. #1
    rogojel (TS Contributor)

    testing the bootstrap




    hi,
    below is an experiment I just did, testing whether using a bootstrap I could get better results than with simple repeated sampling. Is there any error in the logic/code? It seems that bootstrapping would just amplify the sampling error without adding any value - what do I miss?

    This is actually an R Notebook file; I just did not find a better way to include it here.

    Bootstrap with t-tests

    Being fascinated by the possibilities of simulation in statistics, I would like to test some ideas about using the bootstrap with statistical tests. For illustration I picked the simple case of the two-sample t-test. My question is whether we can get a better view of the test results by using the bootstrap. The real-life situation would be to gather one sample, possibly a small one, and try to make the results more conclusive by bootstrapping.

    So, first, let us take two samples from two normally distributed populations that differ in the mean only. According to the sample size calculations, we have a power of roughly 80% to detect a difference. Let us repeat the sampling 1000 times and check the p-value distribution.
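    As a quick cross-check of the 80% power claim (an added sketch, assuming n = 17 per group, sd = 1 and a two-sided alpha of 0.05):

    Code: 
    # built-in power calculation for the two-sample t-test
    power.t.test(n = 17, delta = 1, sd = 1, sig.level = 0.05, type = "two.sample")

    Now for the repeated sampling itself.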

    Code: 
    # draw 1000 independent pairs of samples (n = 17 each) from N(1,1) and N(2,1)
    # and record the two-sided t-test p-value of each pair
    Len=1000
    pval=numeric(Len)
    for(i in 1:Len){
      x=rnorm(17, 1, 1)
      y=rnorm(17, 2, 1)
      pval[i]=t.test(x, y, alternative="two.sided")$p.value
    }
    # with breaks=20 the first histogram bin collects the p-values below 0.05
    x=hist(pval, breaks=20)
    
    print(sprintf("Percent of tests with rejected NULL %.3f", x$counts[1]/Len))
    
    ## [1] "Percent of tests with rejected NULL 0.812"
    This is all nicely in line with the theory. Now, let us take the case of a test that accidentally has a p-value higher than 0.05 even though the population means differ by 1 unit.
    Code: 
    # First let us find a sample like this: keep drawing pairs until one
    # with a p-value above 0.1 turns up, despite the real 1-unit difference
    Len=1000
    for(i in 1:Len){
      x=rnorm(17, 1, 1)
      y=rnorm(17, 2, 1)
      if(t.test(x, y, alternative="two.sided")$p.value>0.1){
        break
      }
    }
    print(sprintf("So, the %d th trial resulted in a p-value of %f", i, 
            t.test(x,y,alternative="two.sided")$p.value))
    
    ## [1] "So, the 14 th trial resulted in a p-value of 0.297243"
    Now, let us see whether bootstrapping this one particular sample gives us an indication that the high p-value is nothing but sampling error.
    Code: 
    # resample each group with replacement 500 times and re-run the t-test
    Len=500
    pval=numeric(Len)
    for(i in 1:Len){
      x1=sample(x, 17, replace=TRUE)
      y1=sample(y, 17, replace=TRUE)
      pval[i]=t.test(x1, y1, alternative="two.sided")$p.value
    }
    x=hist(pval, breaks=20)
    
    print(sprintf("Percent of tests with rejected NULL %.3f", x$counts[1]/Len))
    
    ## [1] "Percent of tests with rejected NULL 0.206"
    So, it seems that in the case of a moderately low p-value we can get an indication that there is a difference between the two groups just by using the bootstrap, without the need to take a new sample.

    Let us see what happens if the p-value is much larger, even though the difference between the populations is still there.

    Code: 
    # keep drawing pairs until one with a p-value above 0.3 turns up,
    # despite the real 1-unit difference between the populations
    Len=1000
    for(i in 1:Len){
      x=rnorm(17, 1, 1)
      y=rnorm(17, 2, 1)
      if(t.test(x, y, alternative="two.sided")$p.value>0.3){
        break
      }
    }
    print(sprintf("So, the %d th trial resulted in a p-value of %f", i, 
            t.test(x,y,alternative="two.sided")$p.value))
    
    ## [1] "So, the 17 th trial resulted in a p-value of 0.429053"
    Does the bootstrap help?
    Code: 
    Len=500
    pval=numeric(Len)
    for(i in 1:Len){
      x1=sample(x,17, replace=TRUE)
      y1=sample(y,17,replace=TRUE)
      pval[i]=t.test(x1,y1,alternative="two.sided")$p.value
    }
    x=hist(pval, breaks=20)
    
    print(sprintf("Percent of tests with rejected NULL %.3f", x$counts[1]/Len))
    
    ## [1] "Percent of tests with rejected NULL 0.156"
    Nope, for extreme samples it will not help; taking a fresh sample would be the only remedy here. Interestingly, we got a sample that seems to demonstrate quite conclusively that there is no difference between the two populations, although there is one. This occurred with a probability of roughly 1%, so I would conclude that small samples are not really trustworthy, whatever the power calculation says.
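    (Added sketch, not part of the original run: the probability of such an extreme sample can be estimated directly from the first simulation, by counting how often the p-value exceeds 0.3 when the means really differ by 1 unit.)

    Code: 
    # re-run the first simulation and count p-values above 0.3
    p=replicate(1000, t.test(rnorm(17, 1, 1), rnorm(17, 2, 1))$p.value)
    mean(p>0.3)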

    How about false alarms? Let us take samples from two populations that are quite similar, the difference in means being only 0.2, and repeat the exercise.
    Code: 
    # now the two populations differ in mean by only 0.2
    Len=1000
    pval=numeric(Len)
    for(i in 1:Len){
      x=rnorm(17, 1, 1)
      y=rnorm(17, 1.2, 1)
      pval[i]=t.test(x, y, alternative="two.sided")$p.value
    }
    x=hist(pval, breaks=20)
    
    print(sprintf("Percent of tests with rejected NULL %.3f", x$counts[1]/Len))
    
    ## [1] "Percent of tests with rejected NULL 0.108"
    Now let us take two samples where the p-value is about 0.1 and see whether we can use the bootstrap to determine that the two populations are in fact similar. The expectation would be to get far fewer tests with a p-value below 0.05 than in the case of the truly different populations.
    Code: 
    # draw from the two similar populations (means 1 and 1.2) until a pair
    # with a p-value above 0.1 turns up
    Len=1000
    for(i in 1:Len){
      x=rnorm(17, 1, 1)
      y=rnorm(17, 1.2, 1)
      if(t.test(x, y, alternative="two.sided")$p.value>0.1){
        break
      }
    }
    print(sprintf("So, the %d th trial resulted in a p-value of %f", i, 
            t.test(x,y,alternative="two.sided")$p.value))
    
    ## [1] "So, the 3 th trial resulted in a p-value of 0.111358"
    Now let us see the bootstrap results.
    Code: 
    Len=500
    pval=numeric(Len)
    for(i in 1:Len){
      x1=sample(x,17, replace=TRUE)
      y1=sample(y,17,replace=TRUE)
      pval[i]=t.test(x1,y1,alternative="two.sided")$p.value
    }
    x=hist(pval, breaks=20)
    
    print(sprintf("Percent of tests with rejected NULL %.3f", x$counts[1]/Len))
    
    ## [1] "Percent of tests with rejected NULL 0.404"
    And no! The bootstrap simply sees the imbalance already present in the sample and cannot distinguish between samples that come from populations with different means and samples that come from similar ones; it only amplifies the sampling error.
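    (Added for comparison, a minimal sketch of the textbook bootstrap test described by Efron and Tibshirani: both groups are shifted to a common mean so that the resampling happens under the null hypothesis rather than under the imbalance already present in the sample. Here xs and ys are assumed to hold the two observed groups; note that the blocks above reuse the name x for the histogram object, so copies would have to be kept.)

    Code: 
    # bootstrap two-sample test with resampling under the null (sketch)
    t_obs=t.test(xs, ys, alternative="two.sided")$statistic
    m=mean(c(xs, ys))
    x0=xs-mean(xs)+m      # recentre both groups on the pooled mean, so H0 holds
    y0=ys-mean(ys)+m
    t_boot=replicate(500, t.test(sample(x0, replace=TRUE),
                                 sample(y0, replace=TRUE))$statistic)
    mean(abs(t_boot)>=abs(t_obs))   # bootstrap p-value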

  2. The Following User Says Thank You to rogojel For This Useful Post:

    GretaGarbo (04-21-2017)

  3. #2
    hlsmith (Omega Contributor)

    Re: testing the bootstrap

    Side note: if I remember correctly, the standard sampling approaches are a little bit off; that is why Efron came up with the 0.632+ bootstrap version.

  4. The Following 2 Users Say Thank You to hlsmith For This Useful Post:

    GretaGarbo (04-21-2017), rogojel (04-21-2017)

  5. #3
    rogojel (TS Contributor)

    Re: testing the bootstrap

    Hi,
    what is that?

  6. #4
    hlsmith (Omega Contributor)

    Re: testing the bootstrap

    Well, it is a type of bias-corrected version. I know it exists but haven't used it. See this PowerPoint for a little more information on it, along with hypothesis testing.


    http://www.biometrische-gesellschaft...any_Part_I.pdf
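    (Added note, not taken from the slides: roughly speaking, the plain .632 estimator weights the apparent, in-sample error together with the out-of-bag bootstrap error; 0.632 is approximately 1 - 1/e, the probability that a given observation appears in a bootstrap sample. The "+" variant adds a further correction for heavily overfit models.)

    Code: 
    # sketch of the plain .632 weighting; err_apparent and err_oob assumed given
    err_632=function(err_apparent, err_oob){
      0.368*err_apparent + 0.632*err_oob
    }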

  7. #5
    hlsmith (Omega Contributor)

    Re: testing the bootstrap


    This form of the bootstrap is most often used within cross-validation, to address the fact that the observations within the folds are not completely independent.
