Below is an experiment I just ran to test whether a bootstrap could give me better results than simple repeated sampling. Is there any error in the logic or the code? It seems that bootstrapping just amplifies the sampling error without adding any value. What am I missing?

This is actually an R Notebook file, I just did not find a better way to include it here.

Bootstrap with t-tests

Being fascinated by the possibilities of simulation in statistics, I would like to test some ideas for using the bootstrap with statistical tests. For illustration I picked the simple case of the two-sample t-test. My question is whether the bootstrap can give a better view of the test results. The real-life situation would be to gather one sample, possibly a small one, and try to make the results more conclusive by bootstrapping.

So, first, let us take two samples of size 17 from two normally distributed populations that differ only in the mean (1 versus 2, both with standard deviation 1). According to the sample size calculations, this gives a power of roughly 80% to detect the difference at the 5% level. Let us repeat the sampling 1000 times and check the p-value distribution.

Code:

```
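# Repeat the experiment Len times and record the p-value of each t-test
Len <- 1000
pval <- numeric(Len)
for (i in 1:Len) {
  x <- rnorm(17, 1, 1)
  y <- rnorm(17, 2, 1)
  pval[i] <- t.test(x, y, alternative = "two.sided")$p.value
}
# Fixed breaks so that the first bin always covers p-values up to 0.05
h <- hist(pval, breaks = seq(0, 1, by = 0.05))
print(sprintf("Percent of tests with rejected NULL %.3f", h$counts[1] / Len))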
## [1] "Percent of tests with rejected NULL 0.812"
```
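The roughly 80% power mentioned above, and the simulated rejection rate, can be cross-checked analytically. This is a minimal sketch using `power.t.test` from the `stats` package with the same settings as the simulation. One assumption to keep in mind: `power.t.test` is based on the pooled-variance t-test, while `t.test` defaults to the Welch test, so the two numbers should be close but need not match exactly.

```r
# Analytic power for a two-sample t-test with the simulation settings:
# n = 17 per group, true mean difference 1, standard deviation 1, 5% level.
pw <- power.t.test(n = 17, delta = 1, sd = 1, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
print(round(pw$power, 3))
```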

Code:

```
# First let us find a sample like this: one whose t-test gives a
# p-value above 0.1 although the populations differ
Len <- 1000
for (i in 1:Len) {
  x <- rnorm(17, 1, 1)
  y <- rnorm(17, 2, 1)
  if (t.test(x, y, alternative = "two.sided")$p.value > 0.1) break
}
print(sprintf("So, the %d th trial resulted in a p-value of %f", i,
              t.test(x, y, alternative = "two.sided")$p.value))
## [1] "So, the 14 th trial resulted in a p-value of 0.297243"
```

Code:

```
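# Bootstrap: resample the selected x and y with replacement and re-test
Len <- 500
pval <- numeric(Len)
for (i in 1:Len) {
  x1 <- sample(x, 17, replace = TRUE)
  y1 <- sample(y, 17, replace = TRUE)
  pval[i] <- t.test(x1, y1, alternative = "two.sided")$p.value
}
# Fixed breaks so that the first bin always covers p-values up to 0.05
h <- hist(pval, breaks = seq(0, 1, by = 0.05))
print(sprintf("Percent of tests with rejected NULL %.3f", h$counts[1] / Len))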
## [1] "Percent of tests with rejected NULL 0.206"
```
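One possible way to read the bootstrap rejection rate, offered as a rough sketch rather than a definitive answer: resampling treats the observed sample as if it were the population, so the rate might be expected to track the power of the test at the observed effect size rather than at the true one. The helper `boot_reject_rate`, the seed, and the comparison via `power.t.test` are my own illustrative assumptions and are not part of the notebook above.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch

# Hypothetical helper: share of bootstrap resamples whose t-test has p < alpha
boot_reject_rate <- function(x, y, B = 500, alpha = 0.05) {
  pvals <- replicate(B, {
    xb <- sample(x, length(x), replace = TRUE)
    yb <- sample(y, length(y), replace = TRUE)
    t.test(xb, yb, alternative = "two.sided")$p.value
  })
  mean(pvals < alpha)
}

# One sample pair drawn as in the experiment above
x <- rnorm(17, 1, 1)
y <- rnorm(17, 2, 1)

# Power of the t-test if the observed difference and spread were the truth
obs_delta <- abs(mean(y) - mean(x))
pooled_sd <- sqrt((var(x) + var(y)) / 2)
implied <- power.t.test(n = 17, delta = obs_delta, sd = pooled_sd)$power

rate <- boot_reject_rate(x, y)
print(c(bootstrap_rate = rate, implied_power = implied))
```

If the two numbers come out in the same neighbourhood, that would be consistent with the suspicion that the bootstrap here mostly replays the sampling error already present in the one sample.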

Let us see what happens if the p-value is much larger, even though the difference between the populations is there.

Code:

```
# Find a sample whose t-test gives a p-value above 0.3
Len <- 1000
for (i in 1:Len) {
  x <- rnorm(17, 1, 1)
  y <- rnorm(17, 2, 1)
  if (t.test(x, y, alternative = "two.sided")$p.value > 0.3) break
}
print(sprintf("So, the %d th trial resulted in a p-value of %f", i,
              t.test(x, y, alternative = "two.sided")$p.value))
## [1] "So, the 17 th trial resulted in a p-value of 0.429053"
```

Code:

```
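# Bootstrap: resample the selected x and y with replacement and re-test
Len <- 500
pval <- numeric(Len)
for (i in 1:Len) {
  x1 <- sample(x, 17, replace = TRUE)
  y1 <- sample(y, 17, replace = TRUE)
  pval[i] <- t.test(x1, y1, alternative = "two.sided")$p.value
}
# Fixed breaks so that the first bin always covers p-values up to 0.05
h <- hist(pval, breaks = seq(0, 1, by = 0.05))
print(sprintf("Percent of tests with rejected NULL %.3f", h$counts[1] / Len))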
## [1] "Percent of tests with rejected NULL 0.156"
```

How about false alarms? Let us take two populations that are quite similar, with a difference in means of only 0.2, and repeat the exercise.

Code:

```
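# Repeat the experiment with a small true difference in means (0.2)
Len <- 1000
pval <- numeric(Len)
for (i in 1:Len) {
  x <- rnorm(17, 1, 1)
  y <- rnorm(17, 1.2, 1)
  pval[i] <- t.test(x, y, alternative = "two.sided")$p.value
}
# Fixed breaks so that the first bin always covers p-values up to 0.05
h <- hist(pval, breaks = seq(0, 1, by = 0.05))
print(sprintf("Percent of tests with rejected NULL %.3f", h$counts[1] / Len))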
## [1] "Percent of tests with rejected NULL 0.108"
```
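Strictly speaking, a false alarm rate refers to the case where the two population means are equal, so a rejection rate measured with a true difference of 0.2 is really the (low) power against that small difference. As a hedged sketch for comparison, the simulation below uses identical populations. The seed, the number of repetitions, and the variable names are my own choices for illustration.

```r
set.seed(42)   # arbitrary seed, only for reproducibility of this sketch
n_sim <- 2000  # assumed number of repetitions

# Both samples come from the same population, so every rejection is a false alarm
pvals_null <- replicate(n_sim, {
  x0 <- rnorm(17, 1, 1)
  y0 <- rnorm(17, 1, 1)
  t.test(x0, y0, alternative = "two.sided")$p.value
})

false_alarm_rate <- mean(pvals_null < 0.05)
print(false_alarm_rate)
```

With equal means the rejection rate should land near the nominal 5% level, which gives a baseline for judging the rates seen with a difference of 0.2.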

Code:

```
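# Find a sample whose t-test gives a p-value above 0.1.
# y uses mean 1.2 to match the scenario described above; the output
# recorded below came from a run that still used mean 2 here.
Len <- 1000
for (i in 1:Len) {
  x <- rnorm(17, 1, 1)
  y <- rnorm(17, 1.2, 1)
  if (t.test(x, y, alternative = "two.sided")$p.value > 0.1) break
}
print(sprintf("So, the %d th trial resulted in a p-value of %f", i,
              t.test(x, y, alternative = "two.sided")$p.value))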
## [1] "So, the 3 th trial resulted in a p-value of 0.111358"
```

Code:

```
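# Bootstrap: resample the selected x and y with replacement and re-test
Len <- 500
pval <- numeric(Len)
for (i in 1:Len) {
  x1 <- sample(x, 17, replace = TRUE)
  y1 <- sample(y, 17, replace = TRUE)
  pval[i] <- t.test(x1, y1, alternative = "two.sided")$p.value
}
# Fixed breaks so that the first bin always covers p-values up to 0.05
h <- hist(pval, breaks = seq(0, 1, by = 0.05))
print(sprintf("Percent of tests with rejected NULL %.3f", h$counts[1] / Len))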
## [1] "Percent of tests with rejected NULL 0.404"
```