I'm sorry if this is a basic question, and for the cumbersome problem description. I suspect I'm missing some fundamentals here and need an outside push to get started.

The scenario is as follows:

We apply method A and method B to test systems v, w, x, y, z, ... . Obtaining data points is quite resource-intensive, so we only have a few (in this case 5). We get a mean value of 1.00 (standard deviation 0.10) for method A and a mean value of 1.04 (also with a standard deviation of 0.10) for method B. A significance test would obviously tell us that we cannot claim the mean for method B is significantly higher than for method A. So we move on and collect another five data points. For the full 10 data points, the mean for method A is still 1.00 (standard deviation 0.10) and for method B still 1.04 (standard deviation 0.10). The p-value would still tell us that we cannot claim the mean for method B is significantly higher than for method A. But intuitively you would expect the claim to be better supported after 10 data points. How would a statistician account for this?
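To make the arithmetic concrete, here is a minimal R sketch that computes a two-sample t statistic and p-value directly from the summary statistics quoted above. It assumes the two groups are independent and of equal size, and `t_from_summary` is a hypothetical helper written for this illustration, not a base R function. If the measurements are paired by test system, a paired test on the per-system differences would be the relevant calculation instead, and that needs the raw data.

```r
# Hypothetical helper: two-sample t-test from summary statistics.
# Assumes independent groups of equal size n (pooled-variance formula).
t_from_summary <- function(mean_a, mean_b, sd_a, sd_b, n) {
  se <- sqrt(sd_a^2 / n + sd_b^2 / n)   # standard error of the mean difference
  t_stat <- (mean_b - mean_a) / se
  df <- 2 * n - 2
  p <- 2 * pt(-abs(t_stat), df)         # two-sided p-value
  c(n = n, se = se, t = t_stat, df = df, p = p)
}

res5  <- t_from_summary(1.00, 1.04, 0.10, 0.10, n = 5)
res10 <- t_from_summary(1.00, 1.04, 0.10, 0.10, n = 10)
print(rbind(res5, res10))
```

With the means and standard deviations held fixed, the standard error shrinks in proportion to 1/sqrt(n), so the t statistic grows and the p-value falls as n increases, even though neither sample size reaches the conventional 0.05 threshold here.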

I obtain an estimate of b<0.

Prior researchers running a similar regression obtain an estimate of b>0; I'm trying to reconcile this difference.

The only methodological difference driving our results is the following:

In my study, both y and x are continuous variables.

The prior researchers use ranked data: y = {1, 2, 3, 4, 5} and x = {1st, 2nd, ..., nth}.

My question is: can the difference in methodology explain the difference in estimates for b (sign change)? If so, how?
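One mechanism that can flip the sign is the direction of the ranking: if "1st" denotes the largest value of x, then the rank decreases as x increases. The following is a minimal simulated sketch of that single possibility. The data, the ranking direction, and the quintile binning of y are all assumptions made for illustration; nothing here is taken from either study.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- -0.8 * x + rnorm(n, sd = 0.5)   # simulated data with a negative slope

# Regression on the continuous variables
b_cont <- coef(lm(y ~ x))[["x"]]

# Ranked version (assumption): "1st" is the LARGEST x,
# and y is binned into five ordered categories 1..5 by quintile
x_rank <- rank(-x)
y_cat  <- cut(y, breaks = quantile(y, probs = seq(0, 1, 0.2)),
              labels = FALSE, include.lowest = TRUE)
b_rank <- coef(lm(y_cat ~ x_rank))[["x_rank"]]

print(c(continuous = b_cont, ranked = b_rank))
```

In this sketch the continuous slope is negative and the ranked slope is positive purely because of the ranking direction. Whether that applies to the prior study depends on how its ranks were coded.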

Many thanks to anybody who comments.

Below is an experiment I just ran to test whether a bootstrap gives better results than simple repeated sampling. Is there any error in the logic or code? It seems that bootstrapping just amplifies the sampling error without adding any value. What am I missing?

This is actually an R Notebook file, I just did not find a better way to include it here.

Bootstrap with t-tests

Being fascinated by the possibilities of simulation in statistics, I would like to test some ideas for using the bootstrap with statistical tests. For illustration I picked the simple case of two-sample t-tests. My question is whether we can get a better view of the test results by using the bootstrap. The real-life situation would be to gather one sample, possibly a small one, and try to make the results more conclusive by bootstrapping.

So, first, let us take two samples from two normally distributed populations that differ in the mean only. According to the sample size calculations, we have a power of roughly 80% to detect a difference. Let us repeat the sampling 1000 times and check the p-value distribution.
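The power figure can be checked analytically with `power.t.test` from base R's `stats` package. This is a minimal sketch using the settings from the simulation (17 observations per group, mean difference 1, standard deviation 1). Note that `power.t.test` assumes equal variances, while `t.test` defaults to the Welch test, so the simulated and analytic values need not match exactly.

```r
# Analytic power for a two-sample t-test with 17 observations per group,
# a true mean difference of 1 and a standard deviation of 1
pw <- power.t.test(n = 17, delta = 1, sd = 1, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
print(pw$power)
```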

Code:

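    # Simulate 1000 experiments: two normal samples (n = 17 each) whose means differ by 1
    Len <- 1000
    pval <- numeric(Len)
    for (i in seq_len(Len)) {
      x <- rnorm(17, 1, 1)
      y <- rnorm(17, 2, 1)
      pval[i] <- t.test(x, y, alternative = "two.sided")$p.value
    }
    hist(pval, breaks = 20)  # distribution of p-values across the simulated experiments
    # Share of tests rejecting at the 5% level (computed directly, not from a histogram bin)
    print(sprintf("Percent of tests with rejected NULL %.3f", mean(pval <= 0.05)))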

## [1] "Percent of tests with rejected NULL 0.812"

Code:

    # First let us find a sample like this: redraw until the t-test gives p > 0.1
    Len <- 1000
    for (i in seq_len(Len)) {
      x <- rnorm(17, 1, 1)
      y <- rnorm(17, 2, 1)
      if (t.test(x, y, alternative = "two.sided")$p.value > 0.1) break
    }
    print(sprintf("So, the %d th trial resulted in a p-value of %f", i,
                  t.test(x, y, alternative = "two.sided")$p.value))

## [1] "So, the 14 th trial resulted in a p-value of 0.297243"

Code:

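    # Bootstrap: resample the observed x and y with replacement and repeat the t-test
    Len <- 500
    pval <- numeric(Len)
    for (i in seq_len(Len)) {
      x1 <- sample(x, 17, replace = TRUE)
      y1 <- sample(y, 17, replace = TRUE)
      pval[i] <- t.test(x1, y1, alternative = "two.sided")$p.value
    }
    hist(pval, breaks = 20)
    # Share of bootstrap t-tests rejecting at the 5% level
    print(sprintf("Percent of tests with rejected NULL %.3f", mean(pval <= 0.05)))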

## [1] "Percent of tests with rejected NULL 0.206"
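The resampling loop above draws only from the observed x and y, so the bootstrap treats that one pair of samples as if it were the population. A minimal sketch of what this implies for the difference in means follows. The seed and the names `x_obs`, `y_obs`, and `boot_diff` are illustrative assumptions, and the percentile interval is one common bootstrap summary, shown here in place of repeated t-tests.

```r
set.seed(42)
# Assumption: one observed pair of samples, drawn as in the simulation above
x_obs <- rnorm(17, 1, 1)
y_obs <- rnorm(17, 2, 1)
obs_diff <- mean(y_obs) - mean(x_obs)

# Resample each group with replacement and recompute the mean difference
B <- 2000
boot_diff <- replicate(B, {
  mean(sample(y_obs, replace = TRUE)) - mean(sample(x_obs, replace = TRUE))
})

# The bootstrap distribution is centred on the observed difference,
# not on the population difference of 1
print(c(observed = obs_diff, bootstrap_mean = mean(boot_diff)))

# Percentile bootstrap confidence interval for the mean difference
ci <- quantile(boot_diff, c(0.025, 0.975))
print(ci)
```

Because the resamples inherit whatever sampling error the original pair carries, the bootstrap describes the uncertainty around the observed estimate but cannot recover information the sample does not contain.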

Let us see what happens if the p-value is much larger, even though the difference in the populations is there.

Code:

    # Redraw until the t-test gives a clearly non-significant result (p > 0.3)
    Len <- 1000
    for (i in seq_len(Len)) {
      x <- rnorm(17, 1, 1)
      y <- rnorm(17, 2, 1)
      if (t.test(x, y, alternative = "two.sided")$p.value > 0.3) break
    }
    print(sprintf("So, the %d th trial resulted in a p-value of %f", i,
                  t.test(x, y, alternative = "two.sided")$p.value))

## [1] "So, the 17 th trial resulted in a p-value of 0.429053"

Code:

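    # Bootstrap: resample the observed x and y with replacement and repeat the t-test
    Len <- 500
    pval <- numeric(Len)
    for (i in seq_len(Len)) {
      x1 <- sample(x, 17, replace = TRUE)
      y1 <- sample(y, 17, replace = TRUE)
      pval[i] <- t.test(x1, y1, alternative = "two.sided")$p.value
    }
    hist(pval, breaks = 20)
    # Share of bootstrap t-tests rejecting at the 5% level
    print(sprintf("Percent of tests with rejected NULL %.3f", mean(pval <= 0.05)))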

## [1] "Percent of tests with rejected NULL 0.156"

How about false alarms? Let us take two populations that are quite similar, with a difference in mean values of 0.2, and repeat the exercise.

Code:

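    # Simulate 1000 experiments: two normal samples (n = 17 each) whose means differ by 0.2
    Len <- 1000
    pval <- numeric(Len)
    for (i in seq_len(Len)) {
      x <- rnorm(17, 1, 1)
      y <- rnorm(17, 1.2, 1)
      pval[i] <- t.test(x, y, alternative = "two.sided")$p.value
    }
    hist(pval, breaks = 20)
    # Share of tests rejecting at the 5% level
    print(sprintf("Percent of tests with rejected NULL %.3f", mean(pval <= 0.05)))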

## [1] "Percent of tests with rejected NULL 0.108"
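Strictly speaking, a false alarm (type I error) is a rejection when the population means are equal; with a true difference of 0.2, a rejection is a correct detection made with low power. As a point of comparison, here is a minimal sketch that estimates the rejection rate when both samples come from the same population. The seed and the names `n_sim`, `pval_null`, and `false_alarm_rate` are illustrative assumptions.

```r
set.seed(123)
n_sim <- 2000
# Both samples come from the same population, so the null hypothesis is true
pval_null <- replicate(n_sim, t.test(rnorm(17, 1, 1), rnorm(17, 1, 1))$p.value)

# Share of tests rejecting at the 5% level, i.e. the estimated type I error rate
false_alarm_rate <- mean(pval_null <= 0.05)
print(false_alarm_rate)
```

Under a true null hypothesis the p-values are approximately uniform on [0, 1], so the rejection rate should sit near the nominal 0.05.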

Code:

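    # Redraw until the t-test gives p > 0.1
    # y is drawn with mean 2 here, as posted; the text above describes a mean difference of 0.2 (mean 1.2)
    Len <- 1000
    for (i in seq_len(Len)) {
      x <- rnorm(17, 1, 1)
      y <- rnorm(17, 2, 1)
      if (t.test(x, y, alternative = "two.sided")$p.value > 0.1) break
    }
    print(sprintf("So, the %d th trial resulted in a p-value of %f", i,
                  t.test(x, y, alternative = "two.sided")$p.value))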

## [1] "So, the 3 th trial resulted in a p-value of 0.111358"

Code:

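    # Bootstrap: resample the observed x and y with replacement and repeat the t-test
    Len <- 500
    pval <- numeric(Len)
    for (i in seq_len(Len)) {
      x1 <- sample(x, 17, replace = TRUE)
      y1 <- sample(y, 17, replace = TRUE)
      pval[i] <- t.test(x1, y1, alternative = "two.sided")$p.value
    }
    hist(pval, breaks = 20)
    # Share of bootstrap t-tests rejecting at the 5% level
    print(sprintf("Percent of tests with rejected NULL %.3f", mean(pval <= 0.05)))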

## [1] "Percent of tests with rejected NULL 0.404"