Sampling error & meaning of level of significance (AKA alpha)



PeterVincent
07-26-2011, 05:59 AM
My understanding of alpha, the level of significance, is that over a large number of hypothesis tests in which the null is true, the null will be rejected in a proportion alpha of them. Each such rejection is a Type I error.

Having a bit of time on my hands, I set up a simulation of a large number of hypothesis tests in which the null is true, intending to compare the alpha used in the tests with the proportion of times the null is (falsely) rejected.

I set up my population as 20,000 normally distributed variates with mean mu and standard deviation sigma. I sampled from this with replacement, so simulating an infinite population. My sample size was N=100.

I carried out normality tests and satisfied myself that the samples of size 100 were free of outliers and were distributed normally.

I carried out a t-test on each sample, using the sample mean and the sample's standard error, with alpha for the tests at 5%. I ran the tests in blocks of 100, counting the number of rejections in each block. I also examined the distribution of the sample means and satisfied myself that they were normally distributed with mean mu and standard deviation sigma over root N.

I carried out the blocks of 100 hypothesis tests 32 times, that is 3,200 tests in total.
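For anyone who wants to repeat this kind of experiment, here is a minimal sketch in Python using only the standard library. It is not the code used for the counts reported in this thread. It assumes a two-sided test, takes the null mean to be the actual mean of the simulated population so that the null is exactly true, and hard-codes an approximate 5% critical value for Student's t with 99 degrees of freedom. The seed and all names are arbitrary choices for illustration.

```python
import math
import random

# Assumptions not stated in the post: two-sided test, null mean equal to the
# simulated population's actual mean, critical value hard-coded for N = 100.
T_CRIT = 1.9842  # approximate two-sided 5% critical value of Student's t, 99 df

MU, SIGMA = 1000.0, 50.0
POP_SIZE, N, TESTS_PER_BLOCK, BLOCKS = 20_000, 100, 100, 32

rng = random.Random(2011)  # arbitrary seed, only for reproducibility
population = [rng.gauss(MU, SIGMA) for _ in range(POP_SIZE)]
null_mean = math.fsum(population) / POP_SIZE  # the null is true by construction


def rejects_null(sample, mu0):
    """One-sample t-test: True if H0 (mean == mu0) is rejected at the 5% level."""
    n = len(sample)
    mean = math.fsum(sample) / n
    var = math.fsum((x - mean) ** 2 for x in sample) / (n - 1)
    t_stat = (mean - mu0) / math.sqrt(var / n)
    return abs(t_stat) > T_CRIT


def run_block():
    """Run one block of tests and return the number of (false) rejections."""
    return sum(
        rejects_null(rng.choices(population, k=N), null_mean)  # with replacement
        for _ in range(TESTS_PER_BLOCK)
    )


block_counts = [run_block() for _ in range(BLOCKS)]
rejection_rate = sum(block_counts) / (BLOCKS * TESTS_PER_BLOCK)
print(block_counts)
print(f"overall rejection rate: {rejection_rate:.4f}")
```

Changing the seed gives a different set of 32 block counts, which is a quick way to see how much the counts move from run to run.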

My counts of the rejections were: 6, 3, 5, 6, 4, 7, 4, 4, 2, 6, 4, 5, 5, 1, 4, 7, 6, 3, 7, 6, 6, 4, 6, 10, 4, 5, 6, 4, 5, 4, 4, 3. Since there were 100 tests in each block, these numbers are also percentages. Summing them gives 156 rejections out of 3,200 tests, or 4.875%, close to the expected alpha of 5%.

What concerns me is the large number of false rejections of the null in block 24, namely 10. I believe that this is due to sampling error and not due to bad methods.

While it would be ludicrous to think that the counts would be fixed at 5, I did not expect such variability and such a large range.
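The arithmetic on the counts listed above can be checked with a short Python sketch. It also compares the spread of the 32 block counts with the standard deviation a Binomial(100, 0.05) count would have, under the assumption that the tests are independent and each rejects with probability 0.05. The variable names are illustrative only.

```python
import math
import statistics

# Rejection counts for the 32 blocks of 100 tests, copied from the post.
counts = [6, 3, 5, 6, 4, 7, 4, 4, 2, 6, 4, 5, 5, 1, 4, 7,
          6, 3, 7, 6, 6, 4, 6, 10, 4, 5, 6, 4, 5, 4, 4, 3]

total_rejections = sum(counts)
total_tests = 100 * len(counts)
proportion = total_rejections / total_tests

# Spread actually observed across blocks.
observed_sd = statistics.stdev(counts)

# Spread expected if each block count is Binomial(n=100, p=0.05):
# sd = sqrt(n * p * (1 - p)).
expected_sd = math.sqrt(100 * 0.05 * 0.95)

print(f"{total_rejections} rejections in {total_tests} tests = {proportion:.5f}")
print(f"observed sd of block counts: {observed_sd:.3f}")
print(f"binomial sd of block counts: {expected_sd:.3f}")
```

If the observed standard deviation is no larger than the binomial one, the block-to-block variability is no more than sampling error alone would produce.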

I would be very interested in receiving comments.

FYI the population mean was 1000 and standard deviation was 50.

Many thanks,

Peter

BGM
07-26-2011, 07:26 AM
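You can check your counts against the Binomial distribution: if each test rejects independently with probability 0.05 when the null is true, the number of rejections in a block of 100 tests is Binomial(100, 0.05).

A count of 10 is not surprising. Here is the pmf of the Binomial(100, 0.05), listed as the number of rejections k followed by P(X = k):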

0 5.920529e-03
1 3.116068e-02
2 8.118177e-02
3 1.395757e-01
4 1.781426e-01
5 1.800178e-01
6 1.500149e-01
7 1.060255e-01
8 6.487089e-02
9 3.490130e-02
10 1.671588e-02
11 7.198228e-03
12 2.809834e-03
13 1.001075e-03
14 3.274191e-04
15 9.880016e-05

You can get this with dbinom in R. You can also simulate some binomial counts with rbinom to see how often you get a 10.
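For readers without R, the same check can be sketched in Python with the standard library. The helper names binom_pmf and simulate_counts are made up here as rough stand-ins for dbinom and rbinom; they are not library functions. The sketch also computes the chance of a count of 10 or more in a single block, and the chance that at least one of 32 independent blocks reaches 10 or more.

```python
import math
import random

N_TESTS, ALPHA, N_BLOCKS = 100, 0.05, 32


def binom_pmf(k, n=N_TESTS, p=ALPHA):
    """P(X = k) for X ~ Binomial(n, p); a rough stand-in for R's dbinom."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


pmf = [binom_pmf(k) for k in range(N_TESTS + 1)]

# Probability that one block of 100 tests gives 10 or more false rejections.
p_tail = sum(pmf[10:])

# Probability that at least one of 32 independent blocks gives 10 or more.
p_any_block = 1 - (1 - p_tail) ** N_BLOCKS

rng = random.Random(1)  # arbitrary seed, only for reproducibility


def simulate_counts(n_blocks=N_BLOCKS):
    """Simulated rejection counts per block; a rough stand-in for R's rbinom."""
    return [
        sum(rng.random() < ALPHA for _ in range(N_TESTS))
        for _ in range(n_blocks)
    ]


for k in range(16):
    print(k, f"{pmf[k]:.6e}")
print(f"P(X >= 10) in one block: {p_tail:.4f}")
print(f"P(some block >= 10 among {N_BLOCKS}): {p_any_block:.3f}")
print(simulate_counts())
```

A single block reaches 10 or more a little under 3% of the time, but across 32 blocks the chance of seeing at least one such block is roughly 60%, so one count of 10 is what you would expect more often than not.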

PeterVincent
07-28-2011, 05:10 AM
Thank you BGM for your help.

PeterVincent