Can I increase the sample size by generating random numbers to apply the Chi-Square Goodness of Fit Test?

#1
Does increasing the sample size by random number generation change the distribution?

I have a sample of size 8. Each value is the number of bus arrivals at a bus stop in a 15-minute interval. I want to apply the chi-square test to check the fit to a Poisson distribution. So, for every 15-minute interval, I generated 15 random numbers, which gives me a new sample of size 120.

The numbers were generated by assigning each arrival to one of the 15 positions uniformly at random. Here is an example:

My original sample of size 8 is:

A={8, 13, 13, 14, 15, 11, 16, 11}

My new sample is:

B={0, 1, 0, 1, 2, 2, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 2, 0, 1, 2, 3, 0, 1, 2, 0, 0, 0, 2, 1, 0, 1, 1, 2, 1, 0, 0, 0, 1, 1, 1, 0, 2, 0, 1, 2, 0, 2, 1, 3, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 2, 1, 1, 2, 2, 0, 2, 1, 1, 0, 0, 1, 1, 1, 1, 2, 0, 2, 0, 0, 2, 0, 0, 2, 0, 1, 0, 0, 1, 2, 0, 3, 0, 3, 2, 0, 2, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 2, 0}

Notice that the sum of elements 1º to 15º is 8, the sum of elements 16º to 30º is 13, and so on.
I would like to know whether the distribution of sample A will always be the same as the distribution of sample B, for any random sample that I generate this way.

How much does this change the characteristics of the system?
 
#3
Hey @Pedro Humberto -

I didn't follow why you are doing this, or what you meant by the "1 degree to 15 degree" part.
Hey @hIsmith


I'm doing this because the chi-square test requires a sufficient sample size in order for the chi-square approximation to be valid.

Am I right?
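
One way to make the sample-size concern concrete is the common rule of thumb that every expected cell count should be at least about 5. The sketch below computes expected counts for sample A under a fitted Poisson. The bin edges (10, 12, 14) are an assumption chosen only for illustration, and the rate is estimated by the sample mean.

```r
# Sample A from the first post: arrivals per 15-minute interval
A <- c(8, 13, 13, 14, 15, 11, 16, 11)

# Estimate the Poisson rate by the sample mean
lambda_hat <- mean(A)

# Hypothetical bins (an assumption): <=10, 11-12, 13-14, >=15
upper_edges <- c(10, 12, 14)
cum_probs <- ppois(upper_edges, lambda_hat)  # P(X <= edge)
bin_probs <- diff(c(0, cum_probs, 1))        # probability of each bin
names(bin_probs) <- c("<=10", "11-12", "13-14", ">=15")

# Expected counts under the fitted Poisson with only 8 observations
expected <- length(A) * bin_probs
print(round(expected, 2))
```

With 8 observations spread over four bins, the expected counts cannot all reach 5, which is the usual motivation for wanting more data.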

"1 degree" (1º) --> first element of sample B
"15 degree" (15º) --> fifteenth element of sample B

0+1+0+1+2+2+0+0+0+0+1+0+0+0+1 = 8 --> (8 is the first element of sample A)
1+0+1+0+2+0+1+2+3+0+1+2+0+0+0 = 13 --> (13 is the second element of sample A)
...

1+1+1+1+1+1+1+0+0+0+0+1+1+2+0 = 11 --> (11 is the last element of sample A)
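
The block sums above can also be checked in one step. This is a minimal sketch that reshapes B into a matrix with 15 rows, so that each column holds one block of 15 values; the names `blocks` and `block_sums` are illustrative only.

```r
A <- c(8, 13, 13, 14, 15, 11, 16, 11)

# Sample B from the first post, written as one block of 15 values per line
B <- c(
  0, 1, 0, 1, 2, 2, 0, 0, 0, 0, 1, 0, 0, 0, 1,
  1, 0, 1, 0, 2, 0, 1, 2, 3, 0, 1, 2, 0, 0, 0,
  2, 1, 0, 1, 1, 2, 1, 0, 0, 0, 1, 1, 1, 0, 2,
  0, 1, 2, 0, 2, 1, 3, 1, 0, 1, 1, 1, 0, 0, 1,
  0, 1, 2, 1, 1, 2, 2, 0, 2, 1, 1, 0, 0, 1, 1,
  1, 1, 2, 0, 2, 0, 0, 2, 0, 0, 2, 0, 1, 0, 0,
  1, 2, 0, 3, 0, 3, 2, 0, 2, 1, 0, 1, 0, 1, 0,
  1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 2, 0
)

# matrix() fills column by column, so column j is the j-th block of 15
blocks <- matrix(B, nrow = 15)
block_sums <- colSums(blocks)
print(block_sums)

# Each block sum should reproduce the matching element of A
stopifnot(all(block_sums == A))
```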
 
#4
My R code for generating the random numbers:

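A <- c(8, 13, 13, 14, 15, 11, 16, 11)  # arrivals per 15-minute interval

n <- 15    # number of slots per interval
B <- c()
for (j in seq_along(A)) {
  arrival <- numeric(n)              # counts per slot, all zero at first
  total <- A[j]                      # renamed from 'sum' to avoid masking base::sum
  for (i in seq_len(total)) {        # seq_len() also handles total == 0
    home <- sample.int(n, 1)         # pick a slot uniformly from 1..n
    arrival[home] <- arrival[home] + 1
  }
  B <- c(B, arrival)
}
print(B)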
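
Assigning each arrival independently and uniformly to one of n slots amounts to a single multinomial draw with equal probabilities, so the loop can be shortened. The following is only a sketch of that alternative, not the code used in the thread; the seed value is arbitrary and is there just to make the run repeatable.

```r
set.seed(1)  # arbitrary seed (an assumption), only for repeatability

A <- c(8, 13, 13, 14, 15, 11, 16, 11)
n <- 15      # number of slots per interval

# For each count in A, draw one multinomial vector over n equally likely slots
B <- unlist(lapply(A, function(total) {
  rmultinom(1, size = total, prob = rep(1 / n, n))
}))

print(B)
```

By construction, each block of 15 values in B sums to the matching element of A, exactly as in the loop version.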
 
#8
Hi Pedro,

So your sample size is 101... and each group has more than 5 observations.
The test power is 0.56 for a medium effect size and 0.98 for a large effect size. So if you want to establish the distribution as a base assumption for future calculations, I think it is fine for any practical use. (http://www.statskingdom.com/30test_power_all.html)
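
For reference, the mechanics of such a test on sample B might look like the sketch below. The thread does not show how the groups were formed, so the four bins (0, 1, 2, 3+) are an assumption; the observed counts are tallied from sample B in the first post. Because the rate is estimated from the data, one extra degree of freedom is subtracted by hand, which is the usual approximate adjustment.

```r
# Tally of sample B from the first post (bins are an assumption)
observed <- c("0" = 49, "1" = 45, "2" = 22, "3+" = 4)
n_obs <- sum(observed)                  # 120 values in B

# Estimated Poisson rate: 101 total arrivals over 120 values
lambda_hat <- 101 / n_obs

# Poisson probabilities for 0, 1, 2 and the upper tail 3+
probs <- c(dpois(0:2, lambda_hat),
           ppois(2, lambda_hat, lower.tail = FALSE))

fit <- chisq.test(observed, p = probs)

# chisq.test() uses k - 1 degrees of freedom; subtract one more
# because lambda was estimated from the same data
df_adj <- length(observed) - 2
p_value <- pchisq(unname(fit$statistic), df = df_adj, lower.tail = FALSE)

print(round(fit$expected, 2))
print(p_value)
```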

Thank you very much obh!

Thank you! I'm applying this to queueing theory.
I'm checking whether I can apply the Markovian model.