Help with R code to produce proportions of sample means above one sigma

nowiamcomstock

New Member
Hi everyone, I'm struggling with the following question for an R assignment:

Demonstrate understanding of the Central Limit Theorem, using R, by showing how the distribution of the sample mean changes according to sample size.
Consider a Poisson distribution with λ = 1.5.
Generate samples of 10,000 means over different numbers of observations (eg give a matrix 1, 2,3...100) rows. For each of these samples of means, compute the mean of the means, the sample standard deviation of the means, and the proportions of means that are more than 1 standard deviation above the overall mean.
Generate plots of each of these quantities vs the number of observations contributing to the means (1, 2, 3, 4 etc.).
Write R code used to produce these data. Form conclusions about what is seen, based on the Central Limit Theorem.

I have written the following code, which I think satisfies the question, but can't demonstrate the expected behaviour of the proportions of means above 1 standard deviation of the overall means.

for (n in 1:100){

dist = rpois(10000*n, lambda)
matrix <- matrix(dist, nrow=n)
sample_means = apply(matrix, 2, mean)
sample_sd = sqrt(var(sample_means))

method1 = pnorm(sqrt(lambda/n), lambda, sqrt(lambda))

method2 = (sum(sample_means > (lambda + sqrt(lambda/n))))/10000

method1list <- c(method1list, method1)

method2list <- c(method2list, method2)
}

I've come up with two methods: (method2), which shows the proportion converging to ~16% (which is to be expected as the distribution of the sample means more closely resembles a normal distribution as sample size increases?). (method1) shows the probability, which decreases as expected, but trends downward to a value of ~9%, much lower than expected. The convergence to ~16% is expected due to the empirical rule.

Also, when plotting method2 against n observations, and abline(h=pnorm(-1)), it shows the generated values scattered around the correct proportion, with variance from this line decreasing as sample size increases.

However, I'm not sure that either of these methods are correct? Is there something that I am missing? If they're not, how do you feel I need to alter my code to obtain the correct proportions of means above 1sd from the overall mean for each sample?

Thank you in advance and apologies if I have broken any formatting/etiquette rules. I am a novice to both stats and R.