Simulation in R to find if a sample is statistically different

#1
Hi, I am having trouble understanding how to approach a simulation:

I have a sample of n = 250 from a population of N = 2,000 individuals, and I would like to test whether this particular sample has a value of Y that is significantly different from the values of other random samples of the same population. I thought I needed to take random samples (but I am not sure how many) of n = 250 from the N = 2,000 population and then maybe do a t-test between the random samples and the one sample that I am trying to show is different from all the others. Would a t-test be the right way to do it?
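On the question of how many random samples to draw: the share of simulated samples that turn out at least as extreme as the observed one is an estimated proportion, so for R simulated samples and a true p-value p its Monte Carlo standard error is about sqrt(p(1 - p)/R). A small sketch (the helper name mc_se and the value p = 0.05 are illustrative choices, not from the thread):

```r
# Hypothetical helper: Monte Carlo standard error of a p-value estimated
# from R simulated samples when the true p-value is p
mc_se <- function(p, R) sqrt(p * (1 - p) / R)

R_values <- c(999, 4999, 9999)  # candidate numbers of simulated samples
p <- 0.05                       # illustrative p-value near a common cut-off

se <- mc_se(p, R_values)
data.frame(R = R_values, mc_se = round(se, 4))
```

Around p = 0.05 this gives roughly 0.007, 0.003 and 0.002, which suggests that a few thousand simulated samples already pin the estimated p-value down to within a few thousandths.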
For now, this is all I have for the simulation. The data frame df looks like:
x y
1 2
3 4
5 6
7 8
...
(2,000 rows in total)

sample1 <- df[sample(nrow(df), 250, replace = FALSE), ]  # 250 distinct rows, drawn without replacement
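Drawing with replace = TRUE can pick the same individual more than once, so a simulated sample would not look like a real sample of 250 different individuals from the 2,000. A minimal sketch of the difference, using a made-up data frame pop as a hypothetical stand-in for df:

```r
set.seed(1)  # for reproducibility

# Hypothetical stand-in for df: 2,000 individuals with columns x and y
pop <- data.frame(x = 1:2000, y = rnorm(2000, mean = 50, sd = 10))

# Without replacement: 250 distinct rows, like a real sample of individuals
idx_without <- sample(nrow(pop), 250, replace = FALSE)
sample_without <- pop[idx_without, ]

# With replacement: some rows will usually appear more than once
idx_with <- sample(nrow(pop), 250, replace = TRUE)

length(unique(idx_without))  # always 250
length(unique(idx_with))     # usually fewer than 250
```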
 

bryangoodrich

#3
So you actually have the population numbers? To run the simulation, you need to know the distribution from which you are sampling; in particular, you want to know \(\mu\) and \(\sigma\). Then, if \(X \sim N(\mu, \sigma)\), you want to sample repeatedly from that distribution. If you have the population numbers, you know what those parameters are. While you can sample from the population itself, the point of the simulation is to sample from the distribution. Thus, you can simply generate samples using rnorm with the given parameters and compare the statistics of each sample with your population parameters.
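A minimal sketch of this parametric approach, assuming y is roughly normal with known population mean mu and standard deviation sigma (all numbers below, including the observed sample mean, are made up): simulate many samples of size 250 with rnorm, record each sample mean, and see where the observed mean falls.

```r
set.seed(42)

mu    <- 50    # hypothetical population mean of y
sigma <- 10    # hypothetical population standard deviation of y
n     <- 250   # sample size
R     <- 9999  # number of simulated samples

# Sampling distribution of the mean of n draws from N(mu, sigma)
sim_means <- replicate(R, mean(rnorm(n, mean = mu, sd = sigma)))

observed_mean <- 51.5  # hypothetical mean of y in the sample being tested

# Two-sided Monte Carlo p-value: how often a simulated mean is at least as far
# from mu as the observed one (the observed sample counts as one of them)
p_value <- (1 + sum(abs(sim_means - mu) >= abs(observed_mean - mu))) / (R + 1)
p_value
```

Because rnorm draws from an infinite normal population, this ignores that the 250 individuals are taken without replacement from only 2,000, so the simulated means spread out slightly more than they would for real samples.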

A good book that covers this sort of simulation is Navidi's Statistics for Engineers and Scientists. In particular, see the last sections of chapters 3 and 4, I believe (on confidence intervals and hypothesis testing through simulation).
 
#5
Thank you all for the replies. I guess I would need simulations so that I do not make assumptions about the distribution, and so that I can test whether this sample is different from all the others. Does that make sense?
 
#6
Hi, and sorry if it wasn't clear. I'll try to explain better:

I have a sample of n = 250 from a population of N = 2,000 individuals, and I would like to use either a permutation test or the bootstrap to test whether this particular sample is significantly different from other random samples of the same population. I thought I needed to take random samples of n = 250 from the N = 2,000 population (but I am not sure how many simulations I need to run) and then maybe do a one-sample t-test comparing the mean score of each simulated sample, plus the one sample that I am trying to show is different from all the others, with the population mean. But I don't know:
(1) whether this one-sample t-test would be the right way to do it, and how to go about doing this in R
(2) whether a permutation test or bootstrap methods are more appropriate
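For reference, the one-sample t-test in (1) compares a single sample's mean with a fixed value, which in R is t.test() with its mu argument. A minimal sketch on made-up data (pop and sample1 are hypothetical stand-ins for df and the real sample); note that it treats the 250 values as independent draws and so ignores that they come without replacement from a finite population of 2,000.

```r
set.seed(7)

# Hypothetical population (stand-in for df) and a random sample of 250 rows
pop <- data.frame(x = 1:2000, y = rnorm(2000, mean = 50, sd = 10))
sample1 <- pop[sample(nrow(pop), 250), ]

# One-sample t-test: is the mean of y in sample1 different from the population mean?
res <- t.test(sample1$y, mu = mean(pop$y))
res$statistic  # t statistic
res$p.value    # two-sided p-value
```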

This is the data frame that I have, which is to be sampled:
df:
x y
1 2
3 4
5 6
7 8
...
(2,000 rows in total)

I have this sample from df, and I would like to test whether it has extreme values of y.
sample1:
x y
3 4
7 8
...
(250 rows in total)
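How far the mean of y in one sample can be expected to wander can also be worked out directly, which gives a quick cross-check on any simulation. A sketch assuming sample1 is a simple random sample drawn without replacement, using the normal approximation and made-up data (pop and sample1 are hypothetical stand-ins):

```r
set.seed(3)

# Hypothetical population (stand-in for df) and sample (stand-in for sample1)
pop <- data.frame(x = 1:2000, y = rnorm(2000, mean = 50, sd = 10))
sample1 <- pop[sample(nrow(pop), 250), ]

N <- nrow(pop)      # population size, 2,000
n <- nrow(sample1)  # sample size, 250

observed_mean <- mean(sample1$y)
pop_mean      <- mean(pop$y)

# Standard error of a sample mean under simple random sampling WITHOUT
# replacement, including the finite population correction (1 - n/N);
# var() uses the N - 1 divisor, which is what this formula expects
se_mean <- sqrt(var(pop$y) / n * (1 - n / N))

z <- (observed_mean - pop_mean) / se_mean
p_approx <- 2 * pnorm(-abs(z))  # two-sided, normal approximation
```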

For now I only have this:

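R <- 999                # number of simulations (but I don't know how many are needed)
t.values <- numeric(R)  # numeric vector with R elements to hold the result of each simulation
for (i in 1:R) {
  # draw 250 distinct rows; a new name keeps the observed sample1 intact
  sim.sample <- df[sample(nrow(df), 250, replace = FALSE), ]
  # ...what should be computed here and stored in t.values[i]?
}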

But I don't know how to continue the loop: do I calculate the mean for each simulation and compare it to the population mean?
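One way the loop could continue, sketched under assumptions: take the mean of y as the test statistic (another statistic could be swapped in), store it for every simulated sample drawn without replacement, then check how often a random sample lands at least as far from the population mean as the observed one. This is a Monte Carlo randomization test, so the simulated means themselves form the reference distribution and no t-test is needed. df and sample1 are replaced by made-up data here.

```r
set.seed(2024)

# Hypothetical stand-ins for df (population) and sample1 (the sample under test)
df <- data.frame(x = 1:2000, y = rnorm(2000, mean = 50, sd = 10))
sample1 <- df[sample(nrow(df), 250), ]

R <- 999                 # number of simulated samples
sim.means <- numeric(R)  # holds the mean of y for each simulated sample

for (i in 1:R) {
  # New random sample of 250 distinct individuals from the population
  sim.sample <- df[sample(nrow(df), 250, replace = FALSE), ]
  sim.means[i] <- mean(sim.sample$y)
}

obs.mean <- mean(sample1$y)
pop.mean <- mean(df$y)

# Two-sided Monte Carlo p-value: proportion of samples (the observed one
# included) whose mean is at least as far from the population mean
p.value <- (1 + sum(abs(sim.means - pop.mean) >= abs(obs.mean - pop.mean))) / (R + 1)
p.value
```

With R = 999 the smallest attainable p-value is 1/1000 = 0.001, and the (1 + count)/(R + 1) form counts the observed sample as one of the possible samples, so the p-value is never exactly 0.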
Any help you could give me would be much appreciated.
Thank you.