# sampling

#### noetsi

##### No cake for spunky
I know the general theory for working out how large a sample you need for a given margin of error. But I have some questions that I either have not seen addressed or have forgotten the answer to.

What if your population is limited? We have a population that, depending on how it is defined, might be as small as 40,000. Does the size of a finite population affect how many you need to sample? The discussions of sample size I have seen don't talk about this.

When do you need to sample by subdivision? I was asked this today, and I responded that you need to sample by subdivision (area rather than state) if you think one area differs significantly from another on what you are measuring. But I have not seen that raised in the literature.

Is it the number that you sample or the number that are returned that matters for the error in estimating a population value? I am pretty sure it is the number answered - but remarkably I can't remember a source that says this.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Didn't completely follow - especially given "number answered". Are you writing about a survey?

You can do sample size calculations to target a certain SE value based on your assumptions about the population. Is this what you are referencing? And do you have some type of subgroup you may need to target differently because you think it differs from the population as a whole?

#### noetsi

##### No cake for spunky
Yes, I am doing a survey. I was wondering whether the number used to calculate the error is tied to how many you send out or how many you get back. As I guessed, it is the number you get back.

We have areas that make up our statewide numbers. I was asked whether you have to sample at the area level (so many sent to each area) or statewide. My answer was that it depends on whether you believe there are systematic differences between areas in what you are measuring. But I was not sure whether that was true.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
There are definitely sampling schemes to ensure you get some minority groups (that may differ), but if you use these schemes you have to apply sampling weights to the sample to ensure it represents the population.
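To make the weighting idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two areas "A" and "B", their population sizes, and the 0/1 responses are made up for illustration. It assumes the simplest kind of sampling weight, the number of population members each respondent stands for (area population divided by area sample size).

```python
from collections import Counter

# Hypothetical population sizes per area (assumed, not from the thread).
pop_sizes = {"A": 9000, "B": 1000}

# Hypothetical responses: (area, answer), where 1 = "yes" and 0 = "no".
# Area B is deliberately oversampled: 5 of 10 responses for 10% of the population.
responses = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1), ("A", 1),
    ("B", 0), ("B", 0), ("B", 1), ("B", 0), ("B", 0),
]

# Sampling weight = population members represented by each respondent.
n_per_area = Counter(area for area, _ in responses)
weights = [pop_sizes[area] / n_per_area[area] for area, _ in responses]
values = [answer for _, answer in responses]

# Unweighted mean treats every response alike and over-counts area B.
unweighted = sum(values) / len(values)

# Weighted mean restores each area's share of the population.
weighted = sum(w * y for w, y in zip(weights, values)) / sum(weights)

print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

With these made-up numbers the unweighted share of "yes" is pulled toward the oversampled small area, while the weighted share reflects that area A is nine times larger.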

#### GretaGarbo

##### Human
The "usual" formula for standard error (SE) is:

SE = sqrt( (sigma^2/n) )

where sqrt means square root.

But if you have a finite population, like N = 40 000, then you need to insert the finite population correction (fpc), fpc = (N-n)/N, where N is the population size and n is the sample size. Then the SE is:

SE = sqrt( (sigma^2/n)*(N-n)/N )
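
As a quick numeric check of the two formulas, here is a small Python sketch. The function name `standard_error` and the choice sigma = 0.5 (the worst case for a yes/no proportion) are my own assumptions for illustration.

```python
import math

def standard_error(sigma, n, N=None):
    """SE of a sample mean; applies the fpc (N-n)/N when a population size N is given."""
    variance = sigma ** 2 / n
    if N is not None:
        variance *= (N - n) / N  # finite population correction
    return math.sqrt(variance)

sigma = 0.5  # assumed: standard deviation of a 50/50 yes/no item
n = 400

se_infinite = standard_error(sigma, n)          # "usual" formula
se_finite = standard_error(sigma, n, N=40_000)  # with fpc

print(f"without fpc: {se_infinite:.5f}")
print(f"with fpc:    {se_finite:.5f}")
```

With N = 40 000 and n = 400 the fpc is 0.99, so the SE shrinks by only about half a percent.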

Notice that when the population is large relative to the sample (like when N is 40 000 and n is 400), the fpc will be close to one. That means you need essentially the same sample size whether you are investigating a small country like Luxembourg or a large one like the USA or China. (And I guess that is what was meant on Twitter by the claim that Elon's lawyers are borderline statistically illiterate, because Twitter's population is very large. It is not the proportion of the population that matters, it is the sample size n that matters.)
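
To see how little the population size matters, one can solve the fpc formula above for n. Setting the margin of error e = z*SE gives n = n0 / (1 + n0/N), where n0 = z^2*sigma^2/e^2 is the sample size without the correction. The Python sketch below is only an illustration: the function name `required_n`, the 95% value z = 1.96, sigma = 0.5, and the 5-point margin are all my assumptions, and the population figures are rough round numbers.

```python
import math

def required_n(sigma, margin, N=None, z=1.96):
    """Sample size for a given margin of error; uses the fpc (N-n)/N if N is given."""
    n0 = (z * sigma / margin) ** 2   # infinite-population sample size
    if N is not None:
        n0 = n0 / (1 + n0 / N)       # from margin = z * sqrt(sigma^2/n * (N-n)/N)
    return math.ceil(n0)

sigma, margin = 0.5, 0.05  # assumed: yes/no item, +/- 5 percentage points

# Rough, rounded population sizes for illustration only.
for label, N in [("N = 40 000", 40_000),
                 ("Luxembourg-sized", 640_000),
                 ("USA-sized", 330_000_000),
                 ("infinite", None)]:
    print(f"{label:>17}: n = {required_n(sigma, margin, N)}")
```

The required n barely moves across these population sizes; it only drops noticeably once the sample becomes a sizeable fraction of the population.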

But if you, @noetsi, have a lot of subgroups, like counties or municipalities, then you can do a stratified sample (i.e. a simple random sample from each municipality). If there are differences between the municipalities then you can gain precision (maybe a lot) by stratifying.
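
A rough sketch of where the gain comes from, in Python. The two areas, their means and standard deviations are invented for illustration, the fpc is ignored, and the sample is allocated proportionally. Under stratification only the within-area variance contributes to the SE; under a plain simple random sample the between-area differences add to it.

```python
import math

# Hypothetical strata: (population share W_h, mean mu_h, within-stratum sd sigma_h)
strata = [(0.5, 10.0, 2.0),
          (0.5, 20.0, 2.0)]
n = 100  # total sample size, allocated proportionally to W_h

def se_stratified(strata, n):
    """SE of the stratified mean with proportional allocation (fpc ignored)."""
    return math.sqrt(sum(W ** 2 * sd ** 2 / (W * n) for W, _, sd in strata))

def se_simple_random(strata, n):
    """SE of the mean from one simple random sample over the whole population."""
    overall_mean = sum(W * mu for W, mu, _ in strata)
    # Total variance = within-stratum variance + between-stratum variance.
    total_var = sum(W * (sd ** 2 + (mu - overall_mean) ** 2) for W, mu, sd in strata)
    return math.sqrt(total_var / n)

se_strat = se_stratified(strata, n)
se_srs = se_simple_random(strata, n)
print(f"stratified SE: {se_strat:.3f}, simple random sample SE: {se_srs:.3f}")
```

If the area means were equal, the two SEs would coincide, which matches the point that stratifying pays off when the subgroups differ.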

Of course it is the sample size that you get back that matters.

BUT, but but! You have sent out a simple random sample (so that is OK), but you don't know whether what you get back will be a simple random sample of those you sent out. So there can be a systematic bias, in the sense that it is the more pleased customers or the more angry customers who respond. So, can you really rely on a sample survey with a low response rate?
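
A back-of-the-envelope illustration of that bias, in Python. The numbers are entirely made up: suppose half the customers are satisfied, but satisfied customers answer 10% of the time and dissatisfied ones 30% of the time.

```python
def share_among_respondents(p_satisfied, rate_satisfied, rate_dissatisfied):
    """Expected share of satisfied customers among those who actually respond."""
    responding_satisfied = p_satisfied * rate_satisfied
    responding_dissatisfied = (1 - p_satisfied) * rate_dissatisfied
    return responding_satisfied / (responding_satisfied + responding_dissatisfied)

# Hypothetical inputs: 50% truly satisfied, response rates of 10% vs 30%.
observed = share_among_respondents(0.5, 0.10, 0.30)
print(f"true share satisfied: 0.50, expected share among respondents: {observed:.2f}")
```

In this made-up case the returned questionnaires would suggest far fewer satisfied customers than there really are, and a larger n does nothing to fix that; the standard error formulas only describe sampling error, not this kind of bias.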