Hi all!
For the last few weeks I have been facing the problem of determining whether my sample is big enough to describe the world population. Apart from the usual post-hoc power analysis, it occurred to me that I could approach it differently. I describe my thoughts here, and I also wonder whether the following procedure has a name.
My sample consists of 100 subjects.
Each subject has a particular hair colour, distributed as follows:
- 80% brown/black
- 15% blond
- 5% red
How can I determine whether my 100 subjects are enough to describe the whole world population?
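For comparison, a common textbook way to quantify this (not the procedure proposed in this post) is the normal-approximation (Wald) 95% confidence interval for each proportion, with n = 100. It assumes the subjects are a simple random sample of the population being described, and it is only rough for small proportions such as 5%. Here E is a hypothetical target margin of ±5 percentage points, chosen only for illustration.

```latex
\begin{aligned}
&\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}, \qquad n = 100\\
&\text{brown/black: } 0.80 \pm 1.96\sqrt{0.80 \cdot 0.20/100} = 0.80 \pm 0.078\\
&\text{blond: } 0.15 \pm 1.96\sqrt{0.15 \cdot 0.85/100} = 0.15 \pm 0.070\\
&\text{red: } 0.05 \pm 1.96\sqrt{0.05 \cdot 0.95/100} = 0.05 \pm 0.043\\
&n \ge \frac{1.96^2\,\hat p(1-\hat p)}{E^2} = \frac{3.8416 \cdot 0.16}{0.05^2} \approx 245.9 \;\Rightarrow\; n = 246 \quad (\hat p = 0.80,\ E = 0.05)
\end{aligned}
```

Under those assumptions, 100 subjects pin the brown/black share down to roughly ±8 percentage points.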
I thought of writing code that does the following:
- pick 10 random subjects among the 100
- calculate the percentages
- pick another 10 random subjects among the 100
- calculate the percentages
- ...and so on, say, 50 times
- average the percentages over the 50 random draws
- do they differ by more than 5% from the initial result? If NO, then 10 is my number (I can describe the world population with just 10 subjects, and the other 90 are superfluous). If YES, I repeat the procedure with:
- 15 random subjects
- 20 random subjects
- ...and so on, until the results settle asymptotically on the initial percentages.
- If no asymptote is found, then I need more than 100 subjects.
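A minimal Python sketch of the procedure above, assuming a hypothetical data set built to match the stated percentages (80 brown/black, 15 blond, 5 red) and reading "5%" as 5 percentage points. One deliberate change: because every subsample comes from the same 100 subjects, the average of the 50 subsample percentages stays close to the full-sample percentages at any subsample size, so the stopping check below uses the spread (standard deviation) across the 50 draws instead. The names `percentages`, `subsample_summary` and `smallest_adequate_size` are illustrative.

```python
import random
import statistics

# Hypothetical data reproducing the percentages in the post.
COLOURS = ("brown/black", "blond", "red")
SAMPLE = ["brown/black"] * 80 + ["blond"] * 15 + ["red"] * 5


def percentages(subjects):
    """Percentage of each hair colour among `subjects`."""
    n = len(subjects)
    return {c: 100.0 * subjects.count(c) / n for c in COLOURS}


def subsample_summary(sample, k, repeats=50, rng=None):
    """Draw `repeats` random subsets of size k (without replacement) and
    return {colour: (mean %, standard deviation %)} across the draws.
    `repeats` must be at least 2 for the standard deviation."""
    rng = rng or random.Random()
    draws = [percentages(rng.sample(sample, k)) for _ in range(repeats)]
    summary = {}
    for c in COLOURS:
        values = [d[c] for d in draws]
        summary[c] = (statistics.mean(values), statistics.stdev(values))
    return summary


def smallest_adequate_size(sample, tolerance=5.0, start=10, step=5,
                           repeats=50, seed=0):
    """Return the first subsample size whose spread is within `tolerance`
    percentage points for every colour, or None if no tested size qualifies."""
    rng = random.Random(seed)
    for k in range(start, len(sample) + 1, step):
        summary = subsample_summary(sample, k, repeats, rng)
        if all(sd <= tolerance for _mean, sd in summary.values()):
            return k
    return None


if __name__ == "__main__":
    print(smallest_adequate_size(SAMPLE))
```

Because the subsets are drawn without replacement from a fixed pool of 100, the spread necessarily shrinks to zero as k approaches 100. This check can therefore only say how many of the 100 subjects reproduce the 100-subject result; on its own it cannot show that 100 is enough for the wider population.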
What do you think? Is this a common method, or is it the first time you've seen it?
Thank you all so much for your help!
Last edited by ale_tuz; 05-09-2014 at 02:35 AM.
I guess it's just a sort of bootstrapping... I solved the problem, by the way, by estimating the confidence intervals and iterating with increasing, randomly selected sample sizes.
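The final code was not posted, so the following is only a guess at what "estimating the confidence intervals with increasing, randomly selected sample sizes" could look like: a minimal Python sketch assuming a percentile bootstrap (resampling with replacement) and the same hypothetical 80/15/5 data. The function names and defaults (2000 bootstrap replicates, 95% intervals, sizes 10 to 100) are illustrative choices, not the author's.

```python
import random

# Hypothetical data reproducing the percentages in the first post.
COLOURS = ("brown/black", "blond", "red")
SAMPLE = ["brown/black"] * 80 + ["blond"] * 15 + ["red"] * 5


def bootstrap_ci(subjects, colour, n_boot=2000, alpha=0.05, rng=None):
    """Percentile-bootstrap (1 - alpha) confidence interval, in %, for the
    share of `colour` among `subjects`."""
    rng = rng or random.Random()
    n = len(subjects)
    # Resample n subjects *with* replacement and recompute the percentage.
    estimates = sorted(
        100.0 * rng.choices(subjects, k=n).count(colour) / n
        for _ in range(n_boot)
    )
    lo = estimates[round(alpha / 2 * n_boot)]
    hi = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def ci_widths_by_size(sample, sizes, seed=0, **boot_kwargs):
    """For each size k, draw k subjects at random (without replacement) and
    return {k: {colour: CI width in percentage points}}."""
    rng = random.Random(seed)
    widths = {}
    for k in sizes:
        subset = rng.sample(sample, k)
        widths[k] = {}
        for c in COLOURS:
            lo, hi = bootstrap_ci(subset, c, rng=rng, **boot_kwargs)
            widths[k][c] = hi - lo
    return widths


if __name__ == "__main__":
    for k, w in ci_widths_by_size(SAMPLE, range(10, 101, 10)).items():
        print(k, {c: round(v, 1) for c, v in w.items()})
```

Unlike plain subsampling, the bootstrap resamples with replacement, so the interval stays wide even at k = 100 and shows how uncertain the 100-subject percentages themselves are. Two limitations: a small subset may contain no red-haired subjects at all, giving a misleading zero-width interval for that colour; and no interval accounts for sampling bias, such as which regions the subjects come from.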
Have you considered the potential for sampling bias? For example, if your ten came from Japan, they would likely all have black hair. If they came from Ireland, the percentage of red hair might be higher. To be truly representative of the world, you would need to sample from many regions and countries, which would be difficult with only ten.
You need to consider not only the mechanics of the statistics but also the representativeness of the data.
Thank you very much for the kind answer, Miner. The biases are taken into account, though not completely (I'm not actually measuring hair colour, but some biomechanical features of locomotion). In any case, I expect to cover a large part of the "occidental" population with the final sample of almost 150 subjects. Also, the study really must come to an end! :-D
Thank you, that was a great point.