Interesting sampling question

#1
I have two samples from the same population and wish to reduce the sample sizes while maintaining the same ratio of precision as in the original samples. I suspect I can take the square root of the original sample size, but this produces very small samples. In short, how can I scale down the original samples but maintain the same ratio of precision? Any advice (and a formula!) would be much appreciated.

Example: original sample sizes from the same population (population N is c. 20,000): N for sample A = 2537; N for sample B = 520.

How do I reduce sample sizes while maintaining the same relative precision?

Gary Marks
Burton Craige Dist. professor, UNC-Chapel Hill
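A minimal sketch of one reading of this question, assuming "precision" means the standard error of an estimated proportion and that the same proportion p is estimated in both samples. Since SE = sqrt(p(1 - p)/n), p cancels and SE_A/SE_B = sqrt(n_B/n_A). Multiplying both sample sizes by a common factor k therefore keeps the ratio (up to rounding and the finite population correction), while taking the square root of each size does not. The helper name `se_ratio` and the factor k = 0.1 are illustrative choices, not part of the original post.

```python
import math

def se_ratio(n_a, n_b, pop_size=None):
    """Ratio SE_A / SE_B of two sample proportions estimating the same p.

    SE = sqrt(p * (1 - p) / n), so p cancels and the ratio is sqrt(n_b / n_a).
    If pop_size is given, each SE also gets the finite population
    correction sqrt((N - n) / (N - 1)).
    """
    ratio = math.sqrt(n_b / n_a)
    if pop_size is not None:
        fpc_a = math.sqrt((pop_size - n_a) / (pop_size - 1))
        fpc_b = math.sqrt((pop_size - n_b) / (pop_size - 1))
        ratio *= fpc_a / fpc_b
    return ratio

n_a, n_b = 2537, 520   # original sample sizes from post #1
k = 0.1                # common scaling factor (arbitrary, for illustration)

print(round(se_ratio(n_a, n_b), 4))                         # original ratio
print(round(se_ratio(round(k * n_a), round(k * n_b)), 4))   # both scaled by k
print(round(se_ratio(math.sqrt(n_a), math.sqrt(n_b)), 4))   # square-root idea
```

With k = 0.1 the ratio stays at about 0.45, whereas taking square roots moves it to about 0.67.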
 

hlsmith

Omega Contributor
#2
Before I reply, I want you to define what you mean by "precision". Typically it reflects sample size through the width of the confidence interval, and it relates to random error.
 
#3
Yes, I think this comes down to keeping the same ratio of confidence intervals across the two samples. But a more detailed explanation of my project might be useful.

I have two political parties, party A and party B, with contrasting distributions of males/females and whites/greens. Party A has 1000 members, and party B has 2000 members. Party A has 600 males and 700 whites. Party B has 700 males and 750 whites.

I wish to produce a statistic that will allow me to compare the probability that party A has its particular composition of males and greens with the probability that party B has its particular composition of males and greens.

So I bootstrap 1000 samples from the common population. The samples for party A have N = 1000; the samples for party B have N = 2000. Then I count the number of samples that meet the criteria for males and whites.
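The resampling procedure described above might be sketched as follows for party A. This is a hypothetical reconstruction, not the poster's code: the pooled population is assumed to be the 3,000 members of both parties (1,300 males, 1,450 whites), and because the post does not say how sex and race combine within individuals, race is assigned independently of sex. The function name `simulate_hits` is illustrative.

```python
import random

def simulate_hits(population, n, min_males, min_whites, reps, seed=0):
    """Share of with-replacement samples of size n that meet both thresholds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = rng.choices(population, k=n)  # draw n members with replacement
        males = sum(male for male, _ in sample)
        whites = sum(white for _, white in sample)
        if males >= min_males and whites >= min_whites:
            hits += 1
    return hits / reps

# Hypothetical pooled population: the 3,000 members of parties A and B
# (1,300 males, 1,450 whites). Each member is a (male, white) pair of 0/1 flags;
# race is shuffled independently of sex because the joint split is not given.
rng = random.Random(42)
sexes = [1] * 1300 + [0] * 1700
races = [1] * 1450 + [0] * 1550
rng.shuffle(races)
population = list(zip(sexes, races))

# Party A: samples of N = 1000, thresholds of 600 males and 700 whites.
print(simulate_hits(population, n=1000, min_males=600, min_whites=700, reps=1000))
```

With a pooled male share of about 43%, 600 males in a sample of 1,000 lies more than ten standard deviations above the expected count, so runs like this return 0.0.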

However, the probability of getting these particular compositions happens to be very small indeed. I would need a *very* large number of samples to get one positive result.
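Because the target event is so rare, no feasible number of resamples will observe it. One alternative, not proposed in the thread, is to compute the probability directly. Under sampling with replacement, the number of males in a sample of size n is Binomial(n, p), so P(at least k males) is a binomial tail, which can be summed in log space to avoid underflow. This sketch reuses the hypothetical pooled proportions (1,300/3,000 males, 1,450/3,000 whites) and additionally assumes sex and race are independent in the population, so the two tail log-probabilities can be added. The function name is illustrative.

```python
import math

def log10_binom_tail(n, k, p):
    """log10 P(X >= k) for X ~ Binomial(n, p), computed in log space."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)  # log C(n, j)
        + j * math.log(p) + (n - j) * math.log1p(-p)
        for j in range(k, n + 1)
    ]
    top = max(log_terms)
    # log-sum-exp: factor out the largest term before exponentiating
    log_sum = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return log_sum / math.log(10)

# Hypothetical pooled proportions, as in the resampling sketch.
p_male, p_white = 1300 / 3000, 1450 / 3000
# Assuming sex and race are independent, the male and white counts in a
# with-replacement sample are independent, so their log-probabilities add.
log10_p = log10_binom_tail(1000, 600, p_male) + log10_binom_tail(1000, 700, p_white)
print(f"P(positive) for party A is about 10^{log10_p:.1f}")
```

Under these assumptions the probability is many orders of magnitude below anything 10,000 resamples could detect.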

So I wish to use smaller samples! But how can I maintain the relative probability of a positive result across party A and party B? This is my question.

Many thanks for any help,
Gary Marks

 

Dason

Ambassador to the humans
#4
Can you elaborate on this:
I wish to produce a statistic that will allow me to compare the probability that party A has its particular composition of males and greens with the probability that party B has its particular composition of males and greens.
because I'm not really sure what you're trying to say with it. What is your research question? Don't try to jazz it up to make it what you think it should be statistically - what are you trying to learn?
 

hlsmith

Omega Contributor
#5
Also, what is a positive result?


What was wrong with your bootstrap (i.e., sampling with replacement)? Did it not reproduce the exact distributions of m/f and w/g? And as Dason asked, what is the purpose? Because of sampling variability, a smaller sample drawn from the larger population or sample will have more variability!!!


Side note: did you sample with replacement or not? I am unsure of your purpose, and whether the following is the correct approach for it, but if you want a smaller sample with exactly the same distributions and only unique subjects, just sample without replacement within each subgroup (e.g., group 1 male, green; group 1 male, white; group 1 female, green; ...; group 2 female, white), with whatever sample size you desire.
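The side note above amounts to stratified sampling without replacement. A minimal sketch, with a hypothetical function name, hypothetical field names, and made-up cell sizes: each stratum keeps round(fraction × stratum size) members, so cell proportions are preserved up to rounding and every subject appears at most once.

```python
import random
from collections import defaultdict

def stratified_subsample(records, key, fraction, seed=0):
    """Sample without replacement within each stratum, keeping each stratum's share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)  # group records by stratum
    sample = []
    for members in strata.values():
        # round() keeps proportions up to rounding in each cell
        sample.extend(rng.sample(members, round(fraction * len(members))))
    return sample

# Made-up data: three non-empty group x sex x race cells of sizes 300, 200, 500.
records = [
    {"id": i, "group": g, "sex": s, "race": r}
    for i, (g, s, r) in enumerate(
        [(1, "male", "green")] * 300
        + [(1, "female", "white")] * 200
        + [(2, "male", "white")] * 500
    )
]
sub = stratified_subsample(
    records, key=lambda r: (r["group"], r["sex"], r["race"]), fraction=0.1
)
print(len(sub))
```

The sample size is set by `fraction`; choosing it is still the open question in the thread.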
 
#6
Thank you for the interest.

I am seeking a statistic that estimates the extent to which a political party has a distinct social basis (i.e., whether its supporters are male/female, highly/less educated, rural/urban, professional/manual). I dichotomize these variables and wish to estimate the likelihood of meeting or exceeding the threshold for a party. A positive result is a sample that meets or exceeds the threshold. The percentage of positive results is a statistic that estimates how far that party deviates from randomness.

So I bootstrap with replacement from the opinion survey. The sample size for a party is the number of respondents who support that party. The population is the number of respondents in the survey.

However, when I sample 10,000 times (with replacement), the number of positive results is zero.

So I can reduce the sample size. The smaller the sample, the greater the odds of a positive result. But on what basis do I reduce the sample size? This is my question. Thank you for any ideas/help on this!
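Zero positive results out of 10,000 resamples still carries information: it bounds the probability. Solving (1 - p)^R = alpha for p gives an exact one-sided upper confidence bound after zero successes in R trials, which for alpha = 0.05 is close to the familiar "rule of three", 3/R. A minimal sketch; the function name is illustrative.

```python
def zero_hit_upper_bound(reps, alpha=0.05):
    """One-sided (1 - alpha) upper bound on p after 0 hits in `reps` trials.

    Solves (1 - p) ** reps = alpha for p. For large reps and alpha = 0.05
    this is close to the rule-of-three approximation 3 / reps.
    """
    return 1 - alpha ** (1 / reps)

reps = 10_000
print(zero_hit_upper_bound(reps), 3 / reps)
```

For R = 10,000 both values are roughly 0.0003, so the resampling says only that the probability is below about 3 in 10,000; it cannot rank two parties whose probabilities both fall below that.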

Admittedly, this requires thinking outside the normal box.

Gary Marks
 
#7
Sorry, I (and others) still cannot understand what you are trying to do.



Admittedly, this requires thinking outside the normal box.
No, please try to keep thinking inside the box. That is where you are on safe ground: there are proofs and studies showing that the methods work. It is when you try to be "creative" that you are in danger, when you invent something that has no basis.


The sample size for a party is the number of respondents who support that party. The population is the number of respondents in the survey.
Normally, in statistics, the population is the group that you are sampling from (often millions of people in a country), and it is the group to which you want to generalize your conclusions. The sample is the respondents. Here there seems to be a two-stage sampling procedure: select the respondents, then select the subjects who belong to a party. (Please try to use standard statistical terminology.)


I am seeking a statistic that estimates the extent to which a political party has a distinct social basis (i.e., whether its supporters are male/female, highly/less educated, rural/urban, professional/manual).
I dichotomize these variables ....
Please don't dichotomize. It is a bad procedure: you will lose information, and in some models you will get biased results.

Are you saying that when you dichotomize the data and form a multi-way cross table (a multi-way contingency table), some cells have no observations?
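Whether dichotomized data leave empty cells can be checked directly: k dichotomized variables give 2^k cells in the full cross table, and a survey subsample can easily miss some of them. A minimal sketch with hypothetical variables and four made-up respondents; the function name is illustrative.

```python
from collections import Counter
from itertools import product

def empty_cells(rows, levels):
    """Cells of the full cross-classification that have no observations.

    rows: iterable of tuples, one value per variable.
    levels: the possible values of each variable, in the same order.
    """
    counts = Counter(rows)
    return [cell for cell in product(*levels) if counts[cell] == 0]

# Hypothetical respondents: (party, sex, education, area), all dichotomized.
levels = [("A", "B"), ("male", "female"), ("high", "low"), ("rural", "urban")]
rows = [
    ("A", "male", "high", "urban"),
    ("A", "female", "low", "rural"),
    ("B", "male", "low", "rural"),
    ("B", "female", "high", "urban"),
]
missing = empty_cells(rows, levels)
print(f"{len(missing)} of {2 ** len(levels)} cells are empty")
```

Here 12 of the 16 cells are empty.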

I would guess that there is an easier way to do this.