# Significance when self-selected participation is low

#### kvista7

##### New Member
Suppose we have 1000 people. Of those 1000, 4 men and 2 women express interest in option A. Note that not every one of the 1000 has to render a decision on option A; this is a self-selected scenario where only those who choose to express an interest do so (e.g., someone "like"-ing a Facebook page).

My question is: what can I say about the male/female ratio and how "confident" can I be in that?

For example, I would like to be able to say "men showed twice as much interest in option A" with some details that ensure the reader understands the sample size (or I would like to avoid saying anything if the self-selected population is insufficient).

Thanks very much for any advice.

#### noetsi

##### Fortran must die
The question is whether it is important to generalize to the larger population. You can obviously say that 2/3 of those who expressed an interest were men. You might want to compare this ratio to the gender split of the real population. For example: "Two thirds of the sample were men. This compares to the actual population, where 55 percent were men." You do this to give the reader a sense of how representative of the actual population your sample is.

Personally, using 6 self-selected people to comment on a thousand (if that is what you are doing) does not seem a very good idea to me. In all likelihood it tells you very little about what that population believes.
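To make the comparison with the population share concrete, here is a minimal sketch of an exact binomial test. It asks how surprising 4 men out of 6 would be if each interested person were a man with probability 0.55. The 55 percent figure is only the illustrative number from the example sentence above, not something known about the 1000, and the test treats the 6 as a random draw, which a self-selected group is not.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def exact_binomial_p_value(k, n, p):
    """Two-sided exact p-value: total probability of every outcome
    that is no more likely than the observed one."""
    observed = binom_pmf(k, n, p)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * tolerance
    )

n_interested = 6             # 4 men + 2 women from the original post
n_men = 4
population_share_men = 0.55  # ASSUMPTION: illustrative figure only

p_value = exact_binomial_p_value(n_men, n_interested, population_share_men)
print(f"two-sided exact p-value: {p_value:.3f}")
```

Under these assumptions the p-value comes out near 0.70, so 4 out of 6 is entirely unremarkable against a 55 percent baseline.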

#### hlsmith

##### Less is more. Stay pure. Stay poor.
As noetsi is asking, is the sample really that small (4 men and 2 women)? You could calculate an odds ratio with a 95% confidence interval. The odds ratio estimates the size of the difference between men and women, and the CI tells you whether that difference can be distinguished from no difference at all: if the CI includes 1, you cannot report a significant difference in odds. With this sample it would include 1.

I am out the door so sorry for any lack of clarity or more details.

Hopefully your numbers are just hypothetical, because it is hard to tell a story with so little information.
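The odds ratio check described above can be sketched as follows. The thread never says how many of the 1000 are men and how many are women, so the 500/500 split below is purely an assumption for illustration. The interval is the simple Wald (Woolf) interval on the log odds ratio, which is known to be rough when cell counts are this small.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_with_ci(a, b, c, d, confidence=0.95):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a, b: interested / not interested in group 1
    c, d: interested / not interested in group 2
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

MEN_TOTAL = 500    # ASSUMPTION: gender split of the 1000 is not given
WOMEN_TOTAL = 500  # ASSUMPTION
men_yes, women_yes = 4, 2  # from the original post

odds_ratio, ci_low, ci_high = odds_ratio_with_ci(
    men_yes, MEN_TOTAL - men_yes,
    women_yes, WOMEN_TOTAL - women_yes,
)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

With the assumed split, the odds ratio is about 2.0 but the interval runs from roughly 0.37 to 11, so it comfortably includes 1.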

#### noetsi

##### Fortran must die
One thing you would certainly do is calculate statistical power (which would be awful with 6 cases). You would also, as noted above, calculate a CI and the standard error (which would likely be huge with such a small sample).
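As a rough illustration of how low the power is, the sketch below computes the exact power of a two-sided binomial test with 6 cases. Both the null hypothesis (interest is split 50/50 between men and women) and the alternative (the true share of men is 2/3) are assumptions chosen for this example; neither comes from the thread.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(k, n, p0):
    """Exact p-value: total probability of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p0)
    return sum(
        binom_pmf(i, n, p0)
        for i in range(n + 1)
        if binom_pmf(i, n, p0) <= observed * (1 + 1e-9)
    )

def exact_power(n, p0, p_true, alpha=0.05):
    """Probability that the exact test rejects p0 when the truth is p_true."""
    rejection_region = [
        k for k in range(n + 1) if two_sided_p_value(k, n, p0) <= alpha
    ]
    return sum(binom_pmf(k, n, p_true) for k in rejection_region)

NULL_SHARE = 0.5    # ASSUMPTION: no gender difference under the null
TRUE_SHARE = 2 / 3  # ASSUMPTION: men really are twice as interested

power_n6 = exact_power(6, NULL_SHARE, TRUE_SHARE)
print(f"power with 6 cases: {power_n6:.3f}")

# Power for larger hypothetical samples, for comparison.
for n in (60, 600):
    print(f"power with {n} cases: {exact_power(n, NULL_SHARE, TRUE_SHARE):.3f}")
```

With 6 cases the test can only reject when all six are the same gender, so the power under these assumptions is about 9 percent.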

#### kvista7

##### New Member
Thanks for the replies. Actually, the only choice I have is *whether* I should suggest any "conclusion" from such small "samples". If you consider the 6 people (4 men, 2 women) as a sample of some larger population, it's not clear to me whether the 1000 is this population; in fact the whole idea of "sample" gets messed up because it's not randomly selected. This is in fact my question -- and I definitely do not want to misrepresent something.

So I guess what I'm asking is: how does one compute when a self-selected sample size is "large enough"? Clearly, confidence bounds and error rate are aspects of this to report, but I'm just not even sure if I'm defining sample and population correctly given the type of selection being done here.

#### noetsi

##### Fortran must die
You are dealing with two different issues there. In terms of the size of the sample, you can calculate a standard error or CI and show how certain you are of your results. For example, polls have margins of error within which you are X percent confident of your results (you choose X; commonly it's 95 percent). You can say, for instance, that the margin of error is 5 points at the 95 percent confidence level. But this assumes a random sample.

An entirely different question is self-selected samples. Regardless of the size of the sample, you cannot generalize to a population from a convenience sample. The first problem is about random error; the second is really about systematic bias (or the possibility that it occurs).
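A minimal sketch of the margin-of-error arithmetic, using the usual normal approximation for a proportion. It assumes a simple random sample, which is exactly what a self-selected group is not, and the approximation itself is poor at a sample size of 6, so the small-sample figure is only indicative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Half-width of the normal-approximation CI for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)

def sample_size_for_margin(margin, p=0.5, confidence=0.95):
    """Smallest random sample giving at most the requested margin.

    p = 0.5 is the worst case (largest variance) for a proportion.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)

# The 6 respondents from the original post: 4 of 6 are men.
moe_six = margin_of_error(4 / 6, 6)
print(f"margin of error with n=6: +/- {moe_six * 100:.0f} points")

# Random sample needed for the 5-point margin mentioned above.
n_needed = sample_size_for_margin(0.05)
print(f"sample size for a 5-point margin: {n_needed}")
```

Even if the 6 were a random sample, the margin would be roughly 38 points either way, while a 5-point margin at 95 percent confidence needs about 385 randomly selected respondents.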

#### kvista7

##### New Member
Thanks for the reply. Yes, self-selection is the issue here. So, is there nothing that can be gleaned from this because it's self-selection? I would think there would be something. Can't the problem be cast as actually 3 choices in a self-selection scenario: (a) yes, (b) no, or (c) decline to state?

Sorry to belabor this, I'm just trying to better understand how to evaluate self-selected samples.