How to prove that a sample is statistically significant?


I've been searching all over the internet for a formula or a test that will help me understand whether my sample (41 answers) is statistically significant for the population (78 people), but I keep finding different formulas, and all of them give me different results. Could anyone tell me which formula I should use? I'm not comparing percentages or trying to calculate the optimal sample size.

Thanks in advance for your help


Less is more. Stay pure. Stay poor.
You need to provide more details. Are you trying to see if the sample is representative of the population? If so, what variables are you comparing and how are they formatted?
Thanks for your reply. I guess what I'm trying to do is exactly that: see if the sample is representative of the population. It's a satisfaction questionnaire regarding the admission experience of a particular MBA program. Out of the 78 applicants who submitted an application (the total population), 41 answered the survey (my sample). What I want to know is whether I can draw conclusions from that sample.
I'm really inexperienced at this, as you have probably noticed, so I don't really understand what you mean by the variables and how they are formatted.
Thanks once again for your help.


Well, typically a researcher will want to ensure that those who completed the survey are comparable to the population, in order to potentially generalize. You would select variables of interest that may affect the survey results and show that both groups are similar on them, in order to rule out dissimilarities. Perhaps demographics fit this need. So you compare age, race, grades, etc. between the sample and the population to show there are no differences. For comparing ages (a continuous variable) you may use t-tests or the Wilcoxon rank-sum test, and for race (categorical) you may use a chi-square or Fisher's exact test. So how the data are formatted dictates the tests you may use to compare the sample to the population.
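As a rough sketch of the comparison described above, here is how it might look in Python with SciPy. All the numbers are invented for illustration, and it assumes the population values (mean age, category shares) are known from the admission records. Because the respondents are a subset of the population, this sketch uses the one-sample forms of the tests: a one-sample t-test against the population mean and a chi-square goodness-of-fit test against the population shares.

```python
from scipy import stats

# Hypothetical data, invented for illustration only.
respondent_ages = [27, 31, 29, 34, 26, 30, 33, 28, 32, 29]
population_mean_age = 30.0  # assumed known for all 78 applicants

# Continuous variable: one-sample t-test against the known population mean.
t_result = stats.ttest_1samp(respondent_ages, popmean=population_mean_age)

# Categorical variable: chi-square goodness-of-fit test.
# Observed counts among the 41 respondents in two hypothetical categories.
observed = [25, 16]
population_share = [0.6, 0.4]  # assumed shares among all 78 applicants
expected = [share * sum(observed) for share in population_share]
chi_result = stats.chisquare(f_obs=observed, f_exp=expected)

print(f"t-test p-value: {t_result.pvalue:.3f}")
print(f"chi-square p-value: {chi_result.pvalue:.3f}")
```

A large p-value here only means the test found no evidence of a difference on that variable; it does not prove the respondents and the population are the same.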

Lastly, you may also need to run some power or sample size calculations to check that your sample is large enough, so that if there were a difference, your statistical test would be sufficiently powered to detect it.
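To give a feel for what such a power calculation involves, here is a minimal sketch using only the Python standard library. It uses a normal approximation for a two-sided one-sample test, which is a simplification; the function name `approx_power` and the standardized effect size of 0.5 are assumptions made up for this example, not something from the thread.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the standardized difference (difference / standard deviation).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    shift = effect_size * sqrt(n)               # how far the test statistic is shifted under the alternative
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical scenario: 41 respondents, medium standardized effect of 0.5.
power = approx_power(effect_size=0.5, n=41)
print(f"approximate power: {power:.2f}")
```

Dedicated power-analysis software will usually work from the t distribution and give slightly different numbers for small samples.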
Most surveys report an error range, called the margin of error in polls. It produces a confidence interval around the sample mean: at the chosen confidence level, the true population mean is expected to fall within that range on either side of your sample mean. Software exists that tells you how large your sample has to be, for a given confidence level, to achieve a given margin of error.
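For a concrete idea of the arithmetic, here is a small sketch in plain Python for a survey proportion. It assumes a simple random sample (which may not hold for a voluntary survey), the conservative proportion p = 0.5, a 95% confidence level (z = 1.96), and the standard finite population correction, which matters when the sample is a large share of a small population. The function names are made up for this example.

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """Margin of error for a sample proportion, with finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((N - n) / (N - 1))  # shrinks the error when n is a large share of N
    return z * standard_error * fpc

def required_sample_size(N, margin, p=0.5, z=1.96):
    """Sample size needed to reach a target margin of error in a population of size N."""
    n_infinite = z ** 2 * p * (1 - p) / margin ** 2   # size for a very large population
    n_adjusted = n_infinite / (1 + (n_infinite - 1) / N)
    return math.ceil(n_adjusted)

moe = margin_of_error(n=41, N=78)
needed = required_sample_size(N=78, margin=0.05)
print(f"margin of error with 41 of 78: {moe:.3f}")
print(f"responses needed for a 5% margin: {needed}")
```

Under these assumptions, 41 responses out of 78 give a margin of error of a little over 10 percentage points, and a 5-point margin would need 65 responses.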

A separate issue, which cannot be dealt with by statistics, is ensuring your sample is representative of your population. This is tied to research design (aka design of experiment) and addresses issues like having a reasonable sampling frame, the way you sample, the nature of the population you sample, etc. To address this you need to look at a book on survey design.
Hmmm... the problem with this is that, because we actually know the 78 applicants (we interviewed them when they were being admitted to the program), the survey was made anonymous. I don't have any information about the 41 who answered, other than that they went through the whole admissions process.
Unless you selected your sample at random, which it does not appear you did, you can't claim it generalizes to a larger population. Analysis without random sampling is done all the time (it is called convenience sampling), but you have to address why you believe it has general application, or just say you were not interested in general application and that future research needs to extend what you did.


In that same vein, without random sampling you do not know why some individuals opted not to participate, or whether their responses would have been similar to or different from those of the people who completed the survey.

I always think of the extreme case: those who opted to fill out a survey may be the ones who loved or hated the service provided, so is everyone else represented or not?
Well, I guess that solves it then. I read a little bit more about convenience sampling, and it does fit this case. It's easy to come up with arguments for why the sample doesn't need to be generalized to the whole population: when you're talking about really expensive "goods" like MBAs, each piece of feedback is relevant and should be taken into consideration. From what I could gather, no piece of feedback contradicts another person's feedback either, so I guess I have my argument here. Thanks noetsi and hlsmith for all your help and insights, they were really helpful =)