Two-tailed t-test for data sets that do not follow a normal distribution?

#1
Hi there,

I'm doing some work comparing two sets of people: those I invited to an event, and those who actually showed up. I am comparing the amount of money they donated to our charity in the past. I have the amounts bucketed into about five categories (e.g., $100-$199, $200-$299, etc.) and I am showing the percentage of people who fall into each of these categories.

Since my invited set includes about 1,000 people and my list of people who actually showed up is only about 400 people, I want to do some significance testing on the proportions to check whether the differences are statistically significant at the 90% confidence level.

I used an Excel macro tool I have on hand to perform the two-tailed t-test for proportions assuming equal variances, and the results seemed to make sense (i.e., the percentages I thought would be significantly different at 90% indeed were). However, someone just pointed out to me that neither data set follows the normal distribution.

Does this mean the t-test results are invalid? What test should I be using in its place? I was thinking the Mann-Whitney, but as I was reading about how to conduct it, I don't understand how that test can compare two proportions from two separate groups with different sample sizes.

Advice would be appreciated - thank you!
 

RobH

New Member
#2
I wouldn't bother splitting the amount of money into five categories; that might actually be the reason why your data is non-normal. Just use the actual amount for each person. Assuming that the people who showed up DO NOT also appear in the 'invited' set, you can just do a Mann-Whitney test, which will tell you whether one group tends to donate significantly more than the other.
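
For anyone who wants to see what the Mann-Whitney test does with raw amounts, here is a rough Python sketch using only the standard library. It ranks all donations together (ties get the average rank) and uses the normal approximation with a tie correction, which should be reasonable for groups of several hundred people. The function name `mann_whitney_u` and the donation figures are made up for illustration; they are not from the original data. Note that the two groups can be different sizes.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    x and y are lists of raw amounts from two independent groups.
    Returns (U statistic for x, z score, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; tied values share the average.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # Every value is identical, so there is nothing to distinguish.
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p


# Hypothetical past donations (in dollars) for two non-overlapping groups.
showed_up = [150, 220, 310, 450, 500, 275]
did_not_show = [100, 120, 180, 200, 240, 130, 160, 210]

u, z, p = mann_whitney_u(showed_up, did_not_show)
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}")
print("significant at the 10% level" if p < 0.10 else "not significant at the 10% level")
```

If you have a stats package available, its built-in Mann-Whitney routine is a better choice than this hand-rolled version, especially for small samples where an exact p-value matters.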
 

RobH

New Member
#4
Hi

Not that I'm aware of. You need to remove the overlap, otherwise any analysis would be invalid, as you don't know whether any effect found is due to people who turned up also being in the 'invited' set. If you think about it, a proper analysis given your hypothesis (which I presume is that people who turned up donated, on average, a significantly different amount from those who did not) has to involve splitting the groups out properly.

If for some reason you can't split the groups out separately you could still do the analysis, but any results you might get will be tainted by the 'non-independence' of the two groups, and therefore any conclusions may be invalid.
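
In case it helps, splitting the groups out can be sketched in a few lines of Python. This assumes each person has some unique ID and that you have the past donation amount per ID; the function name `split_groups`, the IDs, and the amounts are all hypothetical.

```python
def split_groups(invited, attended):
    """Split into two non-overlapping groups.

    invited, attended: dicts mapping a person ID to a past donation amount.
    Returns (showed_up, no_show), where no_show holds the invited people
    who are not in the attended list.
    """
    attended_ids = set(attended)
    showed_up = dict(attended)
    no_show = {pid: amt for pid, amt in invited.items() if pid not in attended_ids}
    return showed_up, no_show


# Hypothetical data: everyone who attended was also on the invite list.
invited = {"p1": 120, "p2": 250, "p3": 400, "p4": 175}
attended = {"p2": 250, "p4": 175}

showed_up, no_show = split_groups(invited, attended)
print(sorted(showed_up))  # IDs of people who turned up
print(sorted(no_show))    # IDs of people invited who did not turn up
```

The two resulting lists of amounts share no people, so they can be compared as independent groups.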