Hello all.
I have a couple of questions regarding what statistical tests to use for a three-part problem. I would very much appreciate any help. Here is the first part of the problem:
We want to see if people watching action movies in a specific cinema prefer eating popcorn (P) over chocolate (C) while watching the movies. We randomly ask 200 people who come to watch action movies whether they prefer P or C, and we get the following data:
Only P: 35 people
Only C: 15 people
Both P and C: 50 people
None: 100 people
TOTAL: 200 people
It seems that more people do prefer P, but what test should we use to see whether the difference is statistically significant?
Last edited by newbiestat; 12-28-2014 at 05:22 PM.
Given each option is equally likely, multinomial test.
Stop cowardice, ban guns!
Wouldn't a multinomial test be appropriate if we wanted to test whether all four possible answers are equally likely? That is not what the problem is about. We are not interested in whether all four answers are equally likely. We are specifically interested in whether there is a statistically significant preference for popcorn over chocolate.
What about using only the data from "Only P" and "Only C" and performing a binomial test on them?
Or perhaps a chi-square test for goodness of fit against a 50-50 split, using just "Only P" and "Only C"?
Would such a test violate any statistical logic?
Last edited by newbiestat; 12-28-2014 at 06:07 PM.
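Both ideas from the post above can be tried directly on the 35 and 15 counts. The following is only a minimal sketch using the Python standard library: the helper name `exact_binomial_p` is made up for illustration, and the two-sided p-value is defined as the total probability of outcomes no more likely than the observed one, which is one common convention among several.

```python
from math import comb, erfc, sqrt

# Counts from the survey; the "Both" and "None" answers are dropped here.
only_p, only_c = 35, 15
n = only_p + only_c

def exact_binomial_p(k, n, p=0.5):
    """Two-sided exact p-value: total probability of every outcome that is
    no more likely than the observed count k under Binomial(n, p)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

p_exact = exact_binomial_p(only_p, n)

# Chi-square goodness of fit against a 50-50 split (1 degree of freedom).
expected = n / 2
chi2 = ((only_p - expected) ** 2 + (only_c - expected) ** 2) / expected
# For 1 df, the chi-square upper tail equals erfc(sqrt(x / 2)).
p_chi2 = erfc(sqrt(chi2 / 2))

print(f"exact binomial p = {p_exact:.4f}")
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.4f}")
```

With these counts both p-values come out below 0.01. Note that the test only speaks about people who chose exactly one of the two products.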
What is the origin of this question? Is it academic or personal?
Personal. But what difference does it make?
If it was for a course, I did not want you all getting too far away from what they may have been asking.
Well, you only have one group (i.e., action movies) and you want to see if people select one food over the other. I don't know why you would get rid of the other groups; there are rationales for keeping or removing them. If each of the four answers were equally likely, you could just run four binomial tests with the probability set to 0.25. Each test would give you a p-value: the probability of a result at least as extreme as the observed count, given that the null hypothesis of 0.25 is true.
This problem seems way more complex than how it is being addressed. We don't know if people snuck food in, went to a matinee, have food allergies, shared food, would have bought food but didn't have the money, etc. And there is nothing saying they have to get food. Perhaps you can drop the "None" group and pose the question as "If a person purchased food, were they equally likely to fall into each of the three remaining groups?" Also, intuitively the combo group seems like it should have a lower probability.
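The four binomial tests at 0.25 could look roughly like the sketch below. It uses only the Python standard library; `exact_binomial_p` is a hypothetical helper, and the Bonferroni threshold is an added assumption (the post does not mention a multiple-testing correction), included because four tests are run on the same 200 answers.

```python
from math import comb

counts = {"Only P": 35, "Only C": 15, "Both": 50, "None": 100}
n = sum(counts.values())  # 200 respondents
p0 = 0.25                 # null hypothesis: each answer is equally likely

def exact_binomial_p(k, n, p):
    """Two-sided exact p-value: total probability of every outcome that is
    no more likely than the observed count k under Binomial(n, p)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

p_values = {label: exact_binomial_p(k, n, p0) for label, k in counts.items()}

# Assumption: guard against four looks at the same data with Bonferroni.
alpha = 0.05 / len(counts)
for label, pv in p_values.items():
    verdict = "differs from 0.25" if pv < alpha else "consistent with 0.25"
    print(f"{label:7s} count={counts[label]:3d}  p={pv:.3g}  {verdict}")
```

These tests ask whether each answer's share differs from one quarter; they do not compare P against C directly.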
trinker (12-30-2014)
Thanks for taking the time to address the problem hlsmith.
Well, the rationale behind using only the "Only P" and "Only C" answers is that we are interested in preference for one food over the other. The "Both P and C" and "None" answers do not show such a preference, so our population is the set of answers which show a clear preference. Also, perhaps there is a possible dependence among the first three answers, isn't there?
We don't know any of the parameters you mentioned (sharing food for example) and you are right that the problem doesn't clarify (my mistake, sorry) that we are not really interested in what the people like to eat but rather in what people actually buy to eat. So the data/answers given by the people basically depend on what they bought. (In other words the term "preference" means both "preference to buy" and "bought"). This means that all the other parameters you mentioned can be ignored I guess.
Finally, why should the combo group intuitively be lower? The data are fictitious, but I guess people usually like a sweet taste after salty food.
With your data, you could also try a binomial or Poisson Analysis of Means (ANOM). Just keep in mind that the null hypothesis is that every group mean equals the overall mean. This is a different null than that of ANOVA.
I guess it seemed intuitive to me, since I never dish out the money to buy extras.
Hobby horse time... I really think it's worth stopping and thinking about what you are asking here. If you did a test of statistical significance for this specific question, the null hypothesis would be:
In the population, the number of people who like to eat popcorn (and only popcorn) when going to the movies is exactly the same as the number who like to eat chocolate (and only chocolate).
And when I say exactly, I mean exactly. That is the null hypothesis you'd be testing. Not that the numbers are pretty close, or approximately the same, but exactly the same. Do you think that if you surveyed the entire population you're interested in (American adults?), that it is remotely plausible that the number who like chocolate only would be exactly the same as the number that like popcorn only? Even though the number of people in both categories would be tens of millions? If not, why test this hypothesis?
All a significance test would allow you to do is to test this null hypothesis. It wouldn't tell you whether the difference is large, or practically significant.
It seems to me that significance testing won't address the questions that are likely to be of interest to you. Perhaps you're interested in how confident you can be that, in the population, the proportion who prefer popcorn only is greater than (rather than smaller than) the proportion who prefer chocolate only? Or maybe you'd like an estimate of how big the difference in proportions is, with an interval around the estimate to show how much uncertainty is present?
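An estimate of the difference with an interval around it can be sketched as follows. This is one possible reading of the suggestion, not the poster's own calculation: it compares the share choosing P at all (35 + 50) with the share choosing C at all (15 + 50), treats the two answers as paired because they come from the same 200 people, and uses a simple Wald interval, which is a large-sample approximation.

```python
from math import sqrt
from statistics import NormalDist

n = 200
only_p, only_c, both = 35, 15, 50

prop_p = (only_p + both) / n    # share choosing popcorn at all
prop_c = (only_c + both) / n    # share choosing chocolate at all
diff = (only_p - only_c) / n    # prop_p - prop_c; the "both" cell cancels

# Wald standard error for a difference of paired proportions; it depends
# only on the two discordant counts (P without C, and C without P).
se = sqrt((only_p + only_c) - (only_p - only_c) ** 2 / n) / n

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
ci = (diff - z * se, diff + z * se)

print(f"P: {prop_p:.3f}  C: {prop_c:.3f}  difference: {diff:.3f}")
print(f"95% CI for the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval shows both the direction and the plausible size of the difference, which a bare p-value does not.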
CowboyBear,
Why complicate things...
The population I am interested in is exactly the people who come (or will come) to the specific cinema and watch action movies. I surveyed 200 people, who I assume are representative of the population. Of those 200 people, 35 bought P only, 15 bought C only and 50 bought both P and C (100 of them bought nothing).
To simplify things (and to eliminate possible bias due to money concerns), say that I am the one who sells the products and that the price of each one of them is $1. So, from these 200 people I got $85 from P's and $65 from C's. There is a $20 difference. I want to know if this difference is statistically significant.
In other words, I want to know if the money I will make from P's is going to be higher than the money I will make from C's in the long run, with an alpha of, say, 5%. At this stage I am not interested in how large the difference is expected to be. I just want to know if the difference found in my sample is statistically significant.
I think you are getting stuck on the statistically significant part. CB is correct that the null hypothesis being tested would be that the two proportions are exactly equal. Or you could set this up like an algebra problem and see how many of one item must be sold for it to have a higher value than the other.
Another option might be to just slap 95% confidence intervals for proportions on your values; that would be the closest thing to what you are after.
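Putting 95% intervals on the two proportions might look like the sketch below. The Wilson score interval is used here as an assumption (the post does not name a method), and `wilson_interval` is a hypothetical helper written for this example.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a single binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

n = 200
bought_p = 35 + 50  # P only + both
bought_c = 15 + 50  # C only + both

ci_p = wilson_interval(bought_p, n)
ci_c = wilson_interval(bought_c, n)

print(f"P: {bought_p / n:.3f}  95% CI ({ci_p[0]:.3f}, {ci_p[1]:.3f})")
print(f"C: {bought_c / n:.3f}  95% CI ({ci_c[0]:.3f}, {ci_c[1]:.3f})")
```

One caution: the same 200 people contribute to both proportions, so whether these two intervals overlap is not a valid test of the difference between P and C.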
If I understand correctly you have the following cross table:
               Like C
               Yes    No
Like P   Yes    50    35
         No     15   100
Wouldn't then a McNemar test be useful?
Last edited by blubblub; 12-30-2014 at 10:55 AM.
blubblub, please restate what you think the binary groups would be for using a McNemar test.
If I'm not mistaken, the McNemar test checks whether the number of people who 'changed' in one direction differs significantly from the number who 'changed' in the other direction. Here the 'changers' are the people who like one product but not the other: 35 like P but not C, and 15 like C but not P. A significant McNemar result would indicate that the proportion who like P differs from the proportion who like C; with these counts, the difference is in favor of P.
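For the cross table above, a McNemar test uses only the two discordant cells (35 and 15). The sketch below computes it by hand with the Python standard library, in both the exact form and the continuity-corrected chi-square form; the variable names `b` and `c` are just labels chosen for this example.

```python
from math import comb, erfc, sqrt

b = 35  # like P but not C
c = 15  # like C but not P
m = b + c  # discordant pairs; the 50 "both" and 100 "none" drop out

# Exact McNemar: under the null, b follows Binomial(m, 0.5), so the
# two-sided p-value is twice the tail at or beyond the larger count.
k = max(b, c)
p_exact = min(1.0, 2 * sum(comb(m, i) for i in range(k, m + 1)) / 2**m)

# Chi-square form with continuity correction (1 degree of freedom).
chi2_cc = (abs(b - c) - 1) ** 2 / m
# For 1 df, the chi-square upper tail equals erfc(sqrt(x / 2)).
p_cc = erfc(sqrt(chi2_cc / 2))

print(f"exact McNemar p = {p_exact:.4f}")
print(f"corrected chi-square = {chi2_cc:.2f}, p = {p_cc:.4f}")
```

Because only the discordant cells enter, the exact version is numerically the same as a binomial test of 35 out of 50 against a 50-50 split.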
hlsmith (12-30-2014)