Limited multiple response data - analysis and sample size calculation


Hello all,

I am a relative novice at statistics and have run into a problem working out how to analyse some data I will hopefully soon be acquiring. I would appreciate advice.

The data

What I want to do is compare the responses of two groups of subjects to a single survey question, to see whether the two groups answer it differently. The question has the form illustrated below:

"Which of the below are the best letters? Pick THREE options:

From what I have read, my "pick three from eight" data can almost be described as categorical "multiple response" data, as in the following resources: issue-1/pdfs/lavassani_movahedi_kumar.pdf

However, the difference between my data and what is described there is that my question forces every subject to pick exactly three answers, whereas those papers look at "pick any number of responses" data.

The questions

I have two questions.

First, how should I analyse this data? What I want is essentially a test of independence for two samples of "pick three" multiple response data. My first thought was to use a chi-squared test, as my data is categorical, but further reading says that this is not appropriate when there are multiple responses per subject. I suspect I need some kind of corrected chi-squared test (Rao-Scott is a name that keeps coming up), but as none of the resources I have read quite match my type of data, I am not sure.

Second, I want to do a sample size calculation to find out how many subjects I would need in each group to detect a given difference - for example, if all of the subjects in group 1 put ABC, and all of the subjects in group 2 put ABD, how many subjects would I need for this result to come out as significant with my chosen test? I have been trying but have no idea how to do this yet, mainly because I don't know what test to use.

I usually analyse data using R, so practical advice tending towards that software would also be helpful, but at the moment I really just want to get the concepts of what I have to do with this data sorted.

Thanks in advance for any help - I have been banging my head against this for some days and have consulted some more stats-savvy colleagues with no luck so far, so I would really appreciate some advice! I will clarify the questions further if need be.

Hi James. This problem has a similar structure to a multivariate ANOVA (MANOVA): a column containing the group as the predictor, then columns A to H containing the responses. The main differences are that the responses are 0 and 1, not normally distributed numbers, and that the number of 1's for each subject is fixed at 3. However, you might like to try a Monte Carlo, MANOVA-type approach. If you go this way, you will need some measure of how different the two groups are. One possibility is, for each response, to calculate (average of group 1 - average of group 2)^2 and add these up over the eight responses to get a single difference score. This is more or less the sort of thing a MANOVA does. The bigger the score, the more different the groups.

But how big does a difference score need to be to be significant? You can't use the F distribution as a MANOVA would, so you need to do a permutation test and build your own null sampling distribution. Shuffle the rows of responses without replacement, leaving the group column unchanged (or vice versa), and work out the difference score for each of a few thousand random permutations. If your data's score is in the top 5% of the scores from the random shuffles, then you can declare significance at the 5% level. I personally would do this in Excel, but I'm sure R could be programmed to do it well.
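For anyone who wants to try this, here is a minimal sketch of the permutation test described above. It is written in Python with NumPy rather than R or Excel, and the function names (make_rows, difference_score, permutation_test), the coding of the groups as 1 and 2, and the example data are all made up for illustration. The add-one correction in the p-value is also my own assumption, not part of the recipe above; it simply counts the observed labelling as one of the permutations.

```python
import numpy as np

def make_rows(picks, options="ABCDEFGH"):
    """Turn answers such as 'ABC' into 0/1 rows, one column per option."""
    return np.array([[1 if o in p else 0 for o in options] for p in picks])

def difference_score(responses, groups):
    """Sum over options of (group 1 mean - group 2 mean)^2."""
    mean1 = responses[groups == 1].mean(axis=0)
    mean2 = responses[groups == 2].mean(axis=0)
    return float(np.sum((mean1 - mean2) ** 2))

def permutation_test(responses, groups, n_perm=5000, seed=0):
    """Return the observed score and a Monte Carlo permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = difference_score(responses, groups)
    at_least_as_big = 0
    for _ in range(n_perm):
        # Shuffle the group labels and leave the response rows alone,
        # which is equivalent to shuffling the rows against fixed labels.
        shuffled = rng.permutation(groups)
        if difference_score(responses, shuffled) >= observed:
            at_least_as_big += 1
    # Assumed add-one correction: the observed labelling counts as one shuffle.
    p_value = (at_least_as_big + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical data: 10 subjects per group, everyone in group 1 picks ABC
# and everyone in group 2 picks ABD.
picks = ["ABC"] * 10 + ["ABD"] * 10
responses = make_rows(picks)
groups = np.array([1] * 10 + [2] * 10)

observed, p_value = permutation_test(responses, groups)
print("difference score:", observed)
print("permutation p-value:", p_value)
```

A p-value below 0.05 corresponds to the observed score falling in the top 5% of the shuffled scores. Because every subject picks exactly three options, each row of the 0/1 table sums to 3, and shuffling whole rows (or whole labels) keeps that constraint intact.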
The sample size question might have to wait for another day.