I am a relative novice at statistics and have run into a problem working out how to analyse some data I will hopefully soon be acquiring. I would appreciate advice.

**The data**

What I want to do is compare how two groups of subjects respond to a single survey question, to see whether the two groups answer it differently. The question has the form illustrated below:

"Which of the below are the best letters? Pick THREE options:

A

B

C

D

E

F

G

H"

From what I have read, my "pick three from eight" data can almost be described as categorical "multiple response" data, as in the following resources:

http://www.jaqm.ro/issues/volume-4,issue-1/pdfs/lavassani_movahedi_kumar.pdf, http://link.springer.com/chapter/10.1007/978-3-7908-1813-0_18, http://statistika.vse.cz/konference/amse/PDF/Plasil+Vlach.pdf.

However, my data differ from what is described there: my question forces every subject to pick exactly three answers, whereas those papers look at "pick any number of responses" data.
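To make the structure concrete, here is a minimal sketch of how "pick three from eight" responses could be laid out as 0/1 indicator rows, one row per subject. It is written in Python purely for illustration (the same layout carries over to an R data frame). The helper names `encode` and `option_counts` and the example responses are made up. The point it shows is that every row sums to exactly three, so the eight indicator columns are not independent of one another.

```python
OPTIONS = "ABCDEFGH"

def encode(picks):
    """Turn three picked letters into a 0/1 indicator row over A..H."""
    chosen = set(picks)
    if len(chosen) != 3 or not chosen <= set(OPTIONS):
        raise ValueError("each subject must pick exactly three of A-H")
    return [1 if letter in chosen else 0 for letter in OPTIONS]

def option_counts(rows):
    """Number of subjects who picked each option (column sums)."""
    return [sum(column) for column in zip(*rows)]

# Made-up responses, for illustration only
group1 = [encode(p) for p in ("ABC", "ABD", "ACE")]
group2 = [encode(p) for p in ("ABD", "FGH", "BDH")]

counts1 = option_counts(group1)
counts2 = option_counts(group2)

# Every subject contributes exactly three picks, so the column sums
# always total 3 * (number of subjects) in each group.
print(dict(zip(OPTIONS, counts1)), sum(counts1))
print(dict(zip(OPTIONS, counts2)), sum(counts2))
```

Because each subject appears three times in the 2 x 8 table of counts, the cells are not independent observations, which is why an ordinary chi-squared test on that table is questionable.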

**The questions**

I have two questions.

First, how should I analyse this data? What I want is essentially a test of independence between group membership and the pattern of answers, for two samples of "pick three" multiple response data. My first thought was to use a chi-squared test, as my data is categorical, but further reading says that this is not appropriate when there are multiple responses per subject. I suspect I need some kind of corrected chi-squared test (Rao-Scott is a name that keeps coming up), but as none of the resources I have read quite match my type of data, I am not sure.

Second, I want to do a sample size calculation to find out how many subjects I would need in each group to detect a given difference - for example, if all of the subjects in group 1 put ABC, and all of the subjects in group 2 put ABD, how many subjects would I need for this result to come out as significant with my chosen test? I have been trying but have no idea how to do this yet, mainly because I don't know what test to use.
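One generic way to approach such a calculation is simulation: generate many fake data sets under an assumed difference, run the test on each, and record how often it rejects. The sketch below does this in Python, under loud assumptions. The test used is a subject-level permutation test on the summed squared differences in pick proportions, which is only a stand-in and not a claim that it is the right test for this design. The per-option weights, the function names and the default settings are all hypothetical.

```python
import random

OPTIONS = "ABCDEFGH"

def draw_picks(weights, rng):
    """Draw three distinct options, sampling without replacement by weight.
    Assumes at least three options have a positive weight."""
    pool, w, picks = list(OPTIONS), list(weights), []
    for _ in range(3):
        i = rng.choices(range(len(pool)), weights=w)[0]
        picks.append(pool.pop(i))
        w.pop(i)
    return picks

def statistic(g1, g2):
    """Sum over options of the squared difference in pick proportions."""
    total = 0.0
    for letter in OPTIONS:
        p1 = sum(letter in s for s in g1) / len(g1)
        p2 = sum(letter in s for s in g2) / len(g2)
        total += (p1 - p2) ** 2
    return total

def permutation_p(g1, g2, n_perm, rng):
    """Permutation p-value: shuffle whole subjects between the groups,
    so each subject's three picks always stay together."""
    observed = statistic(g1, g2)
    pooled = g1 + g2
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:len(g1)], pooled[len(g1):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def estimated_power(n, w1, w2, alpha=0.05, n_sim=200, n_perm=199, seed=1):
    """Share of simulated studies (n subjects per group) that reject at alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        g1 = [draw_picks(w1, rng) for _ in range(n)]
        g2 = [draw_picks(w2, rng) for _ in range(n)]
        if permutation_p(g1, g2, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim

# Hypothetical, milder scenario than "all ABC vs all ABD":
# group 2 favours D over C, everything else equally popular.
w_group1 = [3, 3, 3, 1, 1, 1, 1, 1]
w_group2 = [3, 3, 1, 3, 1, 1, 1, 1]
print(estimated_power(10, w_group1, w_group2, n_sim=50))
```

Repeating the call for increasing `n` until the estimated power reaches the target (say 0.8) would give a rough sample size for that assumed scenario. The extreme "all ABC versus all ABD" case corresponds to weights `[1, 1, 1, 0, 0, 0, 0, 0]` and `[1, 1, 0, 1, 0, 0, 0, 0]`.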

I usually analyse data using R, so practical advice tending towards that software would also be helpful, but at the moment I really just want to get the concepts of what I have to do with this data sorted.

Thanks in advance for any help. I have been banging my head against this for some days and have consulted some more stats-savvy colleagues with no luck so far, so I would really appreciate some advice! I will clarify the questions further if need be.

James