Calculation of expected values for chi-squared test


I need to work out the probability that a set of survey results occurred by chance, and I intend to use a chi-squared test. However, I haven't studied mathematics formally since I was 16, and I need someone to confirm whether my calculations of expected values are accurate and, if not, to help me correct them. Please note that this request is purely about mathematics: I'm not considering matters like ordering effects at the moment.

The survey was taken by 200 participants, each of whom was asked two questions. Question 1 was asked of all participants, who were required to select either 'Yes' or 'No'. Question 2a was asked of those who had selected 'Yes' for question 1, and question 2b of those who had selected 'No'; for whichever of these they were asked, participants were required to select at least one option but were allowed to select more than one, in any combination. Here's the structure:

Question 1: Yes or No?
Answer options: Yes; No

Question 2a: Why?
Answer options: A1; A2; A3; B1

Question 2b: Why not?
Answer options: A4; A5; A6; A7; B2; C; D

Now, the expected values I need to calculate are the numbers of participants who would select:

- only A's (i.e. any combination of A1, A2, etc., with no other options selected)
- only B (i.e. only B1 for question 2a, or only B2 for question 2b)
- only C
- only D
- any mixture of the above (henceforth 'mixed')
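
For anyone who'd rather check the categories by brute force than by hand, here's a rough Python sketch of how I understand them. It treats the first letter of each option as its group, lists every allowed (non-empty) combination of options for a question, and tallies how many fall into each category. The function names `classify` and `count_outcomes` are just my own labels, and the sketch assumes nothing beyond the answer options listed above.

```python
from itertools import combinations

# Answer options per question, as listed in the survey structure.
OPTIONS_2A = ["A1", "A2", "A3", "B1"]
OPTIONS_2B = ["A4", "A5", "A6", "A7", "B2", "C", "D"]


def classify(selection):
    """Map one non-empty set of ticked options to a category name."""
    # The first letter of each option identifies its group (A, B, C or D).
    groups = {option[0] for option in selection}
    if len(groups) > 1:
        return "mixed"
    return "only " + groups.pop()


def count_outcomes(options):
    """Tally every allowed (non-empty) combination of options by category."""
    tally = {}
    for size in range(1, len(options) + 1):
        for selection in combinations(options, size):
            category = classify(selection)
            tally[category] = tally.get(category, 0) + 1
    return tally


if __name__ == "__main__":
    for name, options in [("2a", OPTIONS_2A), ("2b", OPTIONS_2B)]:
        tally = count_outcomes(options)
        # Print the total number of outcomes and the per-category breakdown.
        print(name, sum(tally.values()), tally)
```

Running it prints, for each question, the total number of possible outcomes followed by the per-category counts, which can be compared with a hand count.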

So first I've calculated the numbers of possible outcomes per question, per participant as follows:

Question 1: 2
Question 2a: (2^4)-1=15 (there are four options, each of which can be in either of two states, ticked or unticked, but the outcome in which they're all unticked is disallowed)
Question 2b: (2^7)-1=127 (there are seven options ...)

Next I've counted the numbers of possible favourable outcomes per question, per participant:

Question 1, Yes: 1
Question 1, No: 1

Question 2a, only A's: 7 (A1; A2; A3; A1&A2; A1&A3; A2&A3; A1&A2&A3)
Question 2a, only B: 1
Question 2a, mixed: 7

Question 2b, only A's: 15 (A4; A5; A6; A7; A4&A5; A4&A6; A4&A7; A5&A6; A5&A7; A6&A7; A4&A5&A6; A4&A5&A7; A4&A6&A7; A5&A6&A7; A4&A5&A6&A7)
Question 2b, only B: 1
Question 2b, only C: 1
Question 2b, only D: 1
Question 2b, mixed: 109

Then, based on the above, I've calculated the probabilities of favourable outcomes per participant:

Only A's: ((1/2)*(7/15))+((1/2)*(15/127))=0.29239
Only B: ((1/2)*(1/15))+((1/2)*(1/127))=0.03727
Only C: (1/2)*(1/127)=0.00394
Only D: (1/2)*(1/127)=0.00394
Mixed: ((1/2)*(7/15))+((1/2)*(109/127))=0.66247

Finally, I've multiplied each of the above probabilities by the number of participants to give the expected values, i.e. the number of participants expected to select each option or combination of options that I'm interested in:

Only A's: 0.29239*200=58.478
Only B: 0.03727*200=7.454
Only C: 0.00394*200=0.788
Only D: 0.00394*200=0.788
Mixed: 0.66247*200=132.494
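
In case it helps to see where these expected values are headed, here's a minimal Python sketch of the last two steps as I understand them: scaling per-participant probabilities up to expected counts, then computing Pearson's chi-squared statistic, the sum over categories of (observed - expected)^2 / expected. The function names are my own, and the category names, probabilities and observed counts in the example are made-up placeholders, not my survey data.

```python
def expected_counts(probabilities, n_participants):
    """Scale per-participant category probabilities up to expected counts."""
    return {cat: p * n_participants for cat, p in probabilities.items()}


def chi_squared_statistic(observed, expected):
    """Pearson's statistic: sum over categories of (O - E)^2 / E."""
    return sum(
        (observed[cat] - expected[cat]) ** 2 / expected[cat]
        for cat in expected
    )


# Placeholder inputs for illustration only; these are NOT the survey's figures.
example_probabilities = {"x": 0.5, "y": 0.3, "z": 0.2}
example_observed = {"x": 90, "y": 70, "z": 40}

example_expected = expected_counts(example_probabilities, 200)
statistic = chi_squared_statistic(example_observed, example_expected)

# For a goodness-of-fit test with fully specified probabilities, the
# degrees of freedom are the number of categories minus one.
degrees_of_freedom = len(example_expected) - 1
print(statistic, degrees_of_freedom)
```

As far as I understand it, the statistic would then be compared against a chi-squared distribution with that many degrees of freedom to get the probability I'm after; a library routine such as `scipy.stats.chisquare` should do both steps at once, though I haven't relied on it here.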

So have I got this right? I'm happy to give further explanation of my reasoning at any stage of the above.

