Help on Chi-Square Statistics

#1
Hi, I have an experiment for my masters thesis and I’m confused about which statistical test I should use.
In my experiment I’m looking at whether there is an association between food types and locations. My variables are food type (with 4 levels) and location (with 8 levels).
The experiment has 3 parts. In part 1 I will ask participants to place each food picture into one of 8 possible boxes as they wish. In part 2 they will rate each picture in terms of healthiness and calories, and learn the actual caloric values. In part 3 they will do the placement task once more, to see if learning the actual values changes anything in the placement.
I have two objectives: see if each food type is likely to be placed in certain locations more than the others, and compare the before and after placements (part 1 and part 3) to see if learning about the values changes placement.
I thought I should use chi-square because I’ll be comparing frequencies of placement. However, I’m confused about which chi-square stat I should be using.
I thought I should use the chi-square test of independence for part 1 and part 3 separately. I’d have a 4x8 contingency table with placement frequencies. For comparing part 1 and part 3 I could use goodness of fit, using part 1 as expected values and part 3 as observed values, and compare how well they fit each other.
However, someone commented that I can’t use test of independence because once an item is placed in a box, no other item can be placed in that box. This creates a dependency between food type and locations. They suggested that I can run a 1x8 independence test for each food type. However, if I do that I face an issue with degrees of freedom. According to degrees of freedom calculation df = (rows -1) x (columns – 1). If I do a 1x8 for 4 food types, then my degrees of freedom become 0.
In short, I wonder which statistical test I need to use for this experiment. I hope I made it clear. Thank you so much in advance!
 


katxt

Well-Known Member
#2
This looks like an interesting problem without an obvious standard solution.
It isn't clear yet just what the subjects have to do or what you mean by "location". Is there a "correct" answer? Can you give us an idea of what the data will look like for parts 1 and 3? Just make up a few rows/subjects if you haven't got the data yet. How will you know from the data if they have learnt anything?
 
#3
Hi katxt, thanks for your answer.
By location I mean the location of those 8 boxes, which are top-left, top, top-right, right, bottom-right, bottom, bottom-left and left. This is an exploratory study, so there's no correct answer to where the participants need to place items. I want to see if certain food types are more likely to be represented in certain locations. For example, if low-calorie healthy food is placed in the top-left box significantly more often than in the other boxes, I can talk about an association between low-calorie foods and the top-left location.
Below I'm posting a mock contingency table for part 1 or part 3 (both will look similar, only with different values). This table shows one participant's data over 25 trials (since each trial consists of 4 images for each placement task, it adds up to 100 observations in total). Each row shows the placement frequencies for that food type.
For part 1 and part 3, I want to see if certain locations are used significantly more for each food type. I will do this for part 1 and part 3 separately. Later, I will compare part 1 and part 3 and see if there is a change. For example, if the NaLow food type is placed significantly more often in the top position in part 1, I want to check whether that is still the case in part 3 or whether it ends up in another position. The point of this is that people sometimes don't know the actual calories of foods, and I want to check whether the positioning changes after they learn the actual caloric values. Again, there are no right or wrong answers.
I hope this makes it clearer.
Thanks again, Beth
[Attached image: mock contingency table of placement frequencies by food type and location (1662196223234.png)]
 


katxt

Well-Known Member
#5
They suggested that I can run a 1x8 independence test for each food type. However, if I do that I face an issue with degrees of freedom. According to degrees of freedom calculation df = (rows -1) x (columns – 1). If I do a 1x8 for 4 food types, then my degrees of freedom become 0.
This should work for parts 1 and 3 separately. You do four separate 8x1 chi sq goodness of fit tests. df = 7. The (rows -1) x (columns – 1) doesn't apply here.
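To spell out why the two df formulas differ, here is a short sketch in my own notation (the symbols O_i, E_i, n, k, r and c are not from the posts above). It assumes the "no preference" hypothesis means equal expected counts in every location.

```latex
% Goodness of fit for ONE food type over k = 8 locations,
% with n placements in total and equal expected counts:
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = \frac{n}{k},
\qquad \mathrm{df} = k - 1 = 8 - 1 = 7

% Test of independence on an r x c contingency table
% (for example the full 4 x 8 table):
\mathrm{df} = (r - 1)(c - 1) = (4 - 1)(8 - 1) = 21
```

The (rows - 1) x (columns - 1) rule belongs to the second case only, which is why plugging a single row into it gives 0.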
[Attached image: 1662261149137.png]
The expected values should ideally be at least 5, but 3.125 (25 placements / 8 locations) is probably all right. You can do some version of Yates' correction if you want to. It's probably also a good idea to use a Bonferroni correction for the multiple p values: with four tests that is 0.05/4 = 0.0125, so say a cutoff of about p = 0.01 instead of 0.05.
Comparing part 1 and part 3 is problematic because the data are paired, which is not allowed with chi-square. There are ways around this in some circumstances, but I will have to have another think about it. kat
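A minimal sketch of the four goodness-of-fit tests in Python, in case it helps to see the arithmetic. The counts are made up, the food type labels other than NaLow are placeholders, and `chi2_sf` and `goodness_of_fit` are helper names I chose for illustration. It assumes equal expected counts per location and applies no Yates-style correction.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail p-value of a chi-square statistic, standard library only."""
    x = stat / 2.0
    # Start from Q(1, x) for even df or Q(1/2, x) for odd df, then apply the
    # recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def goodness_of_fit(observed):
    """Chi-square goodness of fit against equal expected counts."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

# Made-up counts: 25 placements per food type spread over the 8 locations.
# Only "NaLow" is a label from the thread; the other names are placeholders.
mock_counts = {
    "NaLow":  [9, 4, 2, 1, 2, 3, 2, 2],
    "type_B": [3, 3, 4, 3, 3, 3, 3, 3],
    "type_C": [1, 2, 2, 3, 8, 4, 3, 2],
    "type_D": [3, 4, 3, 2, 3, 3, 4, 3],
}

alpha = 0.05 / len(mock_counts)  # Bonferroni cutoff for four tests: 0.0125

results = {}
for food, counts in mock_counts.items():
    stat, df, p = goodness_of_fit(counts)
    results[food] = (stat, df, p)
    print(f"{food}: chi2 = {stat:.2f}, df = {df}, p = {p:.4f}, "
          f"below cutoff {alpha}: {p < alpha}")
```

If SciPy is available, `scipy.stats.chisquare(counts)` should give the same statistic and p-value, since it also defaults to equal expected frequencies.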
 

katxt

Well-Known Member
#7
If you haven't collected the data yet, it would make things much easier for your analysis if you had only four locations, and subjects could put more than one food in each location.
 
#8
I haven't collected it yet, and I agree it would definitely lighten the analysis, but I specifically want to study all 8 locations because no other study has done it before.
 

katxt

Well-Known Member
#9
OK. Not knowing the context, it all seems a bit mysterious to me.
Now, to compare part 1 with part 3 to see if subjects tend to pick a different location after part 2, or perhaps the same, or maybe it's just random. The data are paired so here is a simple idea which should work.
Do one food type at a time.
Say you have 40 subjects and consider type NaLow. We have 40 pairs of locations. Each subject will either use the same location in both parts or a different one. Record this in a third column. Count how many "Same" and how many "Different" in the 40 subjects.
Now we can do a goodness of fit test as in post #5. If there is no pattern, you would expect 40/8 = 5 to be the same by chance, and so 35 different. Excel does this easily using =CHITEST(observed,expected) or you can do the full calculations with 1 df. For example, if 8 out of the 40 were the same, you would conclude that there is no evidence of a pattern (chi-square is about 2.06, p is about 0.15).
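The same idea as a minimal Python sketch, for anyone not using Excel. The paired data are made up so that exactly 8 of 40 subjects keep their location, and `same_vs_different_test` is a name I chose for illustration. It assumes, as above, that under "no pattern" a subject repeats a location with probability 1/8.

```python
import math

def same_vs_different_test(part1, part3, n_locations=8):
    """Goodness-of-fit test on Same/Different counts from paired placements."""
    pairs = list(zip(part1, part3))
    n = len(pairs)
    same = sum(1 for before, after in pairs if before == after)
    observed = [same, n - same]
    # By chance alone, a subject repeats a location with probability 1/8.
    expected = [n / n_locations, n - n / n_locations]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(stat / 2.0))
    return observed, expected, stat, p

locations = ["top-left", "top", "top-right", "right",
             "bottom-right", "bottom", "bottom-left", "left"]

# Made-up data for one food type: 40 subjects, one location per part.
part1 = [locations[i % 8] for i in range(40)]
# The first 8 subjects keep their location; the rest move one box clockwise.
part3 = [locations[i % 8] if i < 8 else locations[(i + 1) % 8]
         for i in range(40)]

observed, expected, stat, p = same_vs_different_test(part1, part3)
print(f"observed = {observed}, expected = {expected}")
print(f"chi2 = {stat:.2f}, df = 1, p = {p:.3f}")
```

With these made-up pairs the counts are 8 same and 32 different against 5 and 35 expected, so the result matches the worked example: chi-square of about 2.06 and p of about 0.15.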
[Attached image: 1662413732061.png]
 
#10
This makes so much sense. Thank you so much, Kat, you've been so helpful!