ANOVA - separate for subjects and items, and how to aggregate

#1
Hello everyone,

I have a basic statistics exam coming up and, while studying, found the instructor's notes to be a little short in one area.

In the field of linguistics, separate ANOVAs for subjects and items are fairly common to get results that are generalisable for people as well as stimuli.

However, all she wrote about it (and that's the part I don't understand) is that for the subject (VP) analysis, the mean per subject and independent variable is calculated (aggregating over items), while for the item analysis, the mean per item and independent variable is calculated (aggregating over subjects). Surely it should be the other way around?

So I guess my question is quite general: how do you aggregate something over items vs over subjects and why is this flipped for the analyses? It's been surprisingly difficult to find anything about this - seems that most people (justifiably) assume that you'd cover that in your intro to statistics class. Which is basically the class I'm having trouble with...

Sorry if the terminology is a bit off, it's translated from German.

Any help would be much appreciated!
 

obh

Well-Known Member
#2
Hi John,

I don't think I would use the word aggregating; it's more like changing.
If you measure the same subject several times, it's called a repeated-measures ANOVA, and the measurements are taken at different times,
like week 1, week 2, week 3, or before treatment and after treatment.
 
#3
Hi,

thanks for the reply! I think I may not have phrased that ideally so let me try again, sorry for that!
Repeated measures is a different topic (though it applies there, too). However, this issue goes for designs without repeated measures as well.

It's definitely about aggregation, and I mostly understand how aggregation works (I think), i.e. calculating, for example, the average response time of Subject1 across all items, that of Subject2 across all items, etc.; and then calculating the average response time of all subjects for Item1, for Item2, etc. (if you measure the participants' reaction times, of course).

So to rephrase my question: why do that, and which one is which? Which of them is aggregating over subjects and which is over items, and why do I need to aggregate over items for the subject analysis and over subjects for the item analysis? Or did she just mix those up? Seems perfectly possible as well...
 

obh

Well-Known Member
#4
So when you say aggregate, are you talking about how to compute the sums of squares?
How can you aggregate a subject over time if it is not a repeated measure? Or do you mean aggregating within each level of the factor, across subjects?
 
#5
Maybe an example would work best here. It seems that maybe the terminology I was taught isn't as universal as I expected it to be, sorry that that has caused some confusion.


Let's say you are measuring subjects' reaction time under two different conditions. You'll end up with a table like this:

Item  Subject  Condition  Reaction time
1     1        1          450
1     1        2          520
2     1        1          465
2     1        2          543
1     2        1          420
1     2        2          500
2     2        1          430
2     2        2          510
3     1        1          430
3     1        2          550
3     2        1          410
3     2        2          530


Aggregating over items (according to the instructor's terminology) for the subject (VP) analysis later on yields the following table:

Subject  Condition  Mean reaction time
1        1          448.33
1        2          537.67
2        1          420
2        2          513.33


Here I calculated the mean reaction time for each subject under each condition, i.e. for each condition I added up subject 1's reaction times for items 1, 2, and 3 and divided by 3, then did the same for subject 2. For subject 1 under condition 1, for example, that is (450 + 465 + 430) / 3 = 448.33.

Aggregating over subjects (according to her terminology) for the item analysis later on yields the following:

Item  Condition  Mean reaction time
1     1          435
1     2          510
2     1          447.5
2     2          526.5
3     1          420
3     2          540


Here I calculated the mean reaction time for each item under each condition, i.e. for each condition I added up the reaction times of subjects 1 and 2 for item 1 and divided by 2, did the same for item 2, and so on.
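In case anyone wants to check the arithmetic, here is a minimal Python sketch of the two aggregations using only the standard library. The names ROWS and cell_means are my own (hypothetical), not anything from the instructor's notes.

```python
from collections import defaultdict

# (item, subject, condition, reaction_time) rows copied from the raw table
ROWS = [
    (1, 1, 1, 450), (1, 1, 2, 520),
    (2, 1, 1, 465), (2, 1, 2, 543),
    (1, 2, 1, 420), (1, 2, 2, 500),
    (2, 2, 1, 430), (2, 2, 2, 510),
    (3, 1, 1, 430), (3, 1, 2, 550),
    (3, 2, 1, 410), (3, 2, 2, 530),
]


def cell_means(rows, unit_index):
    """Mean reaction time per (unit, condition).

    unit_index picks the column kept as the unit of analysis:
    0 = item, 1 = subject. The other one is averaged away.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row[unit_index], row[2])].append(row[3])
    return {key: sum(v) / len(v) for key, v in sorted(groups.items())}


by_subject = cell_means(ROWS, unit_index=1)  # aggregated over items
by_item = cell_means(ROWS, unit_index=0)     # aggregated over subjects

for (subject, condition), rt in by_subject.items():
    print(f"subject {subject}, condition {condition}: {rt:.2f}")
for (item, condition), rt in by_item.items():
    print(f"item {item}, condition {condition}: {rt:.2f}")
```

The only difference between the two calls is which column is kept as the grouping key; the column that is not kept is the one being "aggregated over".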


I gather that the purpose of this is to determine whether the conditions have a significant effect on the reaction time of subjects (if they are generally faster under condition 1 than condition 2, as in my example) but also if they influence the average response time to each item.
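To make that concrete, here is a rough sketch of the two tests run on the aggregated means. It assumes that condition varies within subjects and within items, as in my example; in that case, with only two conditions, the one-way repeated-measures F is simply the squared paired t statistic. The labels F1 (by subjects) and F2 (by items) are the convention I have seen in psycholinguistics, not something from the instructor's notes, and paired_t is a hypothetical helper. With two subjects and three items the numbers mean nothing statistically; the sketch only shows which table feeds which test.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# (item, subject, condition, reaction_time) rows copied from the raw table
ROWS = [
    (1, 1, 1, 450), (1, 1, 2, 520),
    (2, 1, 1, 465), (2, 1, 2, 543),
    (1, 2, 1, 420), (1, 2, 2, 500),
    (2, 2, 1, 430), (2, 2, 2, 510),
    (3, 1, 1, 430), (3, 1, 2, 550),
    (3, 2, 1, 410), (3, 2, 2, 530),
]


def paired_t(rows, unit_index):
    """Paired t for condition 2 minus condition 1, one pair per unit.

    unit_index: 1 = subjects as units (items averaged away),
                0 = items as units (subjects averaged away).
    Returns (t, degrees of freedom).
    """
    cells = defaultdict(list)
    for row in rows:
        cells[(row[unit_index], row[2])].append(row[3])
    units = sorted({unit for unit, _ in cells})
    diffs = [mean(cells[(u, 2)]) - mean(cells[(u, 1)]) for u in units]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1


t1, df1 = paired_t(ROWS, unit_index=1)  # subject analysis
t2, df2 = paired_t(ROWS, unit_index=0)  # item analysis
print(f"F1(1, {df1}) = {t1 ** 2:.2f}")
print(f"F2(1, {df2}) = {t2 ** 2:.2f}")
```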

So mainly, the question remains a terminological one: why do we call the first one aggregating over items when we're taking the means for each subject, and vice versa?

 

obh

Well-Known Member
#6
Hi John,

Sorry, I'm not sure there is an exact terminology.
You should probably stick to the formulas, not to the words that try to describe the formulas.

I would have read "aggregating over subjects" as taking the average for subject 1 and the average for subject 2:
Subject  Average
1        493
2        466.67

You could describe your example as averaging over each combination (item, condition).

It seems that your instructor names the dimension that is reduced: the data are aggregated over the reduced dimension "subject".
I don't think your instructor made a mistake, so you should probably stick to the terminology of the person who will mark your exam.
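
If it helps, the naming rule can be sketched in a few lines of Python: "aggregating over X" means X is the column that gets averaged away, and the result is grouped by whatever is left. This is only my reading of your instructor's terminology, and COLUMNS, DATA, and aggregate_over are made-up names for illustration.

```python
from collections import defaultdict

COLUMNS = ("item", "subject", "condition", "rt")
DATA = [dict(zip(COLUMNS, row)) for row in [
    (1, 1, 1, 450), (1, 1, 2, 520),
    (2, 1, 1, 465), (2, 1, 2, 543),
    (1, 2, 1, 420), (1, 2, 2, 500),
    (2, 2, 1, 430), (2, 2, 2, 510),
    (3, 1, 1, 430), (3, 1, 2, 550),
    (3, 2, 1, 410), (3, 2, 2, 530),
]]


def aggregate_over(data, *collapsed):
    """Average rt over the named dimensions, grouping by all the others."""
    kept = [c for c in COLUMNS[:-1] if c not in collapsed]
    groups = defaultdict(list)
    for row in data:
        groups[tuple(row[c] for c in kept)].append(row["rt"])
    means = {key: sum(v) / len(v) for key, v in sorted(groups.items())}
    return kept, means


# Instructor's "aggregating over subjects": subject disappears,
# one mean per (item, condition) is left for the item analysis.
print(aggregate_over(DATA, "subject"))

# The per-subject averages in this post collapse item AND condition,
# so only subject is left as a grouping column.
print(aggregate_over(DATA, "item", "condition"))
```

Read this way, the subject analysis needs one mean per (subject, condition), so the dimension that has to go is "item", which is why it is called aggregating over items.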