Using ANOVA for comparing perceived emotions in reaction to an embodied agent

#1
Hi there,

I stumbled upon an interesting paper on "The Influence of Emotions in Embodied Agents on Human Decision-Making" by de Melo, Carnevale & Gratch.

Basically, they ran an experiment using different versions of an embodied agent (a cooperative vs. an individualistic one) in a prisoner's dilemma setting.
Groups of human players were formed and had to play against either the cooperative or the individualistic agent. Although the agents differed in the animations they showed, they played exactly the same strategy in both groups. In the end, the authors checked whether the human players were more willing to cooperate with the cooperative agent (-> yes).

If you're interested in details you can take a look in the mentioned paper here: http://www.csc.ncsu.edu/faculty/robertsd/gamesreading/papers-s11/3-22.demelo.10.pdf

Now I'm getting to the core of my questions (see page 6 in the PDF file). In order to assemble the individualistic and the cooperative agent, the authors performed a pre-study.
They showed 21 students different animations the embodied agent could perform and had them rate those animations from 1 (not at all) to 5 (very much) on 5 scales: joy, sadness, shame, anger.

For example: an animation expressing joy (at least the authors thought of it as joy) was shown to students. As you would expect, most of them gave high scores on the joy scale, and low scores on all other scales.
The authors then used a repeated-measures ANOVA to compare the means of those perceived emotions and concluded that there were significant differences between them.

What is bothering me: it would seem to me that the data used for the ANOVA cannot be normally distributed. The use of a Likert-like scale (without a proper zero if you will) and the design of the experiment should make that highly unlikely.
How do you feel about this? Could this be a big weakness of the study?
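To illustrate the concern, here is a minimal sketch of what a normality check on such ratings might look like. The numbers are invented for illustration (they are not data from the paper) — but a ceiling-heavy 5-point pattern like this would typically fail a Shapiro-Wilk test:

```python
from scipy.stats import shapiro

# Hypothetical joy ratings for a joyful animation, 21 raters on a 1-5 scale.
# Invented numbers for illustration, not data from de Melo et al.
ratings = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 3, 2, 1]

stat, p = shapiro(ratings)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.4f}")
# A small p-value indicates a significant departure from normality,
# as expected for a skewed, discrete 5-point scale.
```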

Furthermore, and in general: isn't this a rather unusual way of using ANOVA? I've never seen a case where the compared means come not from experimental groups but from ratings on several scales.

As I am working on a related experiment, I ran a similar pre-study with 11 participants.
The participants watch an animation performed by an embodied agent and rate the perceived emotion on seven scales from 1 (not at all) to 5 (very much): neutral, surprise, anger, sadness, joy, disgust and fear.
Of course there are animations where almost everyone agrees on the same combination (for example: not at all neutral (1), joy (5), all remaining emotions (1)).
As expected, I didn't get normally distributed data. This led me to consider a Kruskal-Wallis ANOVA to compare the ratings and check for significant differences between perceived emotions.
However, as you can probably see, the data naturally tends to be highly skewed, which conflicts with the Kruskal-Wallis assumption that, while the data need not be normally distributed, the distributions should at least have similar shapes across groups. Am I correct that the Kruskal-Wallis test should therefore not be used? Are there alternatives for showing significant differences between the ratings of perceived emotions?

For the planned experiment it would be useful to identify animations that are highly undisputed and don't convey more than one of the six basic emotions. I'd appreciate any feedback on my thoughts and also on the report by de Melo, Carnevale and Gratch.
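For screening "undisputed" animations, a purely descriptive pass may be enough before running any test: an animation qualifies if exactly one scale has a high median and all other scales have low ones. A stdlib-only sketch, where the data, the scale names, and the thresholds (>= 4 counts as "high", <= 2 as "low") are all my own assumptions:

```python
from statistics import median

# Hypothetical ratings: scale name -> one rating per participant (11 raters).
# Invented numbers for illustration only.
ratings = {
    "joy":     [5, 5, 4, 5, 5, 4, 5, 5, 4, 5, 5],
    "sadness": [1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1],
    "anger":   [1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1],
}

def is_undisputed(ratings, high=4, low=2):
    """True if exactly one scale has a high median and all others low ones."""
    meds = {scale: median(vals) for scale, vals in ratings.items()}
    high_scales = [s for s, m in meds.items() if m >= high]
    low_scales = [s for s, m in meds.items() if m <= low]
    return len(high_scales) == 1 and len(low_scales) == len(meds) - 1

print(is_undisputed(ratings))  # True: only "joy" has a high median
```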

Thanks in advance!
 

Karabiner

TS Contributor
#2
What is bothering me: it would seem to me that the data used for the ANOVA cannot be normally distributed. The use of a Likert-like scale (without a proper zero if you will) and the design of the experiment should make that highly unlikely.
How do you feel about this? Could this be a big weakness of the study?
I am not too familiar with that kind of manipulation check, but it seems
to me that performing tests of significance is not the main thing. Even
if the null hypothesis "face X's expression is rated exactly the same on
all scales" is rejected, the differences between scales might be too small
to be of practical use (on the other hand, samples for manipulation
checks are usually small, so the differences in the sample cannot be tiny if
they become significant).

Anyway, while the use of median as descriptive statistic & Friedman test
instead of repeated-measures ANOVA would seem more appropriate to me
because of the scaling of the items, the mean differences here seem
reasonably large, so personally I wouldn't consider it a big weakness.

Furthermore, and in general: isn't this a rather unusual way of using ANOVA? I've never seen a case where the compared means come not from experimental groups but from ratings on several scales.
Repeated-measures ANOVAs are used for repeated measures.
This led me to consider a Kruskal-Wallis ANOVA to compare the ratings and check for significant differences between perceived emotions.
For repeated measures data you have to use the Friedman test.
Kruskal-Wallis is for the comparison of groups.
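As a sketch of that distinction: with one rating per participant per scale (repeated measures), SciPy's `friedmanchisquare` takes one array per scale, aligned by rater. The numbers below are invented for illustration:

```python
from scipy.stats import friedmanchisquare

# Hypothetical ratings of one animation by 11 participants. The SAME raters
# produced each array (repeated measures), so the Friedman test applies.
# Invented numbers, not data from the paper.
joy     = [5, 5, 4, 5, 5, 4, 5, 5, 4, 5, 5]
sadness = [1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1]
anger   = [1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1]

stat, p = friedmanchisquare(joy, sadness, anger)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
# Kruskal-Wallis (scipy.stats.kruskal) would instead treat the three scales
# as independent groups, ignoring that the ratings are paired by rater.
```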

Kind regards

K.