# Likert scale data: can I use a paired t-test? What about normality testing?

#### Labestia

##### New Member
Hey all!

I have data from 40 subjects. I asked them to analyze eight short films (four sad ones, four neutral ones). I asked them to rate on a Likert scale 1 (not at all)-5 (very much) how much of different negative emotions the films caused in them. I illustrate:

Film 1:
Rate how much of each emotion the film made you feel:
Anger 1 2 3 4 5
Anxiety 1 2 3 4 5
Sadness 1 2 3 4 5
Fear 1 2 3 4 5
….and so on.

I created sum scores of all the negative emotions the subject felt during each film: Anger+sadness+anxiety+fear for Film 1, Anger+sadness+anxiety+fear for Film 2, Anger+sadness+anxiety+fear for Film 3 etc.

Then I combined all the emotion scores for each sad and neutral film into two single scores:

• (Anger+sadness+anxiety+fear for Neutral film 1)+(Anger+sadness+anxiety+fear for Neutral film 2)+(Anger+sadness+anxiety+fear for Neutral film 3) + (Anger+sadness+anxiety+fear for Neutral film 4) = “Total score of negative feelings for neutral films”
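The scoring described above can be sketched in Python as follows. This is only an illustration: the data layout, the film labels, and the ratings are made up, and only the four emotions named in the example are included (the full questionnaire had more).

```python
# Hypothetical layout: ratings[film][emotion] is one subject's 1-5 Likert rating.
EMOTIONS = ["anger", "anxiety", "sadness", "fear"]  # assumed subset of the items
NEUTRAL_FILMS = ["neutral_1", "neutral_2", "neutral_3", "neutral_4"]
SAD_FILMS = ["sad_1", "sad_2", "sad_3", "sad_4"]


def film_score(ratings, film):
    """Sum score of the negative-emotion items for one film."""
    return sum(ratings[film][emotion] for emotion in EMOTIONS)


def total_score(ratings, films):
    """Sum of the per-film sum scores over a set of films."""
    return sum(film_score(ratings, film) for film in films)


# Made-up ratings for a single subject.
subject = {f: {"anger": 1, "anxiety": 1, "sadness": 1, "fear": 1} for f in NEUTRAL_FILMS}
subject.update({f: {"anger": 2, "anxiety": 3, "sadness": 5, "fear": 2} for f in SAD_FILMS})

neutral_total = total_score(subject, NEUTRAL_FILMS)
sad_total = total_score(subject, SAD_FILMS)
print(neutral_total, sad_total)  # 16 48
```

Repeating this for every subject gives two columns (one neutral total and one sad total per subject), which is the shape a paired comparison needs.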

The point of doing that is to compare if the negative films overall cause more negative emotions than the neutral films (which I hope they do, because my next study depends on it).

Questions:
1. Does this process and summing of items/variables make sense?
2. Can I compare the two total scores for neutral vs. sad films with a paired t-test? Every subject rated all the eight films. Or is there a better test for that?
3. How can I compare all the eight films separately, between each other (i.e. without comparing the two summed variables)?
4. I read somewhere that I need to test my data for normality, but then other sources say I don’t need to…? Do I indeed have to? And how? At what stage?

I am eternally grateful for your help!

#### CB

##### Super Moderator
1. Does this process and summing of items/variables make sense?
It seems like a simple and reasonably conventional way to measure negative emotions, yes.

2. Can I compare the two total scores for neutral vs. sad films with a paired t-test? Every subject rated all the eight films. Or is there a better test for that?
Broadly there are two potential problems with using a parametric normal-theory test with a Likert DV:

1) The error distribution won't be genuinely normal with Likert data -> the sampling distribution of the coefficients isn't guaranteed to be normal in small samples -> your Type 1 error rate and confidence interval coverage might differ from nominal (e.g., a true Type 1 error rate greater than 5%, or 95% confidence interval coverage that isn't exactly 95%). But in reality the sampling distribution of the coefficients should be really close to normal with N=40, so this isn't a major worry.

2) Using an ordinal variable in a parametric test is not "permissible" according to the representationalist theory of measurement, because the results will depend on the (theoretically arbitrary) choice of how you code the variables. See Stevens, 1946. However, you have already assumed your item scores are at least interval when you summed them together; that summing process isn't invariant across coding choices either.
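For reference, a minimal sketch of what the paired t-test looks like in Python, assuming NumPy and SciPy are available. The data here are simulated purely for illustration (they are not the poster's data), and the variable names are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects = 40

# Simulated total scores, one pair per subject (illustrative only).
neutral_total = rng.integers(16, 40, size=n_subjects)
sad_total = neutral_total + rng.integers(0, 25, size=n_subjects)

# Paired t-test: every subject contributes both a sad and a neutral total.
result = stats.ttest_rel(sad_total, neutral_total)

# The same test written as a one-sample t-test on the per-subject differences.
diff = sad_total - neutral_total
one_sample = stats.ttest_1samp(diff, 0.0)

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3g}")
print(f"mean difference = {diff.mean():.2f}")
```

The second call is there to show that the paired test only ever looks at the within-subject differences, which is why any normality check belongs on those differences rather than on the raw scores.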

3. How can I compare all the eight films separately, between each other (i.e. without comparing the two summed variables)?
There are a few ways to do this, but I'd ask "Why?" We know different films produce different emotions - what's the goal of this analysis?

4. I read somewhere that I need to test my data for normality, but then other sources say I don’t need to…? Do I indeed have to? And how? At what stage?
If sources disagree, so will we (personally I think there are almost always much more important things to worry about than normality, like power and preregistration and measurement error; others disagree). But if you do examine normality, look at how closely the differences between conditions approximate a normal distribution (e.g., via qqplots or histogram). Running a statistical test for normality on Likert data is pointless; Likert data cannot be truly normal, because Likert data is discrete not continuous. The question is about how bad the departure from normality is.
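A rough sketch of such a check in Python, assuming NumPy and SciPy are available. The differences are simulated stand-ins, not real data. `scipy.stats.probplot` returns the points of a normal Q-Q plot together with the correlation of a straight-line fit to them; passing a matplotlib axes through its `plot` argument would draw the plot itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated per-subject differences (sad total minus neutral total).
diff = rng.integers(0, 25, size=40)

# Q-Q plot coordinates against a normal distribution, plus a least-squares line.
(theoretical_q, ordered_diff), (slope, intercept, r) = stats.probplot(diff, dist="norm")

# r close to 1 means the points lie near a straight line,
# i.e. the differences look roughly normal.
print(f"Q-Q correlation r = {r:.3f}")
```

This gives a picture of how far the differences depart from normality rather than a yes/no verdict.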

#### Labestia

##### New Member
Thank you so much, CowboyBear
You clarified many things I was wondering about. I will examine the qqplots/histograms as you suggest, but I do note your comment about other priorities as well.

Yes, I guessed that one can argue that once the Likert items have been summed (to total scores), a parametric test could be considered. Do you think the paired t-test is the best test to use?

About comparing the eight movies among each other; I actually wanted to ask this in case it would be needed to assess how the films differ from each other when it comes to emotional impact (i.e. is one of the films much more upsetting than the others?). That would be done with paired t-test as well? If I do many comparisons, do I need to do a Bonferroni correction?
I will be using the four upsetting movies in another study that I am planning to conduct, one where the participants will be exposed to upsetting material (=the films), so this current study is just to verify that the four upsetting movies do indeed cause more emotion than the neutral ones.
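To give a sense of scale for the multiple-comparisons question, here is the arithmetic for a Bonferroni correction over all pairwise film comparisons. The overall alpha of 0.05 is an assumed conventional level, not something fixed by the study.

```python
from math import comb

n_films = 8
alpha = 0.05  # assumed overall significance level

# Number of distinct film pairs: 8 choose 2.
n_pairs = comb(n_films, 2)

# Bonferroni: divide the overall alpha by the number of tests.
alpha_per_test = alpha / n_pairs

print(n_pairs, round(alpha_per_test, 5))  # 28 0.00179
```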

Thank you so much for your help & assistance - I am very motivated to learn more statistics, but sometimes it can be confusing with contradictory information from different sources... That is why I really appreciate that you took the time to clarify these questions for me.

#### noetsi

##### No cake for spunky
I spent a lot of time reading the literature on this because we have a lot of Likert data. There are some strong disagreements among experts on this topic. Basically it comes down to whether you feel it's reasonable that the dimension you are measuring is ordered and that the distances between the levels are the same. If you can assume that, you can calculate a mean; if you can't, then a t-test makes limited sense. Normality is another problem, but if you have a large sample then it is probably not a major one.

But again various authors strongly disagree on the validity of this approach.

#### BCN

##### New Member
I need help from experts:

I am a research scholar. I have some questions about using the t-test. My study is a comparative study where I've collected data from two different countries. In order to compare the data, I am using the t-test. However, using this test multiple times increases the Type I error rate. An alternative could be ANOVA. But my concern is how I can use ANOVA to compare the demographic variables (age groups, 3 categories) of two different countries. For example, how can I find out how the younger group in country 1 differs from the younger group in country 2? Looking forward to your suggestions.

#### rogojel

##### TS Contributor
Hi,
this is possible with ANOVA. Take a look at post-hoc tests.
regards

#### CB

##### Super Moderator
Yes, I guessed that one can argue that once the Likert items have been summed (to total scores), a parametric test could be considered.
Yeah, kinda. The summing operation doesn't magically transform an ordinal variable into an interval one, but by doing the summing you have already assumed the items are at least interval-level, so quibbling over the ordinal/interval distinction after that point doesn't seem worthwhile.

Do you think the paired t-test is the best test to use?
Hmm, "best" is a tricky word, and depends on how many rabbit holes you wish to go down. A paired t-test is the obvious conventional choice in this scenario. There are alternatives: For example, you could avoid the problems with significance testing in general by using Bayesian estimation with an informative prior and a region of practical equivalence (see Kruschke, 2013). Or if you wanted to avoid the assumption that your items are interval (but were willing to assume underlying quantitative latent variables) you could use a latent variable/structural equation model approach. It depends on how much time you have to dedicate to this.

About comparing the eight movies among each other; I actually wanted to ask this in case it would be needed to assess how the films differ from each other when it comes to emotional impact (i.e. is one of the films much more upsetting than the others?). That would be done with paired t-test as well? If I do many comparisons, do I need to do a Bonferroni correction?
I would suggest just plotting the mean negative and positive emotion ratings for each film, possibly accompanied by standard error bars (or maybe use something like violin plots to depict both central tendency and variation). I don't think it's useful to perform a significance test here - the null hypothesis (that the films all cause exactly the same levels of positive and negative emotion) is obviously implausible, so testing it formally is probably not worthwhile.
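A minimal sketch of the numbers behind such a plot, using only the Python standard library. The film labels and scores are invented for illustration; in practice each list would hold one sum score per subject for that film.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-subject sum scores for two of the films.
film_scores = {
    "sad_1": [12, 14, 9, 15, 11],
    "neutral_1": [4, 5, 4, 6, 4],
}


def mean_and_se(scores):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(scores), stdev(scores) / sqrt(len(scores))


summary = {film: mean_and_se(scores) for film, scores in film_scores.items()}

for film, (m, se) in summary.items():
    print(f"{film}: mean = {m:.2f}, SE = {se:.2f}")
```

These means and standard errors are what the bar heights and error bars would show. Note that they are between-subject standard errors, so in a within-subject design like this one they describe each film on its own rather than the precision of the film-to-film differences.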

Thank you so much for your help & assistance - I am very motivated to learn more statistics, but sometimes it can be confusing with contradictory information from different sources... That is why I really appreciate that you took the time to clarify these questions for me.
No worries