Hi, I have searched this website and Google and found that my statistical test should be either a t-test or a chi-square test, so I'd like you guys to tell me which one is more valid, since I am getting conflicting information. I don't want to argue about whether this is ordinal or interval data, as even the academics bicker over this; I just want to find a statistical test that is valid and will tell me what I want to know.

Background: I am doing a group research project. We are assessing the preference for product A versus product B in a specific population. We set up 8 questions on a Likert scale from 1 to 5. It is important to note that these 8 questions are actually 4 questions, each split in two. For example, Question 1: "Do you think product A is effective at doing what it does?" and Question 2: "Do you think product B is effective at doing what it does?" form one group. Group 2 would then be questions 3-4, asking "Do you prefer to use product A as your top choice?" and "Do you prefer to use product B as your top choice?" I think you guys get what I am getting at.

Thank you for your help. Our group has been stuck on this for some time now. I even contacted our school's biostatistician and he said we should do chi-square, but what do you guys think? We also plan to give descriptive statistics.

i think it would also help us out to know what your research question is, i.e. what you are looking for. for instance, do you want to see which product is preferred? are you interested in seeing whether the wording of the questions generates different answers? both? there are several ways to go here depending on what you're looking for...

for all your psychometric needs! https://psychometroscar.wordpress.com/about/

We are trying to see whether this subgroup population prefers product A or product B.

The questions are in the following groups:
1-2: you prefer product a or product b as your number one choice for doing this thing
3-4: I think product a works better than product b and vice versa (yeh this is a poorly worded question)
5-6: I think product a or product b works in doing this thing
7-8: I think product a or product b has no bad adverse events.

So as you can see, it's basically trying to assess whether these people prefer product a or b based on these questions. We also collected demographic data such as education, sex, age, and occupation. So I was thinking a chi-square would work, but I've been reading that a t-test would also work?

So I was thinking a chi-square would work, but I've been reading that a t-test would also work?

A t-test "works". But the results will be flawed. Since data on a Likert scale usually have smaller variance than "real" data, the p-values you get will be smaller. Thus you'd have a higher chance of getting a positive result even when there is no real effect. An F-test, on the other hand, is possible to use.

Gene Glass’ famous Monte Carlo study of ANOVA in which Glass showed that the F-test was incredibly robust to violations of the interval data assumption (as well as moderate skewing) and could be used to do statistical tests at the scale and subscale (4 to 8 items but preferably closer to 8) level of the data that was collected using a 5 to 7 point Likert response format with no resulting bias [29].

Conclusion: A t-test on Likert scale data will probably result in a higher chance of a type I error. An F-test, though, is possible to use under some conditions.

Carifio, J., & Perla, R. J. (2007). Ten Common Misunderstandings, Misconceptions, Persistent Myths and Urban Legends about Likert Scales and Likert Response Formats and their Antidotes. Journal of Social Sciences, 3, 106-116.

A t-test "works". But the results will be flawed. Since data on a Likert scale usually have smaller variance than "real" data, the p-values you get will be smaller. Thus you'd have a higher chance of getting a positive result even when there is no real effect. An F-test, on the other hand, is possible to use. Conclusion: A t-test on Likert scale data will probably result in a higher chance of a type I error. An F-test, though, is possible to use under some conditions.

Hi there,

A couple of things confuse me a bit here:

Can you explain a little more what you mean by "real data"?

How can an F-test be ok but t-test not? t and F are very closely related - if a variable X is distributed as t with df=m, then X^2 is distributed as F with df1 = 1 and df2 = m. So in an applied sense if there are just two groups and one DV, t-test and F-test will provide exactly the same p value...
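The t/F relationship described above is easy to check numerically. Here is a minimal sketch in R, assuming made-up 5-point ratings for two hypothetical groups (the data and variable names are illustrative only, not the OP's). Note that the match is exact only for the pooled-variance t-test, hence var.equal = TRUE; R's default Welch test would differ slightly.

```r
# Hypothetical data: 50 made-up 5-point Likert responses in two groups
set.seed(42)
group  <- factor(rep(c("A", "B"), each = 25))
rating <- sample(1:5, size = 50, replace = TRUE)

# Pooled-variance two-sample t-test (needed for the exact equivalence)
t_res <- t.test(rating ~ group, var.equal = TRUE)

# One-way ANOVA F-test on the same data
f_tab <- anova(lm(rating ~ group))

t_stat <- unname(t_res$statistic)
f_stat <- f_tab[["F value"]][1]
p_t    <- t_res$p.value
p_f    <- f_tab[["Pr(>F)"]][1]

all.equal(t_stat^2, f_stat)  # t squared equals F
all.equal(p_t, p_f)          # the two p-values coincide
```

With two groups and one DV, whatever is flawed about one test on these data is therefore equally flawed about the other.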

i was gonna follow up on this post but i just realized that the question posted by CBear to Englund is considerably more interesting, so i'll wait & see...


Can you explain a little more what you mean by "real data"?

Simply what I meant: people do not answer correctly, since they are affected by different kinds of cognitive bias, et cetera. Further, the "real" or "true" values are unknown in most cases.

Originally Posted by CowboyBear

How can an F-test be ok but t-test not? t and F are very closely related - if a variable X is distributed as t with df=m, then X^2 is distributed as F with df1 = 1 and df2 = m. So in an applied sense if there are just two groups and one DV, t-test and F-test will provide exactly the same p value...

That is true. But if you look at what I cited in my last post you'll partly get your answer. See below.

F-test was incredibly robust to violations of the interval data assumption (as well as moderate skewing) and could be used to do statistical tests at the scale and subscale (4 to 8 items but preferably closer to 8) level of the data that was collected using a 5 to 7 point Likert

So an F-test with only two groups and one DV will also be flawed.

Why not use the Mann-Whitney U test, and therefore only worry about ranking the data rather than about the underlying distribution? It's possible to rank Likert data, but not so easy to determine whether it is truly a continuous scale (which it probably isn't).
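For reference, a rough sketch of the rank-based route in R, using made-up 5-point ratings (the sampling probabilities and variable names are purely illustrative assumptions). In base R the Mann-Whitney U test is wilcox.test(); exact = FALSE is set because Likert data are full of ties, so the normal approximation is used. If, as the setup earlier in the thread seems to suggest, each respondent rated both products, the paired (signed-rank) version may be the closer fit; that is an assumption about the design, not something confirmed here.

```r
# Hypothetical 5-point Likert ratings for two products (made-up data)
set.seed(7)
ratings_a <- sample(1:5, size = 30, replace = TRUE,
                    prob = c(0.05, 0.15, 0.30, 0.30, 0.20))
ratings_b <- sample(1:5, size = 30, replace = TRUE,
                    prob = c(0.20, 0.30, 0.30, 0.15, 0.05))

# Mann-Whitney U (Wilcoxon rank-sum) test for two independent groups
mw <- wilcox.test(ratings_a, ratings_b, exact = FALSE)
mw$p.value

# Wilcoxon signed-rank test, assuming the i-th entries come from the same respondent
sr <- wilcox.test(ratings_a, ratings_b, paired = TRUE, exact = FALSE)
sr$p.value
```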

Personally I think it's reasonable to assume that scales that run from Most satisfied to Least satisfied are continuous, but there is broad disagreement on this issue.

"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

Personally I think it's reasonable to assume that scales that run from Most satisfied to Least satisfied are continuous, but there is broad disagreement on this issue.

That is most often wrong. In most cases, the difference between 4 and 5 (on a five-point scale) is bigger than the difference between 2 and 3 or between 3 and 4. That is also why a simple t-test gives flawed results and why the data have to be treated as ordinal. Please correct me if I'm wrong.

But if you look at what I cited in my last post you'll partly get your answer. See below. So, an F-test where there are only two groups and one DV, will also be flawed.

I just don't quite see how you're drawing the conclusion that a t-test will "work" with Likert data but not an F-test... as far as I can tell, Carifio & Perla don't discuss t-tests in that article, just F-tests. (I haven't fully read the article though - the writing style is so grating that I had to look away).

The accuracy of their claim about the F-test is pretty dodgy to start with though. Carifio and Perla claim that: "Gene Glass’ famous Monte Carlo study of ANOVA... showed that the F-test was incredibly robust to violations of the interval data assumption."

This is nonsense. Glass et al.'s study was concerned with distributional assumptions like homoscedasticity and error normality. It doesn't even mention the issue of measurement levels (ordinal, interval, etc.). The assumption of interval measurement concerns the meaning of data points and intervals on the particular scale, not whether the data are continuous or normally distributed.

Actually, it's sort of an interesting question whether it's possible or reasonable at all to use simulation studies to check the robustness of statistical tests to the use of ordinal data. I don't think we can really do this unless we make quite specific assumptions in turn (e.g. like assuming that an ordinal measurement scale is formed by dividing a latent normal distribution into a discrete set of response categories). Spunky?

Carifio & Perla's article is probably as boring as things get. there's just blah, blah, blah on why Likert-type scales do this or that. very little of it is worthwhile to remember... IMHO, the only useful thing here is the citation for the Glass et al. simulation. now, with that being said, Glass' article is pretty limited when it comes to this issue of the impact of the scale of measurement on the properties of statistical tests. and why is that? well... because they only consider ONE-WAY ANOVA designs. it doesn't take a statistician to deduce that the more complex a technique gets, the more and more it'll rely on its assumptions, and in the ANOVA context this happens as soon as factorial designs take centre stage. scales of measurement can screw your ability to detect interactions, and don't even get me started with ranked data in repeated-measures ANOVA.

but aaaaaaaaaaaaaaaaaaaaanyway... the issue at hand is the t-test, right... well... here it goes.

BEHOLD PEOPLE! THE SIMULATION TO END ALL SIMULATIONS... CAN T-TESTS BE USED IN ORDINAL DATA?

Code:

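## SIMULATION PARAMETERS
set.seed(1)
reps <- 1000          # number of repetitions for the simulation
rezu <- double(reps)  # vector to store the p-value from each repetition
n <- 30               # sample size per group

## ACTUAL SIMULATION
for (i in 1:reps) {
  # two samples from the same standard normal, so the null hypothesis is true
  X1 <- rnorm(n)
  X2 <- rnorm(n)
  # discretize each sample into 4 ordered categories,
  # cutting at that sample's own mean and mean +/- 1 sd
  U1 <- as.numeric(cut(X1, breaks=c(-Inf, mean(X1)-sd(X1), mean(X1), mean(X1)+sd(X1), Inf), labels=c(1:4)))
  U2 <- as.numeric(cut(X2, breaks=c(-Inf, mean(X2)-sd(X2), mean(X2), mean(X2)+sd(X2), Inf), labels=c(1:4)))
  # two-sample t-test on the discretized scores; keep only the p-value
  rezu[i] <- t.test(U1,U2)$p.value
}

## SUMMARY OF THE P-VALUES
summary(rezu)
sd(rezu)
plot(density(rezu))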

these are the results i get:

Code:

> summary(rezu)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.2784  0.6919  0.7985  0.8016  0.8950  1.0000
> sd(rezu)
[1] 0.1509686

conclusion? YES, YOU CAN USE ORDINAL DATA FOR A T-TEST

ok, ok... i hope people notice that i'm just being a little facetious when it comes to this issue. there are many things i'm assuming here: (equally-spaced intervals of discretization, normal distributions, etc...) but i think that code is simple and flexible enough for other people to expand on it. for the time being the fact that it preserves type 1 error is enough to believe that, at least in the case of 4 likert points, we can consider ordinal data as continuous. the more points you use, the better the estimate, of course.
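One caveat worth flagging for anyone expanding on that code: the cut points are taken from each sample's own mean and sd, so both groups end up with nearly the same category proportions and the p-values pile up near 1 (median of about 0.80 above) instead of spreading evenly between 0 and 1. Below is a hedged variant, not the original poster's method, that assumes fixed thresholds at -1, 0 and 1 on the latent scale and reports the share of p-values below 0.05 directly. The threshold values, the number of repetitions and the variable names are all assumptions chosen for illustration.

```r
set.seed(1)
reps  <- 2000                    # number of simulated data sets
n     <- 30                      # respondents per group
alpha <- 0.05                    # nominal significance level
cuts  <- c(-Inf, -1, 0, 1, Inf)  # fixed latent thresholds (assumption)

p_vals <- replicate(reps, {
  # both groups come from the same latent normal, so the null is true
  u1 <- as.numeric(cut(rnorm(n), breaks = cuts, labels = FALSE))
  u2 <- as.numeric(cut(rnorm(n), breaks = cuts, labels = FALSE))
  t.test(u1, u2)$p.value
})

# empirical type I error rate: share of p-values below alpha
type1 <- mean(p_vals < alpha)
type1
```

If the t-test holds its nominal level under these assumptions, type1 should land close to alpha. Changing cuts to unequally spaced values, or drawing the latent scores from a skewed distribution, is one way to probe the other assumptions listed above.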

so to the OP, you should be alright... i think. lol
