# Cronbach Alpha for 3-item scale

#### Danica Blanche

##### New Member
I've read that Cronbach's alpha also depends on the number of items, and for a scale meant only for descriptive purposes, what is the acceptable alpha for a 3-item scale? Would, say, .45 suffice?

#### trinker

##### ggplot2orBust
You're confusing items with response levels.

> and for a scale meant only for descriptive purposes

What do you mean by this?

> acceptable alpha for a 3-item scale? Would, say, .45 suffice?

Acceptable alpha depends on the use/purpose/goals of the tool. The 3 item response will not affect the acceptable level of alpha.

#### spunky

##### Can't make spagetti
> The 3 item response will not affect the acceptable level of alpha.
care to elaborate a little bit more on this?

#### trinker

##### ggplot2orBust
My point here was that the decision on what's acceptable depends on what you're going to use the test for: research, admission to a program, etc. In a way, the acceptable coefficient level is a bit like the alpha level in a hypothesis test. We don't say, "Oh, this is a poorer instrument, so I'm willing to accept a lower coefficient alpha."

Care to expand? You're certainly much more knowledgeable about psychometrics than I.

#### Danica Blanche

##### New Member
What I meant is that there are actually just three items (like three statements), and the responses are based on a 5-point scale. Alpha is heavily dependent on the number of items composing the scale, so it may be possible to actually obtain a high alpha if the items are numerous enough, despite inter-item correlations being low. So, considering that there are only three items, I was wondering if an alpha level deemed to be unacceptable (<.5) could be acceptable in this case, and if it is, how to determine the acceptable alpha level taking into account the number of items. It is not a major scale, just one minor part of a study examining one's propensity for a certain construct. Or perhaps there is no need to test for reliability, and I could just take the average as an aggregate Likert scale?
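The dependence on the number of items described above can be made concrete with the standardized form of alpha, which is a function of only the number of items and the mean inter-item correlation. A minimal sketch, assuming standardized items; the function names are illustrative and not from any library:

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def mean_r_for_alpha(k: int, alpha: float) -> float:
    """Invert the formula: mean inter-item correlation implied by a given alpha."""
    return alpha / (k - (k - 1) * alpha)


# Mean inter-item correlation implied by alpha = .45 with 3 items.
r = mean_r_for_alpha(3, 0.45)
print(round(r, 3))  # 0.214

# The same mean correlation spread over 10 items gives a much higher alpha.
print(round(standardized_alpha(10, r), 3))  # 0.732
```

So an alpha of .45 on three items corresponds to items that correlate about .21 on average, and the same average correlation would clear the usual .70 threshold with ten items.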

#### trinker

##### ggplot2orBust
> I was wondering if an alpha level deemed to be unacceptable (<.5) could be acceptable in this case, and if it is, how to determine the acceptable alpha level taking into account the number of items.

I would say no. Add items (probably not what you wanted to hear). Jake, Spunky and Lazar (three contributors here) are more experienced with psychometrics than I. Maybe they'll weigh in here as well.
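For a rough sense of what "add items" would mean here, the Spearman-Brown formula predicts the reliability of a lengthened test. This is only a sketch: it assumes the new items are parallel to the existing three, which real items rarely are, and the function names are illustrative.

```python
import math


def spearman_brown(reliability: float, factor: float) -> float:
    """Predicted reliability after lengthening a test by `factor`."""
    return factor * reliability / (1 + (factor - 1) * reliability)


def lengthening_factor(current: float, target: float) -> float:
    """How many times longer the test must be to reach the target reliability."""
    return target * (1 - current) / (current * (1 - target))


factor = lengthening_factor(0.45, 0.70)
items_needed = math.ceil(3 * factor)

print(round(factor, 2))  # 2.85
print(items_needed)      # 9
print(round(spearman_brown(0.45, items_needed / 3), 3))  # 0.711
```

Under that assumption, a 3-item scale with alpha = .45 would need about nine comparable items to reach .70.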

#### nbjo

##### New Member
The rule of thumb is that you want to have alpha of AT LEAST .70. Ideally you want something at about .90.

Alpha is related to the number of items, in that as you increase the number of items, alpha increases.

An alpha of .45 means that only 45% of the variance in scores is explained by variation in the variable you are measuring (i.e., true score variance); 55% is error. The problem with low reliability is that it damages the validity of the measurement and attenuates the relationships you are studying (makes them closer to 0).
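The attenuation mentioned above follows from the classical test theory result that an observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. A minimal sketch, assuming measurement errors are uncorrelated with each other and with true scores; the function name and the example values are illustrative:

```python
import math


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation given the true correlation and both reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)


# Hypothetical case: true correlation .50, one scale with reliability .45,
# the other with reliability .80.
print(round(attenuated_r(0.50, 0.45, 0.80), 2))  # 0.3
```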

#### spunky

##### Can't make spagetti
> The rule of thumb is that you want to have alpha of AT LEAST .70. Ideally you want something at about .90.
>
> Alpha is related to the number of items, in that as you increase the number of items, alpha increases.

yes, we know this already. the problem is that the OP has only 3 items and nothing more. for my own sanity check, i ran a small-scale simulation where the covariance structure under which alpha holds true was true in the population (so a tau-equivalent model) with factor loadings as high as 0.99 and a sample size of 100. after 10,000 replications the highest alpha i was able to find was 0.51.

so... here's the problem. although i think you raise a valid point on inquiring about the reliability of small scales, i don't think it's very meaningful to talk about alpha in cases like this. 3 items are just barely enough to tap on your construct of interest (from a Structural Equation Modelling point of view) so i'm siding with trinker on this one and you should be cautious.

HOWEVER... now you've left me wondering. factor loadings of 0.99 with a decent sample size of 100 and the most alpha gets to is 0.5?? maybe you are right. maybe studying further the reliability of small scales is worthwhile.

i'll add it on my to-do list, heh.
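The post above does not say which error variances the simulation used, and alpha under a tau-equivalent model depends heavily on them. Below is a minimal sketch of that kind of simulation using only the Python standard library. The function names, the error standard deviations (0.5 and 2.0), the seed, and the 200 replications are all illustrative assumptions, not spunky's actual setup.

```python
import random
import statistics


def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores for n people."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_var_sum = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))


def simulate_alpha(k, n, loading, error_sd, rng):
    """One sample from a tau-equivalent model: x_j = loading * factor + error_j."""
    factor = [rng.gauss(0, 1) for _ in range(n)]
    items = [[loading * f + rng.gauss(0, error_sd) for f in factor]
             for _ in range(k)]
    return cronbach_alpha(items)


def population_alpha(k, loading, error_sd):
    """Population alpha for k tau-equivalent items with equal error variances."""
    true_var = loading ** 2
    return k * true_var / (k * true_var + error_sd ** 2)


rng = random.Random(1)
for error_sd in (0.5, 2.0):  # assumed values; the post does not state them
    alphas = [simulate_alpha(3, 100, 0.99, error_sd, rng) for _ in range(200)]
    print(error_sd,
          round(population_alpha(3, 0.99, error_sd), 3),
          round(statistics.mean(alphas), 3))
```

Under these assumptions the population value is roughly 0.92 when the error SD is 0.5 and roughly 0.42 when it is 2.0, so whether three items with loadings of 0.99 give a high or a low alpha comes down to how noisy each item is.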

#### CB

##### Super Moderator
Two issues with alpha:

1) Do we really care about "true scores"? Despite the name, a person's true score is not their actual level of the attribute of interest. It's just the average score they'd get if we could hypothetically repeat the test a very large number of times, with each administration independent of the other administrations, and their level of the attribute remaining unchanging. Whatever sources of invalidity are present in the test are present in the true score too. So again, is it something we should really care about?

2) Even if we did care about true scores and true score variance, alpha is not a very good estimate of the proportion of true score variance anyway, since it (usually unrealistically) assumes the measure is essentially tau equivalent, with no correlated measurement error across items.

Useful article: On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha

People tend to always report and worry about alpha because A) it doesn't require you to obtain any other information beyond responses to the test; and B) It's available in SPSS. But there are plenty of other ways to assess the reliability and validity of a test. So maybe the solution for the OP could be to consider the psychometric quality of the test from other perspectives. E.g., can you show evidence for content validity? Convergent and discriminant validity? What have other studies found out about this test? Etc.

> The problem with low reliability is that it damages the validity of the measurement and attenuates the relationships you are studying (makes them closer to 0).

Interestingly, measurement error does not necessarily result in attenuated correlations between variables. The opposite can occur, if correlated measurement error across variables is present. There is a simple simulation showing that in this article by TalkStatters.
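The point about correlated measurement error can be seen without a full simulation. A minimal sketch, assuming two measures X = Tx + Ex and Y = Ty + Ey that share the same true-score variance and the same error variance, with errors uncorrelated with true scores; the function name and the numbers are illustrative and are not taken from the linked article:

```python
def observed_r(true_cov, error_cov, true_var=1.0, error_var=1.0):
    """Correlation between X = Tx + Ex and Y = Ty + Ey.

    Assumes equal true-score variances, equal error variances, and
    true scores uncorrelated with errors, so cov(X, Y) is the sum of
    the true-score covariance and the error covariance.
    """
    return (true_cov + error_cov) / (true_var + error_var)


# True correlation is .30 in both cases (true_var = 1, so true_cov = .30).
print(round(observed_r(0.30, 0.0), 2))  # 0.15: uncorrelated errors attenuate
print(round(observed_r(0.30, 0.8), 2))  # 0.55: correlated errors inflate
```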

#### spunky

##### Can't make spagetti
> But there are plenty of other ways to assess the reliability and validity of a test.

i'm going to leave validity on the side for a moment (mostly because i don't like it, lol)... but coming back to reliability i do believe the OP posts a very valid question that hasn't been explored much in Psychometrics. would Guttman's lambda fare better in the presence of a small number of items? sure, it would be bigger than alpha but by how much? all reliability indices come from the covariance matrix of the items, after all, so i'm willing to throw in the hypothesis that they would all be somewhat crappy.

i do get why from a theoretical standpoint one would not talk much about reliability when there's such a limited number of items on a scale or subscale. reliability indices are, after all, trying to get to this idea of consistency over repeated measurements and if you only have 3 measurement instances then how consistent could your answers be? nevertheless, there are times where all you have are... well... 3 items and that's it, lol.

this seems like a common-enough problem that someone would've done some research on it but i can't seem to find anything
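To get a feel for the "bigger, but by how much?" question, alpha (Guttman's lambda-3) and Guttman's lambda-2 can both be computed directly from an item covariance matrix. A minimal sketch; the function name is illustrative and the 3x3 matrix is made up for the example, not taken from any data in this thread:

```python
import math


def alpha_and_lambda2(cov):
    """Return (alpha, lambda2) for a square item covariance matrix."""
    k = len(cov)
    total_var = sum(sum(row) for row in cov)  # variance of the sum score
    item_var_sum = sum(cov[j][j] for j in range(k))
    off_diag_sq = sum(cov[i][j] ** 2
                      for i in range(k) for j in range(k) if i != j)
    lambda1 = 1 - item_var_sum / total_var
    alpha = k / (k - 1) * lambda1
    lambda2 = lambda1 + math.sqrt(k / (k - 1) * off_diag_sq) / total_var
    return alpha, lambda2


# Hypothetical standardized items with unequal, fairly low correlations.
cov = [
    [1.0, 0.1, 0.2],
    [0.1, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
alpha, lambda2 = alpha_and_lambda2(cov)
print(round(alpha, 3), round(lambda2, 3))  # 0.477 0.499
```

For this made-up matrix lambda-2 exceeds alpha by only about .02, and the two coincide when all covariances are equal, which is at least consistent with the hunch that switching coefficients would not rescue a 3-item scale.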

> The opposite can occur, if correlated measurement error across variables is present. There is a simple simulation showing that in this article by TalkStatters.

i think in our area we sort of live and die by the assumption of uncorrelated, random error, with an expected value of 0 and normally-distributed, right? i'm guessing that's what nbjo is referring to.

ALTHOUGH i do give you brownie points for your shameless self-promotion. i like that

#### CB

##### Super Moderator
> i do get why from a theoretical standpoint one would not talk much about reliability when there's such a limited number of items on a scale or subscale. reliability indices are, after all, trying to get to this idea of consistency over repeated measurements and if you only have 3 measurement instances then how consistent could your answers be?

I think that's exactly it, really. Low reliability estimates for short tests probably aren't so much a reflection of a problem with the reliability estimate we use, so much as something that's pretty much built into how we define reliability.

> nevertheless, there are times where all you have are... well... 3 items and that's it, lol.

Yeah, or one item even. Personally I'd kinda edge toward saying that validity is what really matters, so what is the evidence for validity like? But who knows. Maybe we need a better definition of reliability that would be kinder to short scales.

> ALTHOUGH i do give you brownie points for your shameless self-promotion. i like that

Publish, publicise or perish

#### Dason

##### Ambassador to the humans
> Publish, publicise or perish

And really nobody can blame you for publicizing such a brilliant article.

#### spunky

##### Can't make spagetti
> I think that's exactly it, really. Low reliability estimates for short tests probably aren't so much a reflection of a problem with the reliability estimate we use, so much as something that's pretty much built into how we define reliability.

that's why i said they probably tell us something. like if the parallel or tau-equivalent assumptions hold and you get an alpha of say around 0.4-0.5 those would be EXTREMELY good numbers. the implication that would follow is that if you were to further sample from that hypothetical item universe, you could probably get some decent reliability indices.

> Personally I'd kinda edge toward saying that validity is what really matters, so what is the evidence for validity like?

true that... but there's this ugly little problem of what exactly is validity, right? like before it was all about predictive ability. then that just became part of the evidence towards validity and tests were the ones that were valid/invalid. then we moved away from that and now it's the inferences from the test scores that are valid or not valid.

and i think now we're, once again, at the point where there are as many definitions of validity as authors out there, and they do not necessarily encompass each other. how are we going to "edge" towards validity if we don't even know what it is? :/

#### spunky

##### Can't make spagetti
> And really nobody can blame you for publicizing such a brilliant article.

well, i don't like it because i could've jumped into that bandwagon and i didn't. so now i'll downvote it out of spite and jealousy.

#### CB

##### Super Moderator
> how are we going to "edge" towards validity if we don't even know what it is? :/

We know because Denny said.

(But to be fair even if we magically all agree on what validity is, how do we know whether we have it grasped in our sticky little fingers?)