Correlations with Likert data

noetsi

Fortran must die
#1
I have a series of 4-point Likert items (highly dissatisfied, dissatisfied, satisfied, highly satisfied) and I want to run a correlation analysis on them. The question is which is better: Spearman's rho or polychoric correlations?

After looking at the literature I had thought it was polychoric, but I wanted other opinions from people wiser than I am :)
 

noetsi

Fortran must die
#3
Polychoric correlations. ALWAYS.

Spearman's correlation --->ranks----> what are you going to do about the multiple ties?
Use the value that SAS prints :) Seriously, I do not know how SAS handles ties. In the past there were no large differences between the polychoric and Spearman results.
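On the ties question, a minimal sketch may help. The usual convention for Spearman's rho is to give tied observations the average of the ranks they span (midranks) and then compute Pearson's correlation on those ranks. Whether SAS does exactly this is an assumption here and should be checked against the SAS documentation. The function names and the sample responses below are made up for illustration.

```python
from math import sqrt


def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as Pearson's r computed on midranks."""
    return pearson(midranks(x), midranks(y))


# Hypothetical 4-point responses (1 = highly dissatisfied ... 4 = highly satisfied).
pay = [1, 2, 2, 3, 4, 4, 3, 2]
overall = [1, 2, 3, 3, 4, 3, 4, 2]
print(round(spearman(pay, overall), 3))
```

With only four response categories almost every observation is tied with many others, which is why the ranks carry so little extra information here.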

Here is a related question. I am running a regression with these values. While some argue that a DV can be treated as interval if it is a Likert scale, I chose to treat it as ordinal (which of course it is). I chose logistic regression rather than ordered logistic regression (I collapse the 4-point scale in the DV into 2 points). I ran into problems with ordered regression last time, and honestly I know it less well than two-level logistic regression.

Last time I ran this I converted the 4-point predictor variables into binary (two-level) predictors (I don't know why, it was six years ago). Formally, I believe it does not matter whether the predictors are coded with two or four levels. But are there significant advantages to using either 2-level or 4-level predictors? I know of none in the literature.

The purpose of this is to determine which variables have relatively greater impact on the DV (and yes, I know many feel that is an invalid exercise in regression), not to estimate the slope coefficients. I use two different measures to do this: the standardized coefficients SAS produces and which predictor has the higher Wald statistic (and yes, I know there is no agreement on that either). :p
 

spunky

Doesn't actually exist
#4
Last time I ran this I converted the 4-point predictor variables into binary (two-level) predictors (I don't know why, it was six years ago). Formally, I believe it does not matter whether the predictors are coded with two or four levels. But are there significant advantages to using either 2-level or 4-level predictors? I know of none in the literature.
Advantage for what purpose? For better or worse, most questions of this type need to be treated on a case-by-case basis.

Remember that in a regression analysis, categorical predictors are treated as indicating group membership. So the coefficients tell you about average changes with respect to some reference (usually another group). If you collapse categories, then I'm guessing you're assuming that 2 of the groups do not provide enough information to make meaningful comparisons, so you've got some sort of objective reason to combine them with the other two groups. For example, we once had a 3-group predictor (basically something like 1 = native born, 2 = immigrant non-refugee, 3 = immigrant refugee). We didn't have enough refugees in the data, so we had to collapse them with the other immigrants to make a generic "immigrant" category. The type of inferences we wanted to make did change, but they were still conceptually related to the purpose of the study (i.e., investigating whether immigrant status was associated with some health outcomes).

So it's really impossible to talk in general about these things without taking into consideration the purpose and design of the study.
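To make the group-membership point concrete, here is a minimal sketch of the two codings being compared. The function names and the choice of "highly dissatisfied" as the reference group are assumptions for illustration only; nothing in the thread says how the original analysis was coded.

```python
# The four response levels from the original question, in order.
LEVELS = ["highly dissatisfied", "dissatisfied", "satisfied", "highly satisfied"]


def dummy_code(response, reference="highly dissatisfied"):
    """Return one 0/1 indicator per non-reference level (3 indicators for 4 levels)."""
    if response not in LEVELS:
        raise ValueError(f"unknown response: {response!r}")
    return [int(response == level) for level in LEVELS if level != reference]


def collapse(response):
    """Collapse to two levels: 1 = satisfied or highly satisfied, 0 = otherwise."""
    return int(LEVELS.index(response) >= 2)


for response in LEVELS:
    print(response, dummy_code(response), collapse(response))
```

With the 4-level coding, each coefficient compares one group with the reference group. With the collapsed coding there is a single coefficient comparing the satisfied half with the dissatisfied half, so the distinction between "satisfied" and "highly satisfied" can no longer show up in the model.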
 

noetsi

Fortran must die
#5
No, I just thought having 4 rather than 2 levels might have some advantage I was not aware of (or vice versa). My categories are highly satisfied, satisfied, dissatisfied and highly dissatisfied. Logically there are differences between these, so there is no substantive reason to collapse them. I have heard suggestions, from a former professor, that using Likert scale data as a predictor can lead to nonsensical results, but I have never seen anything in the literature to that effect.

My guess is I used 2 levels before because I was certain dummy variables were legitimate and was unsure about using Likert data as a predictor.

The fact that I am using the regression for relative impact, not to generate slopes, of course makes this even more unusual. :) The design of the study is to find out which satisfaction variables (pay, say) had the greatest impact on overall satisfaction.
 

Miner

TS Contributor
#6
I have heard suggestions, from a former professor, that using Likert scale data as a predictor can lead to nonsensical results, but I have never seen anything in the literature to that effect.
This may be related to the assumption in regression that the independent variables are measured without error. Ordinal data obviously carry a great deal of uncertainty. However, there are regression methods that can deal with errors in variables, so it is not an insurmountable problem.
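A small simulation can illustrate why measurement error in a predictor matters. This is only a sketch under strong assumptions: classical additive error that is independent of the true value, a known error variance, and a continuous predictor. Coarsening a response into four categories is not exactly this kind of error. All names and parameter values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, true_slope=2.0, error_sd=1.0, seed=1):
    """Compare slopes fitted on an error-free and an error-prone predictor."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]            # variance 1
    y = [true_slope * x + rng.gauss(0, 1) for x in x_true]
    x_observed = [x + rng.gauss(0, error_sd) for x in x_true]
    # Reliability ratio: var(true x) / (var(true x) + var(error)).
    reliability = 1.0 / (1.0 + error_sd ** 2)
    naive = ols_slope(x_observed, y)
    # Simple correction that assumes the reliability is known.
    corrected = naive / reliability
    return ols_slope(x_true, y), naive, corrected


clean, naive, corrected = simulate()
print(f"error-free: {clean:.2f}  with error: {naive:.2f}  corrected: {corrected:.2f}")
```

Under these assumptions the slope fitted on the error-prone predictor shrinks toward zero by roughly the reliability ratio, and dividing by that ratio recovers it. In a relative-importance comparison this means a noisier predictor can look less important than it really is.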