Collinearity Among Categorical Variables in Regression

#1
Does anyone have any knowledge and/or experience in how to measure collinearity between categorical variables that have more than 2 categories in the context of regression? How would one go about measuring collinearity? Would it be something akin to looking at levels of association via contingency table analysis (Fisher's Test, Pearson's Chi-Square test, etc.)? Or perhaps you could examine correlations of dummy variable codings?

Any information would be very helpful!
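
A rough sketch of the contingency-table idea, assuming Cramér's V as the measure of association (this is one common choice, not something the thread settles on). It rescales Pearson's chi-square statistic to the range 0 to 1, so it works for variables with more than 2 categories. The function name `cramers_v` and the example data are made up for illustration.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two categorical sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    row = Counter(x)             # marginal counts for x
    col = Counter(y)             # marginal counts for y

    # Pearson's chi-square statistic over the full contingency table
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by n * (min(rows, cols) - 1); undefined if either
    # variable has a single level, so return 0.0 in that case
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0
    return sqrt(chi2 / (n * k))


# Hypothetical data: two three-level predictors
region = ["north", "south", "west", "north", "south", "west", "north", "south"]
plan = ["basic", "plus", "pro", "basic", "plus", "pro", "plus", "basic"]
print(round(cramers_v(region, plan), 3))
```

Values near 1 suggest the two predictors carry largely the same information; values near 0 suggest little association. With small cell counts the statistic can be unstable, so treat it as a screening tool.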
 
#3
Thanks trinker! What about in generalized linear models? I'm trying to evaluate the validity of a logistic model that includes categorical as well as continuous predictor variables.
 

noetsi

Fortran must die
#5
The problem with using correlations for categorical variables is that most software uses the Pearson product-moment correlation, which is invalid for categorical data. You should look at polychoric correlations instead. Unfortunately, that takes special software like Mplus (I don't know if R does this).

It is widely held that converting categorical variables into a series of dummy variables avoids this problem. Since taking SEM, I am less sure this is the case. Regardless, you would have to create dummies rather than use the categorical variable itself.
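
A minimal sketch of the dummy-variable route, assuming plain reference-level (treatment) coding and ordinary Pearson correlations between the resulting 0/1 columns. The helper names `dummy_code` and `pearson` and the example data are hypothetical. Note that dummies built from the same variable are negatively correlated by construction, so the pairs worth inspecting are the ones that cross two different variables.

```python
from math import sqrt


def dummy_code(values, reference=None):
    """Return {level: 0/1 column} for every level except the reference."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first level in sort order
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


def pearson(u, v):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


# Hypothetical data: two three-level predictors
region = ["north", "south", "west", "north", "south", "west", "north", "south"]
plan = ["basic", "plus", "pro", "basic", "plus", "pro", "plus", "basic"]

region_d = dummy_code(region)  # columns for "south" and "west"
plan_d = dummy_code(plan)      # columns for "plus" and "pro"

# Correlate every region dummy with every plan dummy
for r_level, r_col in region_d.items():
    for p_level, p_col in plan_d.items():
        print(r_level, p_level, round(pearson(r_col, p_col), 3))
```

Pairwise correlations only catch collinearity between two columns at a time; a set of dummies can still be nearly a linear combination of several others, so this is a first look rather than a full diagnostic.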
 

trinker

ggplot2orBust
#6
noetsi said:
The problem with using correlations for categorical variables is that most software uses the Pearson product-moment correlation, which is invalid for categorical data. You should look at polychoric correlations instead. Unfortunately, that takes special software like Mplus (I don't know if R does this).
A quick lookup in my trusty R in Action (by Robert Kabacoff, author of Quick-R) gives us an R answer for polychoric correlations.
Robert I. Kabacoff said:
OTHER TYPES OF CORRELATIONS
The hetcor() function in the polycor package can compute a heterogeneous correlation matrix containing Pearson product-moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, polychoric correlations between ordinal variables, and tetrachoric correlations between two dichotomous variables. Polyserial, polychoric, and tetrachoric correlations assume that the ordinal or dichotomous variables are derived from underlying normal distributions.