# Collinearity Among Categorical Variables in Regression

#### jastingo

##### New Member
Does anyone have any knowledge and/or experience in how to measure collinearity between categorical variables that have more than 2 categories in the context of regression? How would one go about measuring collinearity? Would it be something akin to looking at levels of association via contingency table analysis (Fisher's Test, Pearson's Chi-Square test, etc.)? Or perhaps you could examine correlations of dummy variable codings?

Any information would be very helpful!
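
For reference, the contingency-table idea in the question can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, and the names `region` and `plan` are hypothetical. It cross-tabulates two three-level factors, runs Pearson's chi-square test, and converts the statistic to Cramér's V, which runs from 0 (no association) to 1 (perfect association).

```r
# Simulated, hypothetical data: two categorical predictors with 3 levels each.
set.seed(1)
n <- 300
region <- factor(sample(c("north", "south", "west"), n, replace = TRUE))

# Make 'plan' depend on 'region' about 70% of the time so the two are associated.
matched_plan <- c(north = "basic", south = "plus", west = "pro")[as.character(region)]
random_plan  <- sample(c("basic", "plus", "pro"), n, replace = TRUE)
plan <- factor(ifelse(runif(n) < 0.7, matched_plan, random_plan))

# Contingency table and Pearson's chi-square test of independence.
tab <- table(region, plan)
chi <- chisq.test(tab)

# Cramer's V: chi-square rescaled by sample size and table dimensions.
cramers_v <- sqrt(unname(chi$statistic) / (sum(tab) * (min(dim(tab)) - 1)))

print(tab)
print(chi$p.value)
print(cramers_v)
```

A measure like this only looks at one pair of predictors at a time, so it can miss collinearity that involves three or more variables jointly.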

#### trinker

##### ggplot2orBust
Variance inflation factor is the method I use.

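I'm a researcher, not a statistician (caveat), but I use:

Code:
library(car)        # vif() comes from the car package
vif(fit)            # variance inflation factors
sqrt(vif(fit)) > 2  # problem?
in R, as recommended by Rob Kabacoff. Here fit is a fitted model object, e.g. from lm().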
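
When a predictor is a factor with more than two levels, car's vif() reports a generalized VIF (GVIF) per term, along with GVIF^(1/(2*Df)), which reduces to sqrt(VIF) for a one-degree-of-freedom term. The following is a minimal base-R sketch of that calculation, not car's own code: the helper name `gvif_table` is hypothetical, and the built-in `iris` data are used purely for illustration.

```r
# Hypothetical helper: generalized VIF per model term, computed from the
# correlation matrix of the coefficient estimates (intercept excluded).
gvif_table <- function(fit) {
  term_id <- attr(model.matrix(fit), "assign")  # which term each column belongs to
  keep <- term_id != 0                          # drop the intercept column
  R <- cov2cor(vcov(fit)[keep, keep, drop = FALSE])
  term_id <- term_id[keep]
  term_labels <- attr(terms(fit), "term.labels")

  rows <- lapply(seq_along(term_labels), function(k) {
    idx <- which(term_id == k)
    g <- det(R[idx, idx, drop = FALSE]) *
      det(R[-idx, -idx, drop = FALSE]) / det(R)
    # 'scaled' is GVIF^(1/(2*Df)), comparable to sqrt(VIF)
    c(GVIF = g, Df = length(idx), scaled = g^(1 / (2 * length(idx))))
  })

  out <- do.call(rbind, rows)
  rownames(out) <- term_labels
  out
}

# Species is a 3-level factor, so it contributes two dummy columns (Df = 2).
fit <- lm(Sepal.Length ~ Species + Petal.Length + Petal.Width, data = iris)
res <- gvif_table(fit)
print(res)
print(res[, "scaled"] > 2)  # same rule of thumb as sqrt(vif(fit)) > 2
```

The point of working per term rather than per dummy column is that the result does not depend on which level of the factor is chosen as the reference category.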

#### jastingo

##### New Member
Thanks trinker! What about in generalized linear models? I'm trying to evaluate the validity of a logistic model that includes categorical as well as continuous predictor variables.

#### noetsi

##### Fortran must die
The problem with using correlations for categorical variables is that most software uses the Pearson product-moment correlation, which is not valid for categorical data. You should look at polychoric correlations instead. Unfortunately, that takes special software like Mplus (I don't know if R does this).

It is widely held that converting categorical variables into a series of dummy variables avoids this problem. Since taking SEM, I am less sure this is the case. Regardless, you would have to create the dummies rather than use the categorical variable itself.
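
As a rough illustration of the dummy-variable route, base R's model.matrix() expands factors into 0/1 indicator columns, which can then be correlated. The data below are simulated and the names `region` and `plan` are hypothetical. Whether these correlations are a sound collinearity check is exactly the doubt raised above, so treat this as a sketch of the mechanics only.

```r
# Simulated, hypothetical data: two 3-level factors.
set.seed(2)
n <- 200
dat <- data.frame(
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE)),
  plan   = factor(sample(c("basic", "plus", "pro"), n, replace = TRUE))
)

# Treatment-coded dummies; [, -1] drops the intercept column.
# The first level of each factor is the reference and gets no column.
dummies <- model.matrix(~ region + plan, data = dat)[, -1]
print(colnames(dummies))

# Pearson correlation of 0/1 columns (the phi coefficient for each pair).
dummy_cor <- cor(dummies)
print(round(dummy_cor, 2))
```

Note that dummies from the same factor are negatively correlated by construction, since an observation can belong to only one level, so only the cross-factor entries say anything about association between the two variables.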

#### trinker

##### ggplot2orBust
noetsi said:
The problem with using correlations for categorical variables is that most software uses the Pearson product-moment correlation, which is not valid for categorical data. You should look at polychoric correlations instead. Unfortunately, that takes special software like Mplus (I don't know if R does this).
A quick lookup in my trusty R in Action (by Robert Kabacoff, author of Quick-R) gives us an R answer for polychoric correlations.
Robert I. Kabacoff said:
OTHER TYPES OF CORRELATIONS
The hetcor() function in the polycor package can compute a heterogeneous correlation matrix containing Pearson product-moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, polychoric correlations between ordinal variables, and tetrachoric correlations between two dichotomous variables. Polyserial, polychoric, and tetrachoric correlations assume that the ordinal or dichotomous variables are derived from underlying normal distributions.
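
A minimal usage sketch of hetcor() follows, assuming the polycor package is installed (the code skips itself otherwise). The data are simulated and the column names are hypothetical: two ordered factors are cut from correlated normal variables, which matches the underlying-normality assumption in the quote.

```r
if (requireNamespace("polycor", quietly = TRUE)) {
  set.seed(3)
  n <- 500
  z1 <- rnorm(n)
  z2 <- 0.6 * z1 + sqrt(1 - 0.6^2) * rnorm(n)  # latent correlation of 0.6

  lev <- c("low", "mid", "high")
  dat <- data.frame(
    score = z1 + rnorm(n),  # numeric variable
    rating_a = cut(z1, c(-Inf, -0.5, 0.5, Inf), labels = lev, ordered_result = TRUE),
    rating_b = cut(z2, c(-Inf, -0.5, 0.5, Inf), labels = lev, ordered_result = TRUE)
  )

  # hetcor() picks Pearson, polyserial, or polychoric per pair of columns.
  hc <- polycor::hetcor(dat)
  print(round(hc$correlations, 2))
  print(hc$type)  # which kind of correlation was used for each pair
} else {
  message("polycor is not installed; try install.packages('polycor')")
}
```

These correlations are defined for ordered categories, so a nominal factor with no natural ordering would not be a good fit for this approach.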

#### noetsi

##### Fortran must die
One of the thousands of things I have to learn in R (that is, how to code it).