# Thread: Collinearity Among Categorical Variables in Regression

1. ## Collinearity Among Categorical Variables in Regression

Does anyone have any knowledge and/or experience in how to measure collinearity between categorical variables that have more than 2 categories in the context of regression? How would one go about measuring collinearity? Would it be something akin to looking at levels of association via contingency table analysis (Fisher's Test, Pearson's Chi-Square test, etc.)? Or perhaps you could examine correlations of dummy variable codings?

Any information would be very helpful!

2. ## Re: Collinearity Among Categorical Variables in Regression

Variance inflation factor is the method I use.

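I'm a researcher, not a statistician (caveat), but I use the following in R, as recommended by Rob Kabacoff. Here `vif()` comes from the car package and `fit` is an already fitted model:

Code:
library(car)
vif(fit)            # variance inflation factors
sqrt(vif(fit)) > 2  # problem?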
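
For predictors with more than 2 categories, a single VIF per coefficient is awkward because the factor takes up several dummy columns. A generalized VIF (GVIF) per model term, as described by Fox and Monette, is one way around that. Below is a minimal base-R sketch of that calculation, not an official implementation: `gvif()` is a hypothetical helper written for this post, and the data (`region`, `zone`, `x`, `y`) are simulated purely for illustration.

```r
# Hypothetical helper (not from any package): generalized VIF per model term,
# using the determinant formula det(R11) * det(R22) / det(R).
gvif <- function(fit) {
  assign_id <- attr(model.matrix(fit), "assign")  # term index of each coefficient
  keep <- assign_id != 0                          # drop the intercept
  R <- cov2cor(vcov(fit)[keep, keep, drop = FALSE])  # correlations of the estimates
  term_id <- assign_id[keep]
  labels <- attr(terms(fit), "term.labels")
  out <- t(sapply(seq_along(labels), function(j) {
    idx <- which(term_id == j)
    g <- det(R[idx, idx, drop = FALSE]) *
      det(R[-idx, -idx, drop = FALSE]) / det(R)
    # adj = GVIF^(1/(2*Df)), comparable to sqrt(VIF) for a 1-df term
    c(GVIF = g, Df = length(idx), adj = g^(1 / (2 * length(idx))))
  }))
  rownames(out) <- labels
  out
}

# Simulated data (assumption): 'zone' mostly follows 'region', so the two
# factors are strongly associated; 'x' is independent of both.
set.seed(1)
n <- 200
region <- factor(sample(c("north", "south", "east", "west"), n, replace = TRUE))
base_zone <- c(north = "A", south = "B", east = "C", west = "A")[as.character(region)]
noisy <- runif(n) < 0.2
zone <- factor(ifelse(noisy, sample(c("A", "B", "C"), n, replace = TRUE), base_zone))
x <- rnorm(n)
y <- 1 + 0.5 * x + as.integer(region) + rnorm(n)

fit <- lm(y ~ region + zone + x)
res <- gvif(fit)
print(round(res, 2))
```

The `adj` column is the one to compare against the `sqrt(vif(fit)) > 2` rule of thumb, since for a term with one degree of freedom it reduces to the square root of the ordinary VIF. Because the helper only relies on `vcov()` and `model.matrix()`, it should also accept other fits that provide both.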

3. ## Re: Collinearity Among Categorical Variables in Regression

Thanks trinker! What about in generalized linear models? I'm trying to evaluate the validity of a logistic model that includes categorical as well as continuous predictor variables.

5. ## Re: Collinearity Among Categorical Variables in Regression

The problem with using correlations for categorical variables is that most software uses the Pearson product-moment correlation, which is invalid for categorical data. You should look at polychoric correlations instead. Unfortunately, that takes special software like Mplus (I don't know if R does this).

It is widely held that converting categorical variables into a series of dummy variables avoids this problem. Since taking SEM, I am less sure this is the case. Regardless, you would have to create dummies rather than use the categorical variable itself.
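
To make the dummy-variable idea concrete, here is a rough base-R sketch on simulated data. It builds the dummy codings with `model.matrix()` and looks at their pairwise correlations, then summarizes the association between the two factors with a chi-square test and Cramér's V, along the lines of the contingency-table approach raised in the first post. The variables `a` and `b` are made up for illustration, and this is only one way to look at the problem, not a definitive collinearity diagnostic.

```r
# Simulated data (assumption): 'b' copies 'a' about 70% of the time.
set.seed(42)
n <- 300
lev <- c("low", "mid", "high")
a <- factor(sample(lev, n, replace = TRUE))
same <- runif(n) < 0.7
b <- factor(ifelse(same, as.character(a), sample(lev, n, replace = TRUE)))

# Dummy (treatment) coding; the intercept column is dropped.
D <- model.matrix(~ a + b)[, -1]
print(round(cor(D), 2))  # pairwise correlations among the dummy columns

# Contingency-table view of the same association.
tab <- table(a, b)
chi <- chisq.test(tab)
# Cramer's V = sqrt(chi^2 / (n * (min(rows, cols) - 1))), ranges from 0 to 1
cramers_v <- sqrt(unname(chi$statistic) / (n * (min(dim(tab)) - 1)))
print(cramers_v)
```

Pairwise correlations among dummy columns can miss a dependence that only shows up when several columns are taken together, which is one reason a model-based check such as a VIF is often used alongside them.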

6. ## Re: Collinearity Among Categorical Variables in Regression

Originally Posted by noetsi
The problem with using correlations for categorical variables is that most software uses the Pearson product-moment correlation, which is invalid for categorical data. You should look at polychoric correlations instead. Unfortunately, that takes special software like Mplus (I don't know if R does this).
A quick lookup in my trusty R in Action (by Robert Kabacoff, author of Quick-R) gives us an R answer for polychoric correlations.
Originally Posted by Robert I. Kabacoff
OTHER TYPES OF CORRELATIONS
The hetcor() function in the polycor package can compute a heterogeneous correlation matrix containing Pearson product-moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, polychoric correlations between ordinal variables, and tetrachoric correlations between two dichotomous variables. Polyserial, polychoric, and tetrachoric correlations assume that the ordinal or dichotomous variables are derived from underlying normal distributions.
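
A small sketch of what calling `hetcor()` might look like, assuming the polycor package is installed. The data frame `dat` is simulated for illustration: two ordered factors are cut from latent normal variables with a true correlation of 0.6, alongside one numeric column. Note that the polychoric and polyserial pieces only make sense when the categories are ordered.

```r
if (!requireNamespace("polycor", quietly = TRUE)) {
  message("polycor is not installed; try install.packages('polycor')")
} else {
  # Simulated data (assumption): latent normals with correlation 0.6.
  set.seed(7)
  n <- 500
  z1 <- rnorm(n)
  z2 <- 0.6 * z1 + sqrt(1 - 0.6^2) * rnorm(n)
  cuts <- c(-Inf, -0.5, 0.5, Inf)
  labs <- c("low", "mid", "high")
  dat <- data.frame(
    num  = z1 + rnorm(n),
    ord1 = cut(z1, cuts, labels = labs, ordered_result = TRUE),
    ord2 = cut(z2, cuts, labels = labs, ordered_result = TRUE)
  )

  hc <- polycor::hetcor(dat)
  print(hc$type)                     # which kind of correlation was used per pair
  print(round(hc$correlations, 2))   # the heterogeneous correlation matrix
}
```

With ordered factors like these, the `ord1`/`ord2` entry is a polychoric correlation and should land near the latent correlation used in the simulation, rather than the attenuated value a Pearson correlation on the integer codes would give.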

7. ## Re: Collinearity Among Categorical Variables in Regression

One of thousands of things I still have to learn in R (that is, how to code it).
