How to test for multicollinearity among dummy explanatory variables

#1
Hello,

Almost all the explanatory variables in my data are dummy variables, and I would like to test for multicollinearity among them. Does it still make sense to use the traditional Pearson correlation coefficient or the variance inflation factor (VIF)? If not, can you suggest other methods I could use?

Thank you,
Kelly
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
I know people do use VIF in this situation. I'm not sure whether there are assumptions involved, such as equal distances between groups.
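A minimal sketch of the usual VIF computation, assuming Python with NumPy and a predictor matrix made of 0/1 dummy columns (the function name `vif` and the example data are hypothetical). Each column is regressed by ordinary least squares on the remaining columns plus an intercept, and VIF_j = 1/(1 - R_j^2); nothing in the calculation treats a dummy differently from any other numeric column.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Return the VIF of each column of X (n rows, p predictors, no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 of an OLS regression of
    column j on all other columns plus an intercept. Assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # auxiliary design with intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        # Perfect collinearity (R^2 == 1) gives an infinite VIF
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

# Usage with two hypothetical, perfectly balanced dummies (uncorrelated by design)
d1 = np.array([0, 0, 1, 1] * 25)
d2 = np.array([0, 1, 0, 1] * 25)
print(vif(np.column_stack([d1, d2])))  # both approximately 1
```

Because the VIF depends only on the predictor matrix, the same function applies whatever model (linear, logistic, ...) the predictors are eventually used in.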
 
#4
Thanks HLSmith! Do you know why VIF would still be appropriate for dummy variables while Pearson is not? Is there a publication you could point me to that discusses this? I have done an extensive search online and found nothing on this topic, which is quite surprising, since I am sure this is not an uncommon problem. You're the first person who has specifically said this, so thanks for your reply and I hope to hear from you soon!!!
 

hlsmith

#5
I did not say Pearson would be inappropriate. I'm just not sure how appropriate it is, since VIF is designed specifically for this problem.

You can search this forum for comparable posts; I recall others addressing this topic. If you find relevant material, I would recommend linking it in this thread to close the loop. Those earlier posts discussed computing VIF from a linear regression for predictors that were intended for a logistic regression model (that is, checking collinearity among the IVs of a logistic model by running VIF in a linear model). This works because collinearity is a property of the predictors, not of the outcome model.

I remember finding references to using VIF and Tolerance statistics in the past but have no direct references for you.
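For reference, the standard definitions that link the two statistics mentioned above. For predictor x_j, R_j^2 is the R² from regressing x_j on all the other predictors; the definitions themselves do not require the predictors to be continuous:

```latex
\mathrm{Tolerance}_j = 1 - R_j^2, \qquad
\mathrm{VIF}_j = \frac{1}{\mathrm{Tolerance}_j} = \frac{1}{1 - R_j^2}
```

A tolerance near 0 (equivalently, a large VIF) means x_j is almost a linear combination of the other predictors.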
 
#6
Thanks again HLSmith. Sorry, I did not mean to imply that you said Pearson is inappropriate. I found another thread discussing VIF that was useful, but I am not sure how to link it to this post, so I will just include the title here so that anyone can search for it on this forum: "Logistic regression detection of multicollinearity- is VIF applicable?"

That post clearly describes what the VIF is. So is my understanding correct that, since the VIF is based on R² (the proportion of variance in one predictor explained by the others), it does not matter whether the independent variables are dummy (dichotomous) variables? But then how is that different from the Pearson correlation coefficient, which is also based on variances? Why is the VIF more appropriate?
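One way to see how VIF and Pearson relate, sketched in Python/NumPy under the assumption of exactly two predictors: regressing x1 on x2 (with an intercept) gives an R² equal to the squared Pearson correlation, so VIF = 1/(1 - r²) and the two carry the same information. For two 0/1 variables, Pearson's r is the phi coefficient. The simulated dummies below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.integers(0, 2, n)           # hypothetical 0/1 dummy
flip = rng.random(n) < 0.2           # flip about 20% of the entries
x2 = np.where(flip, 1 - x1, x1)      # second dummy, correlated with x1

# Pearson r between the two dummies (the phi coefficient)
r = np.corrcoef(x1, x2)[0, 1]

# Auxiliary regression behind the VIF: x1 on intercept + x2
A = np.column_stack([np.ones(n), x2])
beta, *_ = np.linalg.lstsq(A, x1, rcond=None)
resid = x1 - A @ beta
r2 = 1 - (resid @ resid) / ((x1 - x1.mean()) ** 2).sum()

vif = 1 / (1 - r2)
print(f"r^2 = {r**2:.4f}, auxiliary R^2 = {r2:.4f}, VIF = {vif:.3f}")
```

With three or more predictors the two diverge, because R_j² then reflects all the other predictors jointly rather than one pair at a time.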

By the way, I looked at all the posts in the forum about VIFs and could not find any external reference explaining why VIF is appropriate for dummy variables. Thank you very much!!!
 

noetsi

Fortran must die
#7
I have never read anything suggesting that dummy variables should be treated any differently from other variables when using VIF or tolerance.


Pearson has several problems as a check for multicollinearity (MC). First, it only ever measures a bivariate relationship, while MC may involve several variables at once, so pairwise Pearson correlations can miss it. Second, Pearson assumes interval-level variables, an assumption a dummy variable violates. Even setting aside the multivariate issue, you would probably need polychoric correlations (tetrachoric, for two dichotomous variables) rather than Pearson to analyze dummy variables.
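To illustrate the first point, here is a Python/NumPy sketch with a hypothetical, perfectly balanced three-category variable coded as one dummy per category. Every pairwise Pearson correlation is only -0.5, yet the dummies sum to 1 in every row, so together with an intercept they are perfectly collinear (the familiar "dummy variable trap"). The auxiliary regression behind the VIF detects this as R² = 1.

```python
import numpy as np

# Hypothetical balanced 3-category variable, 100 observations per category
group = np.repeat([0, 1, 2], 100)
D = (group[:, None] == np.arange(3)).astype(float)  # one dummy per category

# Pairwise Pearson correlations between the dummies
C = np.corrcoef(D, rowvar=False)
print(np.round(C, 3))  # off-diagonal entries are -0.5: only "moderate"

# Auxiliary regression for the VIF of the first dummy: d0 on intercept + d1 + d2
n = D.shape[0]
A = np.column_stack([np.ones(n), D[:, 1:]])
y = D[:, 0]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
print(f"R^2 = {r2:.6f}, tolerance = {1 - r2:.2e}")  # R^2 is 1 up to rounding
```

The usual remedy is to drop one category's dummy (the reference category) when the model includes an intercept.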