
Thread: Collinearity Among Categorical Variables in Regression

  1. #1

    Does anyone have any knowledge and/or experience in how to measure collinearity between categorical variables that have more than 2 categories in the context of regression? How would one go about measuring collinearity? Would it be something akin to looking at levels of association via contingency table analysis (Fisher's Test, Pearson's Chi-Square test, etc.)? Or perhaps you could examine correlations of dummy variable codings?

    Any information would be very helpful!
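
    One way to make the contingency-table idea concrete is Cramér's V, which rescales Pearson's chi-square statistic to a 0-1 measure of association between two categorical predictors. Below is a minimal sketch in base R on simulated data. The helper `cramers_v` and the variable names (`region`, `plan`, `noise`) are hypothetical, and a pairwise measure like this can miss collinearity that only shows up across three or more predictors.

```r
# Hypothetical helper: Cramér's V from Pearson's chi-square statistic
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n    <- sum(tab)
  k    <- min(nrow(tab), ncol(tab)) - 1
  as.numeric(sqrt(chi2 / (n * k)))
}

set.seed(1)
n      <- 500
region <- factor(sample(c("north", "south", "east", "west"), n, replace = TRUE))

# plan follows region about 70% of the time, so the two are associated by construction
lookup <- c(north = "basic", south = "basic", east = "plus", west = "premium")
plan   <- factor(ifelse(runif(n) < 0.7,
                        lookup[as.character(region)],
                        sample(c("basic", "plus", "premium"), n, replace = TRUE)))

# noise is generated independently of region
noise  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))

cramers_v(region, plan)   # strong association expected
cramers_v(region, noise)  # expected to be close to 0
```

    Values near 1 suggest that one factor nearly determines the other, which is the situation likely to cause trouble in a regression.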

  2. #2
    trinker

    The variance inflation factor (VIF) is the method I use.

    I'm a researcher, not a statistician (caveat), but I use the following in R, as recommended by Rob Kabacoff:

    Code:
    library(car)         # provides vif()
    vif(fit)             # variance inflation factors
    sqrt(vif(fit)) > 2   # problem?
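
    For factors with more than two levels, `vif()` from the car package reports a generalized VIF (GVIF) per term rather than one VIF per dummy column. Below is a minimal sketch, assuming the car package is installed and using the built-in `mtcars` data purely for illustration. Comparing `GVIF^(1/(2*Df))` against 2 is one way to carry the `sqrt(vif) > 2` rule of thumb over to multi-level factors, not a fixed standard.

```r
library(car)  # provides vif(); assumed to be installed

# Treat cyl and gear as categorical predictors with three levels each
dat      <- mtcars
dat$cyl  <- factor(dat$cyl)
dat$gear <- factor(dat$gear)

fit <- lm(mpg ~ cyl + gear + wt, data = dat)

# With multi-level factors, vif() returns a matrix with one row per term
# and three columns: GVIF, Df, and GVIF^(1/(2*Df))
v <- vif(fit)
print(v)

# The third column is on the same scale as sqrt(VIF) for a 1-df term
v[, 3] > 2
```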
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  3. #3

    Thanks trinker! What about in generalized linear models? I'm trying to evaluate the validity of a logistic model that includes categorical as well as continuous predictor variables.

  4. #4
    trinker

    I'm going to point you to a previous thread we had on this topic: http://www.talkstats.com/showthread....tic-Regression
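
    For what it's worth, the same `vif()` call also accepts a fitted `glm` object, so it can be run on a logistic model with mixed categorical and continuous predictors. Below is a minimal sketch on simulated data, assuming the car package is installed. The variable names (`group`, `site`, `age`, `y`) are hypothetical and the data are made up to force some collinearity.

```r
library(car)  # provides vif(); assumed to be installed

set.seed(42)
n     <- 400
group <- factor(sample(c("a", "b", "c"), n, replace = TRUE))

# site is built to overlap heavily with group, so the two factors are collinear
site  <- factor(ifelse(runif(n) < 0.8,
                       paste0("s_", group),
                       sample(c("s_a", "s_b", "s_c"), n, replace = TRUE)))
age   <- rnorm(n, mean = 50, sd = 10)

# Simulated binary outcome from a logistic model
eta <- -0.5 + 0.8 * (group == "b") + 0.03 * (age - 50)
y   <- rbinom(n, size = 1, prob = plogis(eta))

fit <- glm(y ~ group + site + age, family = binomial)

v <- vif(fit)  # GVIF table, one row per term
print(v)
```

    In this setup the rows for `group` and `site` should show clearly larger values than the row for `age`, since `age` was generated independently of the two factors.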

  5. #5
    noetsi

    The problem with using correlations for categorical variables is that most software computes the Pearson product-moment correlation, which is invalid for categorical data. You should look at polychoric correlations instead. Unfortunately, that takes special software like Mplus (I don't know if R does this).

    It is widely held that converting categorical variables into a series of dummy variables avoids this problem. Since taking a course on structural equation modeling (SEM), I am less sure this is the case. Either way, you would have to create the dummies rather than use the categorical variable itself.
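
    To make the dummy-variable idea concrete: `model.matrix()` in base R expands a factor into the indicator columns a regression actually uses, and those columns can then be correlated. Below is a minimal sketch on simulated data with hypothetical variable names (`colour`, `size`). Note that the Pearson correlation between two 0/1 indicators is the phi coefficient, and that dummies from the same factor are negatively correlated with each other by construction.

```r
set.seed(7)
n      <- 300
colour <- factor(sample(c("red", "green", "blue"), n, replace = TRUE))

# size tracks colour about 75% of the time, so dummies from the two factors are correlated
lookup <- c(red = "small", green = "medium", blue = "large")
size   <- factor(ifelse(runif(n) < 0.75,
                        lookup[as.character(colour)],
                        sample(c("small", "medium", "large"), n, replace = TRUE)))

# Treatment-coded dummies (the first level of each factor is the reference);
# drop the intercept column before correlating
X <- model.matrix(~ colour + size)[, -1]
colnames(X)

# Pairwise Pearson correlations among the 0/1 dummy columns
round(cor(X), 2)
```

    Pairwise correlations between dummies are only a partial check, because one dummy can be nearly a linear combination of several others without any single pair standing out.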
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  6. #6
    trinker

    Quote Originally Posted by noetsi
    The problem with using correlations for categorical variables is that most software computes the Pearson product-moment correlation, which is invalid for categorical data. You should look at polychoric correlations instead. Unfortunately, that takes special software like Mplus (I don't know if R does this).
    A quick lookup in my trusty R in Action (by Robert Kabacoff, author of Quick-R) gives us an R answer for polychoric correlations.
    Quote Originally Posted by Robert I. Kabacoff
    OTHER TYPES OF CORRELATIONS
    The hetcor() function in the polycor package can compute a heterogeneous correlation matrix containing Pearson product-moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, polychoric correlations between ordinal variables, and tetrachoric correlations between two dichotomous variables. Polyserial, polychoric, and tetrachoric correlations assume that the ordinal or dichotomous variables are derived from underlying normal distributions.
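
    A minimal sketch of the `hetcor()` call described in the quote, assuming the polycor package is installed. The data are simulated and the variable names (`income`, `educ`, `rating`) are hypothetical. The ordinal variables are cut from correlated normal variables, so the underlying-normality assumption holds here by construction, which may not be true of real data.

```r
library(polycor)  # provides hetcor(); assumed to be installed

set.seed(123)
n  <- 1000
z1 <- rnorm(n)
z2 <- 0.6 * z1 + sqrt(1 - 0.6^2) * rnorm(n)  # latent correlation of 0.6 with z1

dat <- data.frame(
  income = z1 + rnorm(n, sd = 0.5),                     # numeric
  educ   = cut(z1, breaks = c(-Inf, -0.5, 0.5, Inf),
               labels = c("low", "mid", "high"),
               ordered_result = TRUE),                  # ordinal, 3 levels
  rating = cut(z2, breaks = c(-Inf, -1, 0, 1, Inf),
               labels = c("poor", "fair", "good", "great"),
               ordered_result = TRUE)                   # ordinal, 4 levels
)

hc <- hetcor(dat)
hc$correlations  # Pearson, polyserial, and polychoric estimates in one matrix
hc$type          # which kind of correlation was used for each pair
```

    The polychoric estimate for `educ` and `rating` should land near the latent correlation of 0.6 that was used to simulate them.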

  7. #7
    noetsi


    One of the thousands of things I still have to learn in R (that is, how to code it).
