
Thread: Dummy variables with categorical data with multiple levels.

  #1
    noetsi

    This always seemed easy to me, largely because I either had an obvious reference group or was dealing with variables that had only two levels. The variable I am working with now, a disability variable with six disability groups, has no obvious reference group at all. Moreover, for each dummy I want the mean difference between cases coded 1 and cases coded 0, not the mean difference between cases coded 1 and the reference category.

    For example, for mental health I want to know the mean difference between those who have a mental health issue and those who don't, not the difference between those who have a mental health issue and, say, those with substance abuse (if that were the reference category).
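    A minimal Python sketch of the comparison described above. It assumes invented data, hypothetical group labels (`mental_health`, `substance_abuse`, `physical`), and that each case belongs to exactly one group. The one-vs-rest difference is simply the mean for the group minus the mean for everyone else. It matches the slope from a regression containing only that one 0/1 indicator plus an intercept, with no other disability dummies.

```python
from fractions import Fraction
from statistics import mean

# Hypothetical toy data: (disability group, outcome). Labels and values are invented.
data = [
    ("mental_health", 10), ("mental_health", 12), ("mental_health", 14),
    ("substance_abuse", 6), ("substance_abuse", 8),
    ("physical", 9), ("physical", 11), ("physical", 7), ("physical", 13),
]

def one_vs_rest_difference(rows, level):
    """Mean outcome for `level` minus mean outcome for everyone not in `level`."""
    inside = [Fraction(y) for g, y in rows if g == level]
    outside = [Fraction(y) for g, y in rows if g != level]
    return mean(inside) - mean(outside)

def slope_on_single_indicator(rows, level):
    """OLS slope from regressing y on ONE 0/1 indicator for `level` (plus intercept)."""
    x = [Fraction(1 if g == level else 0) for g, _ in rows]
    y = [Fraction(v) for _, v in rows]
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

for level in ("mental_health", "substance_abuse", "physical"):
    print(level, one_vs_rest_difference(data, level), slope_on_single_indicator(data, level))
```

    Exact `Fraction` arithmetic is used so the two columns agree exactly rather than only up to rounding. This is an unadjusted difference; with other covariates in the model, the indicator's coefficient becomes an adjusted difference.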

    I know I can use effect coding, which compares each level to the mean of the group means (I believe), but I am not very familiar with that approach.
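    A small Python check of that belief. It assumes invented, deliberately unbalanced data with hypothetical levels `A`, `B`, `C`, and a model containing only this factor. With effect (deviation) coding, the last level (`C` here) is coded -1 in every column, and its effect is implied as minus the sum of the other effects. In this sketch, the intercept equals the unweighted mean of the group means. Each effect equals that group's mean minus this unweighted grand mean, not minus the case-weighted grand mean, so it is also not "level vs. everyone else." Nothing in the construction depends on there being three levels; k levels just give k - 1 columns.

```python
from fractions import Fraction
from statistics import mean

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination (exact Fractions)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x_rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Invented, deliberately unbalanced data; the last level, "C", is the one coded -1.
samples = {"A": [2, 4], "B": [6, 7, 8, 6, 7, 8], "C": [9, 11]}
levels = list(samples)
groups = [g for g, ys in samples.items() for _ in ys]
y = [v for ys in samples.values() for v in ys]

def effect_row(g):
    """Intercept plus k-1 effect-coded columns: 1 for the case's own level,
    -1 in every column for the last level, 0 otherwise."""
    return [1] + [1 if g == lv else (-1 if g == levels[-1] else 0) for lv in levels[:-1]]

b = ols([effect_row(g) for g in groups], y)
intercept = b[0]
effects = dict(zip(levels[:-1], b[1:]))
effects[levels[-1]] = -sum(b[1:])  # the last level's effect is implied by the others

group_means = {g: mean(Fraction(v) for v in ys) for g, ys in samples.items()}
unweighted_grand = mean(list(group_means.values()))  # mean of the group means
weighted_grand = mean(Fraction(v) for v in y)        # mean of all cases

for g in levels:
    print(g, effects[g], group_means[g] - unweighted_grand, group_means[g] - weighted_grand)
```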
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  #2
    Dason

    You know you can get the differences of interest regardless of how the categorical variables are coded?
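    One way to read this, sketched in Python under stated assumptions: invented data, hypothetical level names `A`, `B`, `C` with `A` as the reference, and a model containing only this factor. The sketch fits a reference-coded regression by exact least squares. It then converts the coefficients back into group means (intercept = reference mean, intercept + b_j = mean of level j) and computes the level-versus-everyone-else difference from those means. It covers point estimates only. Standard errors would need the coefficient covariance matrix, which is what a contrast or estimate statement supplies.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination (exact Fractions)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x_rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data: level "A" is the omitted reference level.
levels = ["A", "B", "C"]
groups = ["A", "A", "B", "B", "B", "C", "C"]
y = [4, 6, 9, 11, 10, 1, 3]

# Reference (dummy) coding: intercept plus one 0/1 column per non-reference level.
x_ref = [[1, int(g == "B"), int(g == "C")] for g in groups]
b0, b_b, b_c = ols(x_ref, y)

# Under reference coding: intercept = mean of A; intercept + b_j = mean of level j.
group_means = {"A": b0, "B": b0 + b_b, "C": b0 + b_c}
sizes = {lv: groups.count(lv) for lv in levels}

def one_vs_rest(level):
    """Mean of `level` minus the pooled (size-weighted) mean of every other level."""
    others = [lv for lv in levels if lv != level]
    n_others = sum(sizes[lv] for lv in others)
    rest_mean = sum(sizes[lv] * group_means[lv] for lv in others) / n_others
    return group_means[level] - rest_mean

for lv in levels:
    print(lv, group_means[lv], one_vs_rest(lv))
```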
    I don't have emotions and sometimes that makes me very sad.

  #3
    noetsi

    Quote, originally posted by Dason:
        You know you can get the differences of interest regardless of how the categorical variables are coded?
    No, how do you do that?

  #4
    noetsi

    Say I have a 7-level variable. Each level corresponds to a specific disability. I want to know how being in one disability group differs from not being in that group, not how being in that group differs from the omitted reference level (which is what dummy variables usually show in such cases).

    How do I do this? What type of coding does this?

  #5
    hlsmith

    I would imagine a contrast statement could be used to compare the mean of group 1 with the mean of groups 2-7.
    Stop cowardice, ban guns!

  #6
    hlsmith

    Look at the last example on this page:


    https://support.sas.com/documentatio...lm_sect012.htm


    You would have to write seven of them and correct alpha for the multiple comparisons.
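    This is a plain-Python sketch of what such a contrast computes. It is not SAS syntax; the SAS statements are on the linked page. It assumes a one-way model with a pooled within-group variance, invented data for seven hypothetical groups `g1`..`g7`, and a Bonferroni adjustment as one way to correct alpha. Here "group 1 vs the mean of groups 2-7" means the unweighted average of the other group means. With unequal group sizes, that is not the same as comparing group 1 with everyone else pooled.

```python
import math
from statistics import mean

# Hypothetical outcomes for seven groups "g1".."g7"; every value is invented.
samples = {
    "g1": [12, 14, 13],
    "g2": [9, 10, 11],
    "g3": [8, 9, 10],
    "g4": [10, 11, 12],
    "g5": [7, 8, 9],
    "g6": [11, 12, 13],
    "g7": [9, 9, 9],
}

def one_vs_others_contrast(samples, target):
    """Contrast: mean(target) minus the unweighted average of the other group means.

    Returns (estimate, standard error, t statistic, residual df) using the pooled
    within-group variance (MSE) of a one-way model.
    """
    levels = list(samples)
    k = len(levels)
    means = {g: mean(ys) for g, ys in samples.items()}
    # Contrast coefficients: +1 for the target, -1/(k-1) for each other group (they sum to 0).
    coef = {g: (1.0 if g == target else -1.0 / (k - 1)) for g in levels}
    sse = sum((y - means[g]) ** 2 for g, ys in samples.items() for y in ys)
    df = sum(len(ys) for ys in samples.values()) - k
    mse = sse / df
    estimate = sum(coef[g] * means[g] for g in levels)
    se = math.sqrt(mse * sum(coef[g] ** 2 / len(samples[g]) for g in levels))
    return estimate, se, estimate / se, df

# Bonferroni: split alpha = 0.05 evenly across the seven one-vs-others contrasts.
alpha_per_test = 0.05 / len(samples)
for g in samples:
    est, se, t, df = one_vs_others_contrast(samples, g)
    print(f"{g}: estimate={est:.3f} SE={se:.3f} t={t:.2f} df={df}")
print(f"per-test alpha = {alpha_per_test:.4f}")
```

    Each |t| would then be compared with the t critical value at `1 - alpha_per_test / 2` with `df` degrees of freedom. For example, that value could come from `scipy.stats.t.ppf`; it is left out here so the sketch needs only the standard library.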

  The following user says thank you to hlsmith for this useful post: noetsi (03-08-2016)

  #7
    noetsi

    I was hoping there was a way to do it without contrasts, but I can do that.

  #8
    hlsmith

    I can't think of another way, but that doesn't mean one doesn't exist. I always like to run one contrast ahead of time for something I already know the result to, so that I can make sure I have the syntax right.

  #9
    noetsi

    You can use, as Jake pointed out to me many years ago, effect coding, which essentially compares each level to the grand mean of the levels [I think the unweighted grand mean]. But I am not confident whether this works in practice with 7 levels.

  #10
    hlsmith

    Yeah, contrast statements are not always intuitive to me, and then you start talking about orthogonality and I get even sketchier.

  #11
    noetsi

    OK, here is another issue. Customers have one of three severity levels: 1, 2, or 3. Level 3 is very rare; we might have a hundred or so in 10,000. From this three-level categorical variable I coded two dummies, for severity 1 and 2, which are the levels I am interested in. My concern, and I have not been able to determine whether it is serious, is that the reference level [severity 3] has so few cases.

    Does it affect either the SE or the slope estimate if the omitted [reference] category has a very small number of cases? I want to see the impact of levels 1 and 2 directly, that is, not omit them from the model.

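    A back-of-the-envelope sketch for this question. It assumes a model containing only this three-level factor with a common residual SD (a hypothetical sigma = 1), and group sizes invented to echo "about 100 out of 11,000." Each dummy coefficient is then a difference of two group means, so its standard error is sigma * sqrt(1/n_j + 1/n_ref). The 1/100 term dominates, so both dummy SEs are far larger than the big groups alone would suggest. The level 1 vs level 2 comparison does not involve the reference group at all. With other covariates in the model the exact formula changes, so treat this as intuition only.

```python
import math

def dummy_se(sigma, n_level, n_ref):
    """SE of a reference-coded dummy coefficient in a one-way model (no other covariates).

    The coefficient equals mean(level) - mean(reference), so with a common residual
    variance sigma^2 its variance is sigma^2 * (1/n_level + 1/n_ref).
    """
    return sigma * math.sqrt(1 / n_level + 1 / n_ref)

sigma = 1.0                     # hypothetical residual SD; every SE scales linearly with it
n1, n2, n3 = 5_500, 5_400, 100  # hypothetical group sizes; level 3 is the small reference

se_b1 = dummy_se(sigma, n1, n3)           # level 1 vs reference level 3
se_b2 = dummy_se(sigma, n2, n3)           # level 2 vs reference level 3
se_b1_minus_b2 = dummy_se(sigma, n1, n2)  # level 1 vs level 2: reference size drops out
print(f"SE(b1) = {se_b1:.4f}, SE(b2) = {se_b2:.4f}, SE(b1 - b2) = {se_b1_minus_b2:.4f}")
```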
  #12
    noetsi

    I have a 3-level variable; the third level has only about 100 cases (out of about 11,000 for the entire variable). I made that third level the omitted reference level. I chose to code it this way because I was primarily interested in the two levels that have nearly all the cases, and this is the simplest way to look at them.

    I have never seen anything that says making a level with few cases the reference level violates anything, but somehow this seems questionable.

  #13
    noetsi

    I want to modify a previous question. I have a categorical variable with three levels [the third level has only about 100 cases out of about 11,000 and is the reference level]. When I analyze the slopes of the first two levels, is each slope the mean difference between being at the level of that dummy and all other levels [that is, levels 2 and 3 for level 1, or levels 1 and 3 for level 2], or is it the mean difference between the level of that dummy and the reference level only?
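    A Python sketch bearing on this question. It assumes a model containing only this factor, reference coding, and invented data in which severity 3 is rare and omitted. In the sketch, each dummy's slope equals that level's mean minus the reference level's mean only, not that level versus all other levels pooled. An exact least-squares fit makes the comparison explicit. With additional covariates, the slopes become adjusted differences, still relative to the reference level.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination (exact Fractions)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x_rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Invented data: severity 3 is rare and is the omitted reference level.
severity = [1, 1, 1, 1, 2, 2, 2, 3, 3]
y = [10, 12, 11, 13, 7, 8, 9, 4, 6]

# Reference coding: intercept, dummy for severity 1, dummy for severity 2.
x = [[1, int(s == 1), int(s == 2)] for s in severity]
b0, b1, b2 = ols(x, y)

def group_mean(level):
    """Mean outcome for one severity level."""
    vals = [Fraction(v) for s, v in zip(severity, y) if s == level]
    return sum(vals) / len(vals)

def rest_mean(level):
    """Mean outcome for all cases NOT at this severity level."""
    vals = [Fraction(v) for s, v in zip(severity, y) if s != level]
    return sum(vals) / len(vals)

for level, slope in ((1, b1), (2, b2)):
    print(f"severity {level}: slope={slope}, "
          f"vs reference only={group_mean(level) - group_mean(3)}, "
          f"vs all others={group_mean(level) - rest_mean(level)}")
```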

    Amazing: after all these years of working with dummies, I still don't know that.
