
Thread: Must categorical variables always be dummy coded in linear regression?

  1. #1

    My question is: must we always dummy code categorical variables in linear regression? What if I wanted to understand the overall effect of a categorical variable? Couldn't I just include it in my regression without dummy coding it? And if I had ordered categorical data, could I include it as one variable without dummy coding and get an overall sense of the estimated relationship?

    Alternatively, what if I wanted to control for the variable but not necessarily know the relationship among the various indicators? For instance, let's say I'm running a regression on house sales and I wanted to control for house color. I don't really care which color sells at a higher value but I would like to know overall if house color is significant. Couldn't I just include the categorical variable without dummy coding?
    Thanks

  2. #2
    trinker


    How could you include it without dummy coding it? Have you ever worked through a regression by hand, with matrix algebra? If not, it is a valuable exercise: you'll see that the computation needs numbers. Dummy coding is merely a way of tricking the regression into accepting categorical variables (well, maybe it's more sophisticated than tricking).
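
To make the matrix-algebra point concrete, here is a minimal sketch in Python with NumPy. It uses the house-color example from the question, but the data, the three color names, and the helper `rss` are all made up for illustration. It builds the dummy columns by hand and then computes one joint F statistic for all of them, which is one common way to ask whether house color matters overall without caring about the individual colors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: sale price (in $1000s) and house color for 90 houses.
colors = np.array(["white", "blue", "red"] * 30)
price = (200.0
         + 5.0 * (colors == "blue")
         - 3.0 * (colors == "red")
         + rng.normal(0, 10, size=90))

# Dummy coding: k levels become k-1 indicator columns; "white" is the reference.
levels = ["blue", "red"]
dummies = np.column_stack([(colors == lev).astype(float) for lev in levels])

# Design matrices: intercept only (reduced) and intercept + dummies (full).
X_reduced = np.ones((len(price), 1))
X_full = np.column_stack([X_reduced, dummies])

def rss(X, y):
    """Fit ordinary least squares; return residual sum of squares and coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta

rss_reduced, _ = rss(X_reduced, price)
rss_full, beta_full = rss(X_full, price)

# Joint F statistic for the hypothesis "all color coefficients are zero".
q = X_full.shape[1] - X_reduced.shape[1]   # number of dummy columns tested
df_resid = len(price) - X_full.shape[1]    # residual degrees of freedom
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)

print("coefficients (intercept, blue, red):", beta_full)
print("F statistic:", f_stat, "on", q, "and", df_resid, "df")
```

The intercept is the mean price of the reference color and each dummy coefficient is that color's difference from it. A p-value for the joint test would come from comparing `f_stat` with an F distribution on `q` and `df_resid` degrees of freedom; most statistics packages report this test when the variable is declared as a factor.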
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  3. #3
    Karabiner

    "What if I had ordered categorical data"

    OK, that means something like "very small - small - bigger - very big".
    To handle this more easily, people might use numbers, e.g.
    very small = 1, small = 12, bigger = 1000, very big = 100000000,
    or anything else (e.g. 1, 2, 3, 4), as long as the ranking is preserved.
    Of course, some people confuse numbers that merely indicate a
    ranking with numbers that can be treated as an interval scale
    (so in a regression they would use 1, 12, 1000 and 100000000
    as if they were real measurements), but don't do that yourself.
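
A small simulation can show why the choice of numbers matters. The sketch below (Python with NumPy) uses invented data and a hypothetical helper `r_squared`. It fits the same response with two codings that both respect the ranking, and then with dummy coding, which assumes nothing about the spacing between levels.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ordered category with four levels, 25 observations each:
# 0 = very small, 1 = small, 2 = bigger, 3 = very big.
rank = np.repeat([0, 1, 2, 3], 25)
level_means = np.array([10.0, 12.0, 20.0, 21.0])   # made-up group means
y = level_means[rank] + rng.normal(0, 1, size=100)

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

ones = np.ones(len(y))

# Two numeric codings that both preserve the ordering of the levels.
codes_a = np.array([1.0, 2.0, 3.0, 4.0])[rank]
codes_b = np.array([1.0, 12.0, 1000.0, 100000000.0])[rank]
r2_a = r_squared(np.column_stack([ones, codes_a]), y)
r2_b = r_squared(np.column_stack([ones, codes_b]), y)

# Dummy coding: three indicators, "very small" as the reference level.
dummies = [(rank == k).astype(float) for k in (1, 2, 3)]
r2_dummy = r_squared(np.column_stack([ones] + dummies), y)

print("R^2 with codes 1, 2, 3, 4:           ", r2_a)
print("R^2 with codes 1, 12, 1000, 100000000:", r2_b)
print("R^2 with dummy coding:               ", r2_dummy)
```

Both numeric codings carry the same ranking, yet they give different fits, because the regression treats the gaps between the codes as meaningful distances. The dummy-coded model can never fit worse than either of them, since any single numeric coding of the levels is a special case of it.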

    With kind regards

    K.
