More than two values for a dummy variable (regression)?
How do I handle a dummy variable with more than two values when doing regression? I know that if there are 2 values for a dummy variable, e.g. yes and no, then yes is 1 and no is 0. But how do I handle more than 2 values, e.g. which brand of laptop someone uses: Acer, Toshiba, Apple, Dell, HP, Others? Do I put Acer as 1, Toshiba as 2, Apple as 3, Dell as 4, HP as 5 and Others as 6?
Re: More than two values for a dummy variable (regression)?
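You create n-1 dummy variables, where n is the number of levels of the categorical variable. So for your example, with 6 brands, you'll have 5 dummy variables. Don't code the brands as 1 to 6 in a single variable: that would treat brand as an ordered, evenly spaced quantity, which it isn't. Depending on the interpretation you want for the coefficients, you can use different coding schemes. Here is a very good discussion on them: http://www.ats.ucla.edu/stat/sas/web...r5/sasreg5.htm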
Another way is to keep a single variable and replace each category with its proportion in the data, or with the probit (inverse normal CDF) transform of those proportions.
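To make the n-1 dummy coding concrete, here is a minimal sketch in plain Python. The helper name dummy_code and the choice of "Others" as the reference level are assumptions made for illustration; any brand could serve as the reference.

```python
# Hypothetical helper: reference (dummy) coding for one categorical value.
def dummy_code(value, levels, reference):
    """Return n-1 indicators for `value`, omitting the reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels if level != reference]

brands = ["Acer", "Toshiba", "Apple", "Dell", "HP", "Others"]

# Each brand gets 5 indicators; the reference brand is all zeros.
for brand in brands:
    print(brand, dummy_code(brand, brands, reference="Others"))
```

Every non-reference brand has exactly one indicator set to 1, and only the reference brand is coded as all zeros, so the six brands stay distinguishable with five variables.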
Re: More than two values for a dummy variable (regression)?
Ok. Now if you want to denote that a computer is an Apple what will your three variables look like? (0, 0, 0). If you want to denote that a computer is an HP what will your three variables look like? (0, 0, 0).
Re: More than two values for a dummy variable (regression)?
Haha. Don't worry. Dummy variables definitely take some getting used to. And note that using reference coding isn't the only way to create the dummy variables.
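As one example of an alternative to reference coding, effect (deviation) coding still uses n-1 variables but codes the omitted level as -1 in every column instead of 0. The sketch below is only illustrative; the helper name effect_code and the choice of "Others" as the omitted level are assumptions, not something from this thread.

```python
# Hypothetical helper: effect (deviation) coding for one categorical value.
def effect_code(value, levels, reference):
    """Return n-1 codes; the omitted level is -1 in every column."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    if value == reference:
        return [-1] * (len(levels) - 1)
    return [1 if value == level else 0 for level in levels if level != reference]

brands = ["Acer", "Toshiba", "Apple", "Dell", "HP", "Others"]

for brand in brands:
    print(brand, effect_code(brand, brands, reference="Others"))
```

With this coding and balanced data, each coefficient compares a brand with the average of the brand means rather than with a single reference brand.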
Re: More than two values for a dummy variable (regression)?
Just to add one more point. You've used the equation: x = A a + B b + C c
Your equation doesn't contain an intercept. When the intercept is missing, you need n dummy variables, not n-1. With an intercept, the intercept captures the reference (excluded) category, so only n-1 dummies are needed; when you omit the intercept, you must include all the categories as dummies.
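A small sketch of why the intercept and a full set of n dummies can't go together. The level names "A", "B", "C" and the sample observations are made up for illustration. In every row the dummy columns add up to the intercept column, so the columns are linearly dependent and the coefficients are not uniquely determined.

```python
levels = ["A", "B", "C"]  # hypothetical three-level variable

def full_dummies(value):
    """One indicator per level (n columns, no level omitted)."""
    return [1 if value == level else 0 for level in levels]

observations = ["A", "B", "C", "B", "A"]  # made-up sample

# Intercept column (all 1s) plus all n dummies.
rows_with_intercept = [[1] + full_dummies(v) for v in observations]

# True when the dummies sum to the intercept column in every row.
collinear = all(row[0] == sum(row[1:]) for row in rows_with_intercept)
print(collinear)

# Option 1: keep the intercept and drop one dummy (here "A" is the reference).
rows_reference = [[1] + full_dummies(v)[1:] for v in observations]

# Option 2: drop the intercept and keep all n dummies.
rows_no_intercept = [full_dummies(v) for v in observations]
```

Either option removes the dependency: the first gives coefficients relative to the reference level, the second gives one coefficient per level.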