coding - one categorical vs multiple dummy

#1
Hi,
Most things I read seem to suggest that when you have a categorical predictor in a regression analysis you should create k-1 dummy variables, where k is the number of categories.

Typically when I do a regression in R, I simply have one variable with each category/level numbered (e.g. 1, 2, 3, 4, etc.), and this seems to give me no problems.

Why is there ever any need or benefit to creating multiple dummy variables?

Thanks,
Paul
 

maartenbuis

TS Contributor
#2
If you enter your variable as a categorical variable (a factor in R), you are in effect creating those dummies without realizing it. Letting R do it internally has the advantage that R knows these dummies belong together, which can be helpful in post-estimation. The main reason you see the k-1 dummies in texts is that this is how it used to be done by hand, and it is a good thing to realize and understand that this is what happens under the hood when you enter a categorical variable. Moreover, there is some cool stuff you can do if you create the variables yourself.
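
To see this under the hood, here is a minimal sketch in R on simulated data. The variable names (group, y, d2, d3, d4) and the group means are made up for illustration. One caveat it shows: the dummies are only created internally if R treats the variable as a factor. A plain numeric column coded 1, 2, 3, 4 is fitted as a single slope instead.

```r
# Simulated data: 4 categories, 25 observations each (illustrative values only)
set.seed(1)
group <- rep(1:4, each = 25)
y <- c(2, 5, 3, 8)[group] + rnorm(100)

# Numeric coding: R fits one intercept and ONE slope for group
fit_numeric <- lm(y ~ group)

# Factor coding: R builds k - 1 = 3 dummies internally
fit_factor <- lm(y ~ factor(group))

# The same model with hand-made dummies (category 1 is the reference)
d2 <- as.numeric(group == 2)
d3 <- as.numeric(group == 3)
d4 <- as.numeric(group == 4)
fit_manual <- lm(y ~ d2 + d3 + d4)

length(coef(fit_numeric))  # 2: intercept + slope
length(coef(fit_factor))   # 4: intercept + 3 dummies

# The dummy columns R created behind the scenes
head(model.matrix(fit_factor))

# Factor coding and hand-made dummies give the same estimates
all.equal(unname(coef(fit_factor)), unname(coef(fit_manual)))
```

If your numbered variable is stored as numeric, wrapping it in factor() as above is what turns the single-slope model into the k-1 dummy model.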