1. ## Re: Dummy Variables

Going back to the basics:

Scenario: I have a categorical variable with numeric codes (the IV) that I use in a simple binary regression model. The program turns the IV into a dummy variable (I am using SAS, and it shows this in the output). Whether I run the model with or without the intercept, it still reports effects for only two categories of the IV, since it needs a comparison group. The conversion of the numeric categorical variable into 0s and 1s is the dummy coding that the program does for me, correct? How does this fit into this thread?

Can someone also remind me when you do and don't want the intercept, since it definitely changes the results? (I know it is situation-based, but please refresh my memory on how it fits into this scenario.) Thank you.

2. ## Re: Dummy Variables

How many levels does the categorical variable have? You will get a single dummy variable only if the original categorical variable has two categories. Otherwise you will get more than one dummy (with a reference group, one fewer than the number of levels). This thread answered the theoretical question we thought you were asking: how many levels a categorical variable can have. It did not address whether the output you got was accurate, which we have no way of knowing without seeing it and knowing what your original variable is.
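A minimal sketch of the level-to-dummy conversion described above, written in Python rather than SAS. The function name `dummy_code` and the choice of the lowest-sorted level as the reference group are assumptions made for illustration; statistical packages pick the reference level by their own defaults.

```python
def dummy_code(values, reference=None):
    """Return (kept_levels, rows) of 0/1 dummies, omitting the reference level.

    Assumption: if no reference is given, the lowest-sorted level is used.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    # One dummy column per non-reference level: k levels give k - 1 dummies.
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


# Hypothetical three-level variable coded 1, 2, 3.
levels, rows = dummy_code([1, 2, 3, 2, 1])
print(levels)  # the two non-reference levels
print(rows)    # reference-level rows are all zeros
```

With two levels the same function returns a single dummy column, which matches the point that only a two-category variable yields one dummy.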

The answer to whether you want an intercept depends on whether it makes substantive sense to have one and on which specific type of regression you are running. The convention for OLS regression is to include an intercept. If it makes no substantive sense for the DV to have a value when all your independent variables are zero, you may want to modify or eliminate it. But if you eliminate it, you will have no comparison group to compare your dummies to. Then you will have to, as dason noted, include a dummy variable for each level of your categorical variable.
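To make the two parameterizations concrete, here is a small sketch in plain Python for the OLS case only (not the binary model from the original question). The data, the group labels, and the helper names `solve` and `ols` are all made up for illustration. With an intercept and one dummy per non-reference level, the intercept is the reference group's mean and each coefficient is a difference from it. Without an intercept and with one dummy per level, each coefficient is simply that group's mean.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: group means are a = 2, b = 5, c = 10.
group = ["a", "a", "b", "b", "c", "c"]
y = [1.0, 3.0, 4.0, 6.0, 9.0, 11.0]

# Intercept plus dummies for b and c ("a" is the comparison group).
X_int = [[1, int(g == "b"), int(g == "c")] for g in group]
with_intercept = ols(X_int, y)   # approximately [2, 3, 8]

# No intercept, one dummy for every level.
X_all = [[int(g == "a"), int(g == "b"), int(g == "c")] for g in group]
no_intercept = ols(X_all, y)     # approximately [2, 5, 10]

print(with_intercept, no_intercept)
```

Both fits give the same predicted values; only the meaning of the coefficients changes, which is why the reported results look different once the intercept is dropped.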

3. ## Re: Dummy Variables

The scenario was made up, but your last two sentences came closer to what I was asking.

4. ## Re: Dummy Variables

Although I have not seen it used outside hierarchical linear models, one thing you can consider if you keep the intercept is centering your independent variables on the group or grand mean. This changes the meaning of the intercept and may make it a lot more sensible. But again, in my observation this is rarely done in practice. In most cases you simply include an intercept but ignore it; that is the norm in most modeling. Note that many discussions of the intercept for dummy variables tend to confuse. They say that you compare the level of a dummy variable to the intercept to see the difference between a given dummy and the reference level.

But that only works simply if you have one set of dummy variables (that is, a set tied to a single categorical variable). If instead you have a set of dummy variables for, say, gender and another for education level, then the intercept will reflect the value for a specific combination of gender and education, not one or the other.
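A rough illustration of both points, using made-up coefficients rather than a fitted model. Every number and name here (`INTERCEPT`, `B_FEMALE`, `B_COLLEGE`, the `sample` rows) is hypothetical. The first part shows that the intercept is the prediction for the cell where every dummy is zero, meaning the reference gender and the reference education level together. The second part shows how grand-mean centering the dummies rewrites the same model so that the intercept becomes the prediction at the sample averages instead.

```python
# Hypothetical coefficients, invented purely for illustration.
INTERCEPT = 20.0
B_FEMALE = 2.0    # female vs. male (reference)
B_COLLEGE = 5.0   # college vs. no college (reference)


def predict(female, college):
    """Prediction from the uncentered dummy-coded model."""
    return INTERCEPT + B_FEMALE * female + B_COLLEGE * college


# All dummies zero: the intercept describes males without college,
# not "males" or "no college" separately.
reference_cell = predict(female=0, college=0)

# Hypothetical sample of (female, college) rows used to compute grand means.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
mean_female = sum(f for f, _ in sample) / len(sample)
mean_college = sum(c for _, c in sample) / len(sample)

# Centering leaves the slopes alone and moves the intercept to the
# prediction at the grand means of the predictors.
centered_intercept = INTERCEPT + B_FEMALE * mean_female + B_COLLEGE * mean_college


def predict_centered(female, college):
    """Same model, written in terms of grand-mean-centered dummies."""
    return (centered_intercept
            + B_FEMALE * (female - mean_female)
            + B_COLLEGE * (college - mean_college))


print(reference_cell, centered_intercept)
```

The two functions return the same prediction for every row, so centering changes only what the intercept refers to, not the fit.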