Categorical variable as dependent variable

I have a categorical variable that looks like this:

"How much do you like the sun?"
1 not at all
2 not a great deal
3 neither/nor
4 quite a lot
5 a great deal

How should one deal with it if this is the dependent variable? I thought about two ways:

(a) Make a dummy variable (like/dislike) by splitting the scale in half. My problem is the people who are indifferent (neither/nor). I could put them on one of the two sides (the "like" side or the "dislike" side), which would introduce a large bias; OR exclude those observations from the sample; OR split this group in half, putting half of the observations on the like side and the rest on the dislike side, which sounds fairer to me.
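A minimal sketch of the three ways of handling the neutral answers in option (a), assuming the answers are coded 1 to 5 as above with 3 as "neither/nor". The function `dichotomize` and its `neutral` option names are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Optional

NEUTRAL = 3  # "neither/nor" on the 1-5 scale above

def dichotomize(responses: List[int], neutral: str,
                seed: int = 0) -> List[Optional[int]]:
    """Map 1-5 answers to 1 (like) / 0 (dislike).

    neutral="like"    -> neutral answers counted as like
    neutral="dislike" -> neutral answers counted as dislike
    neutral="drop"    -> neutral answers become None (excluded later)
    neutral="split"   -> neutral answers randomly split half/half
    """
    rng = random.Random(seed)
    neutral_idx = [i for i, r in enumerate(responses) if r == NEUTRAL]
    # For "split": shuffle the neutral positions and send the first half to "like"
    rng.shuffle(neutral_idx)
    to_like = set(neutral_idx[: len(neutral_idx) // 2])

    out: List[Optional[int]] = []
    for i, r in enumerate(responses):
        if r > NEUTRAL:
            out.append(1)
        elif r < NEUTRAL:
            out.append(0)
        elif neutral == "like":
            out.append(1)
        elif neutral == "dislike":
            out.append(0)
        elif neutral == "drop":
            out.append(None)
        elif neutral == "split":
            out.append(1 if i in to_like else 0)
        else:
            raise ValueError(f"unknown neutral option: {neutral}")
    return out

answers = [1, 2, 3, 3, 4, 5, 3, 3]
print(dichotomize(answers, "drop"))        # [0, 0, None, None, 1, 1, None, None]
print(sum(dichotomize(answers, "split")))  # 2 likes + 2 of the 4 neutrals = 4
```

Note that "split" is a random assignment: it keeps the sample size, but it adds noise to the outcome, since nothing in the data says which way a given neutral respondent leans.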

(b) I could treat the categorical variable "numerically", as it already is, using its current scale (1 to 5) or any other scale that I find more intuitive.
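Treating the answer as numeric in option (b) means fitting an ordinary regression to the codes. A minimal sketch with a single hypothetical predictor (age) and made-up data, comparing the original 1-5 codes with a centred -2..2 recoding; the data and the function name `fit_simple_ols` are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, List, Tuple

def fit_simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for y = a + b*x (one predictor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Two possible codings of the same answers
scale_1_to_5: Dict[int, float] = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
scale_centred: Dict[int, float] = {1: -2, 2: -1, 3: 0, 4: 1, 5: 2}

ages = [20, 30, 40, 50, 60]  # hypothetical predictor
answers = [1, 2, 3, 4, 5]    # hypothetical answers

for scale in (scale_1_to_5, scale_centred):
    y = [scale[r] for r in answers]
    print(fit_simple_ols(ages, y))  # intercept differs, slope 0.1 in both
```

Shifting the codes (as centring does) only moves the intercept; an unequally spaced recoding such as 1, 2, 3, 5, 8 would change the slope as well. That is the assumption option (b) makes: that the distances between adjacent categories are known.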

Thanks a lot!


TS Contributor
Option A will lose a lot of information in the data. You did not state what type of independent variables you have. Can you use Ordinal Logistic Regression?
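For reference, the model behind ordinal logistic regression (the proportional odds, or cumulative logit, model) writes P(Y <= j | x) = logistic(theta_j - beta * x) for increasing thresholds theta_1 < ... < theta_4 on a 5-point scale, with one slope beta shared by all thresholds. Some software parameterizes it with a plus sign instead, which flips the sign of beta. A minimal sketch that turns made-up thresholds and a made-up slope into the five category probabilities; it does not fit the model, which you would normally leave to a statistics package.

```python
import math
from typing import List

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(x: float, thresholds: List[float], beta: float) -> List[float]:
    """Category probabilities under a proportional odds model.

    P(Y <= j | x) = logistic(thresholds[j] - beta * x), j = 1..K-1.
    Each category probability is the difference of adjacent cumulative ones.
    """
    cumulative = [logistic(t - beta * x) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Made-up parameters for a 5-category answer and one predictor x
theta = [-2.0, -1.0, 0.5, 2.0]  # must be increasing
beta = 0.8                      # positive beta shifts mass toward higher categories

for x in (0.0, 2.0):
    p = ordinal_probs(x, theta, beta)
    print(x, [round(v, 3) for v in p], round(sum(p), 6))
```

The single shared slope is what lets this model use the ordering of the categories without assuming equal distances between them.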

Thanks for the reply.

My independent variables are categorical (ordinal, dummy, nominal) and discrete (e.g. age). I will look into ordinal logistic regression.


Fortran must die
I am not sure you really lose much useful information by converting a categorical variable into a binary one. It depends on what you are interested in finding out and whether the categories actually add much to that effort.

If you decide you don't want to lose this information, you can use ordered logistic regression (if you think the variable is ordinal, which it seems to be to me), or multinomial logistic regression if you don't think the categories are ordered.
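To make the contrast concrete: multinomial logistic regression ignores the order and, in its common baseline-category form, gives every category except a baseline its own intercept and its own slope. A minimal sketch with one predictor and made-up coefficients (the names are hypothetical), plus a count of parameters for K categories and p predictors in each model. The choice of baseline is arbitrary and does not change the fitted probabilities.

```python
import math
from typing import Dict, List, Tuple

def multinomial_probs(x: float, coefs: Dict[int, Tuple[float, float]],
                      baseline: int, categories: List[int]) -> Dict[int, float]:
    """Baseline-category multinomial logit with one predictor x.

    log(P(Y=k) / P(Y=baseline)) = a_k + b_k * x for every k != baseline,
    so each non-baseline category gets its own intercept AND its own slope.
    """
    scores = {k: 0.0 if k == baseline else coefs[k][0] + coefs[k][1] * x
              for k in categories}
    m = max(scores.values())  # subtract the max score for numerical stability
    expd = {k: math.exp(s - m) for k, s in scores.items()}
    total = sum(expd.values())
    return {k: e / total for k, e in expd.items()}

def n_params_ordinal(k: int, p: int) -> int:
    # K-1 thresholds plus one shared slope per predictor
    return (k - 1) + p

def n_params_multinomial(k: int, p: int) -> int:
    # an intercept and p slopes for each of the K-1 non-baseline categories
    return (k - 1) * (p + 1)

# Made-up coefficients (a_k, b_k); category 1 ("not at all") is the baseline
coefs = {2: (0.5, 0.1), 3: (1.0, 0.0), 4: (0.2, 0.3), 5: (-0.5, 0.6)}
probs = multinomial_probs(1.0, coefs, baseline=1, categories=[1, 2, 3, 4, 5])
print({k: round(v, 3) for k, v in probs.items()})
print(n_params_ordinal(5, 1), n_params_multinomial(5, 1))  # 5 8
```

With five categories and one predictor the multinomial model needs 8 parameters against 5 for the ordinal one, and the gap grows with every predictor you add.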

It seems like ordinal logistic regression is the safest option. However, I'm still curious about a good way to turn the categorical variable into a binary one, so that I can keep using logit/probit.


Fortran must die
I am not sure what you mean by logit. Binary logistic regression?

I think you would be better off excluding the middle category entirely if you use binary logistic regression, because putting those respondents into either the like or the dislike group is essentially speculation on your part and could be entirely wrong. Conceptually, they have not indicated that they like or dislike the sun, so I would think you should not artificially put them into either category, and that includes splitting them evenly between the two.
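A minimal sketch of that approach: drop the neutral answers, code the rest as like/dislike, and fit a binary logistic regression. The fit here is plain gradient ascent on the log-likelihood with one predictor, written out only to show what is being estimated; the data are made up, and in practice a statistics package's logistic regression routine (which also reports standard errors) would be the usual choice.

```python
import math
from typing import List, Tuple

def fit_logistic(x: List[float], y: List[int], lr: float = 0.1,
                 steps: int = 5000) -> Tuple[float, float]:
    """Fit P(y=1) = logistic(a + b*x) by gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += yi - p
            grad_b += (yi - p) * xi
        # average gradient step; the log-likelihood is concave, so this converges
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Hypothetical data: a standardized predictor and 1-5 answers
x = [-1.5, -1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 1.5, -0.2, 0.8]
answers = [1, 2, 4, 3, 3, 2, 5, 4, 1, 3]

# Exclude the neutral answers (3) instead of assigning them to a side
kept = [(xi, 1 if r > 3 else 0) for xi, r in zip(x, answers) if r != 3]
a, b = fit_logistic([k[0] for k in kept], [k[1] for k in kept])
print(len(kept), round(a, 2), round(b, 2))  # 7 observations remain
```

Keep in mind that after dropping the neutral answers, the estimates describe only respondents who expressed an opinion.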


TS Contributor
If you really want binary categories, a common practice (I'm not saying that it is a good practice) is to use top-box/bottom-box scores for 5-point scales and top-2-box/bottom-2-box scores for scales with 7 or more points. In essence, this discards the responses where the respondent is neutral or only weakly pulled to either side.
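A minimal sketch of that coding, assuming answers are coded 1 to `points` with higher codes meaning more positive. The function name is hypothetical, and since the rule above only covers 5-point and 7-or-more-point scales, other lengths raise an error rather than guess.

```python
from typing import Optional

def top_bottom_box(response: int, points: int) -> Optional[int]:
    """Top-box/bottom-box coding.

    5-point scale: top box = 5, bottom box = 1.
    7+-point scale: top 2 box = the two highest codes, bottom 2 box = the two lowest.
    Returns 1 (top), 0 (bottom), or None (middle answers, discarded).
    """
    if points == 5:
        k = 1
    elif points >= 7:
        k = 2
    else:
        raise ValueError("rule only defined for 5-point or 7+-point scales")
    if response > points - k:
        return 1
    if response <= k:
        return 0
    return None

print([top_bottom_box(r, 5) for r in range(1, 6)])
# [0, None, None, None, 1]
print([top_bottom_box(r, 7) for r in range(1, 8)])
# [0, 0, None, None, None, 1, 1]
```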