Must categorical variables always be dummy coded in linear regression?

My question is: must we always dummy code categorical variables in linear regression? If I wanted to understand the overall effect of a categorical variable, couldn't I just include it in my regression without dummy coding it? And if I had ordered categorical data, could I include it as one variable without dummy coding and get an overall sense of the estimated relationship?

Alternatively, what if I wanted to control for the variable but not necessarily know the relationship among the various indicators? For instance, let's say I'm running a regression on house sales and I want to control for house color. I don't really care which color sells at a higher value, but I would like to know whether house color is significant overall. Couldn't I just include the categorical variable without dummy coding?


How would you include it without dummy coding it? Have you ever worked through a regression by hand, with matrix algebra? If not, it is a valuable exercise: you'll see that it needs numbers. Dummy coding is merely a way of tricking the regression into accepting categorical variables (well, maybe it's more sophisticated than a trick).
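
To make the "it needs numbers" point concrete, here is a minimal sketch of the matrix-algebra exercise using NumPy. Everything in it is an assumption made for illustration: the house data are invented, and `dummy_code` and `rss` are hypothetical helper names. It builds the 0/1 columns by hand, fits the model with and without them, and forms the usual F statistic for the color dummies taken together, which is one way to ask whether color matters overall without interpreting each color.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
sqft = np.array([1.0, 1.5, 2.0, 1.2, 1.8, 2.2, 1.1, 1.6, 2.1])  # 1000s of sq ft
color = np.array(["white"] * 3 + ["red"] * 3 + ["blue"] * 3)
price = np.array([200.0, 248.0, 301.0, 212.0, 275.0, 318.0, 215.0, 262.0, 316.0])


def dummy_code(labels, reference):
    """Return (levels, matrix): one 0/1 column per non-reference level."""
    levels = [lvl for lvl in sorted(set(labels.tolist())) if lvl != reference]
    columns = [(labels == lvl).astype(float) for lvl in levels]
    return levels, np.column_stack(columns)


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# "white" is the reference level, so it gets no column of its own.
levels, dummies = dummy_code(color, reference="white")

X_reduced = np.column_stack([np.ones(len(price)), sqft])  # intercept + sqft
X_full = np.column_stack([X_reduced, dummies])            # ... + color dummies

q = dummies.shape[1]                     # number of dummy coefficients tested
df_resid = len(price) - X_full.shape[1]  # residual degrees of freedom
rss_reduced, rss_full = rss(X_reduced, price), rss(X_full, price)

# Joint F statistic for "all color coefficients are zero".
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
print(levels, q, df_resid, f_stat)
```

The statistic would be compared against an F distribution with `q` and `df_resid` degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, q, df_resid)` gives the p-value. So the variable is still dummy coded internally, but it can be tested as a single block.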


TS Contributor
"What if I had ordered categorical data"
OK, that means something like "very small - small - bigger - very big".
To handle this more easily, people might use numbers, e.g.
very small = 1, small = 12, bigger = 1000, very big = 100000000,
or anything else (e.g. 1, 2, 3, 4), as long as the ranking is preserved.
Of course, some people confuse these rank labels with numbers that
can be treated as an interval scale (so, in a regression, they would
use 1, 12, 1000 and 100000000 as if they were real measurements), but
don't do that yourself.
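
A small sketch of why this matters, assuming NumPy and a made-up response `y` (both the data and the helper name `r_squared` are hypothetical). The two codings below rank the four categories identically, yet a regression that treats the codes as real numbers gives a different fit for each, because it uses the spacing between the codes and not just their order.

```python
import numpy as np

# Hypothetical data: category index 0..3 means
# very small, small, bigger, very big.
size_rank = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y = np.array([10.0, 12.0, 15.0, 17.0, 19.0, 22.0, 25.0, 26.0])


def r_squared(codes, rank, y):
    """R^2 of a straight-line fit of y on the numeric codes of each category."""
    x = codes[rank]
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    total = y - y.mean()
    return 1.0 - float(resid @ resid) / float(total @ total)


codes_a = np.array([1.0, 2.0, 3.0, 4.0])             # evenly spaced labels
codes_b = np.array([1.0, 12.0, 1000.0, 100000000.0])  # same ranking, wild spacing

r2_a = r_squared(codes_a, size_rank, y)
r2_b = r_squared(codes_b, size_rank, y)
print(r2_a, r2_b)  # same ordering of categories, different fits
```

Neither coding is "the right one"; the point is only that the result depends on an arbitrary choice of labels, which dummy coding avoids.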

With kind regards