Omitted group in regression

Hello all,

I understand that if you have individual-level data and you are trying to model something like the probability of incarceration, you might run a regression like

probability of incarceration = intercept + education_level + other_controls . . . + error


education_level is a categorical variable with the groups less than HS, HS, and beyond HS, and one of those groups is omitted as the reference category.

My question is: what happens when you aggregate your data up to, say, the state level (for example, to predict states' incarceration rates), and you have a separate variable for each education_level category giving the proportion of individuals in that state with that education level? Do we still have an omitted group?

incarceration rate = intercept + percent_less_than_HS + percent_HS + percent_beyond_HS + other_controls . . . + error

I'm having trouble trying to reason through this.


TS Contributor
So you have, say, values like "23% on level A, 50% on level B, 27% on level C"?
Since the values for each case add up to 100, there is redundancy:
you can determine the value on any one level from the other two. With an
intercept in the model, the three percentages are perfectly collinear with it,
so you still have to leave out one level.
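
The redundancy can be checked numerically. Below is a minimal sketch in Python with NumPy, using simulated shares rather than real state data; the number of states, the Dirichlet parameters, and all variable names are assumptions made up for illustration. It compares the rank of the design matrix with all three shares against the one with a level dropped.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; the data are simulated, not real
n_states = 50                   # hypothetical number of states

# Simulated education shares for three levels; each row sums to 1.
shares = rng.dirichlet(alpha=[2.0, 3.0, 2.0], size=n_states)
intercept = np.ones((n_states, 1))

# Intercept plus all three shares: 4 columns, but the three share
# columns add up to the intercept column, so one column is redundant.
X_full = np.hstack([intercept, shares])

# Intercept plus two shares (third level omitted).
X_drop = np.hstack([intercept, shares[:, :2]])

rank_full = np.linalg.matrix_rank(X_full)
rank_drop = np.linalg.matrix_rank(X_drop)

print("columns:", X_full.shape[1], "rank:", rank_full)
print("columns:", X_drop.shape[1], "rank:", rank_drop)
```

With all three shares the matrix has four columns but rank three, so the coefficients are not uniquely determined. After dropping one level the matrix has full column rank, and the coefficients on the two remaining shares are read relative to the omitted level.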

