First, thanks in advance for any help. I'm trying to add some control variables to an OLS model, but because of the type of data I'm using, I don't know how to do so correctly.
I have data from 100 cities describing the proportion of housing that belongs to various categories (all from the American Community Survey by the U.S. Census Bureau). I'm looking at how the proportions of the various types of housing affect energy use.
I want to control for various factors beyond that -- for example, people in apartment buildings use less energy than people in free-standing houses, but part of that effect is due to income differences (they make less on average) and part is due to the energy-reducing characteristics of their housing (shared walls, smaller space, etc.). I want to know the differences in energy use for equivalent households in different types of housing.
I don't have data at the home level. That would be too easy :-) I only have averages in each city for each category, and an overall average per-capita income.
How can I construct the model so that the housing-type coefficients (the differences in energy use between housing types) are adjusted for differences in the averages of the factors I want to control for? I can put the other factors in the model as overall city averages. I know it doesn't work
My worry is that my coefficients that express the relationship of housing type to energy are also expressing the effects of different incomes, different occupancy rates, etc., and are distorted by them.
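To make the worry concrete, here is a minimal sketch on made-up data, not my real ACS numbers. Everything in it is an assumption for illustration: the variable names (apt_share, income, energy), the "true" coefficients, and the negative link between apartment share and income. It fits a city-level OLS with and without the city-average income control, so the shift in the housing coefficient can be seen directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cities = 100

# Hypothetical city-level data (synthetic, for illustration only).
apt_share = rng.uniform(0.1, 0.6, n_cities)  # share of housing that is apartments
# Assumption: cities with more apartments have lower average income.
income = 60.0 - 40.0 * apt_share + rng.normal(0.0, 3.0, n_cities)
# Assumed "true" relationship: housing effect of -20, income effect of +0.5.
energy = 50.0 - 20.0 * apt_share + 0.5 * income + rng.normal(0.0, 1.0, n_cities)


def ols(y, *columns):
    """Least-squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


naive = ols(energy, apt_share)               # housing share only
controlled = ols(energy, apt_share, income)  # housing share + city-average income

print("apt_share coefficient, no control:  ", naive[1])
print("apt_share coefficient, with income: ", controlled[1])
```

In this toy setup the uncontrolled coefficient absorbs the income effect and overstates the housing effect, while the controlled fit lands near the assumed value. With several housing categories whose shares sum to 1, one category would have to be left out as the reference, since otherwise the shares are perfectly collinear with the intercept.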
Any ideas how to design the regression model to accurately express the impact of these average differences?
Last edited by stw_25; 03-20-2009 at 09:55 PM. Reason: clarified my explanation