I'm running a regression analysis to understand the impact of income (predictor variable) on scores on an instrument (dependent variable). The only problem is that I don't know each individual's income; I've estimated it from their ZIP code. I've experimented with a few different methods to account for the fact that two or more people might share a ZIP code, and therefore an estimated income, even though their actual incomes are not necessarily the same. Here's my latest approach:
Mixed model with ZIP code as the random effect. However, there is an average of only 2 observations per group. So, I estimated income by county (instead of by ZIP), and now there is a mean of 5 observations per group. This reduced my p-value a little, but it's still significant, and I think this version might be safer to defend statistically. I'd appreciate any thoughts on this approach, or if you know of another way to do it.
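For concreteness, here is a minimal sketch of the county-level version of this model, assuming Python with `statsmodels` (the post doesn't say which software is used). The data are simulated and hypothetical: 40 counties with 5 respondents each, and the column names `score`, `income`, and `county` are placeholders. Income is assigned at the county level so that everyone in a county shares the same estimated value, and the model fits a random intercept for county with income as a fixed effect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Hypothetical simulated data: 40 counties, 5 respondents per county.
n_groups, per_group = 40, 5
county = np.repeat(np.arange(n_groups), per_group)

# Income is estimated per county, so all respondents in a county share it
# (assumed units: thousands of dollars).
county_income = rng.normal(50, 10, n_groups)
income = county_income[county]

# Assumed data-generating process: county random intercept + individual noise.
county_effect = rng.normal(0, 2, n_groups)[county]
score = 10 + 0.3 * income + county_effect + rng.normal(0, 3, county.size)

df = pd.DataFrame({"score": score, "income": income, "county": county})

# Random intercept for county; income enters as a fixed effect.
model = smf.mixedlm("score ~ income", data=df, groups=df["county"])
result = model.fit()
print(result.summary())
print("income coefficient:", result.params["income"])
print("income p-value:", result.pvalues["income"])
```

Because income is constant within each county here, its coefficient is estimated only from differences between counties, so the number of counties (not the number of respondents) largely determines how precisely the income effect can be estimated.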
Thank you!