Accounting for demographics when the outcome is a rate


I've been reading some public health papers that account for demographic factors such as age, gender, and race in regression models (Poisson/negative binomial) where the outcome is a daily or monthly rate. Given that the demographic factors are measured at the individual level and the rates at the city/state level, how is this done? If the demographic factors are entered at the city/state level, would they be percentages or the number of people within each category?

I'm sure they did not standardize/adjust for these characteristics prior to running the model because they clearly state that these factors were put into models as covariates and they modeled crude rates.

Thank you!
Can you link one of the papers?

My initial thought is that the demographic factors were entered as simple linear terms, which assumes each factor's effect on the rate is the same across all observations. It's a naive model, but I've seen it done many times in follow-up studies.

Other times I see time treated as a random effect: first a random-intercept model, then a random regression (random-slope) model. I usually see this between exposure categories and their outcomes. The other method is Cox regression, but you first need to determine the relationship of the outcome over time (linear, nonparametric, etc.). That gives you hazard ratios, but the paper would report them as such.

OR I missed your point entirely
I'm not sure how to do it in SAS, but a random intercept isn't something you assign; the model estimates it. Here's the regression equation:

y_ij = B0 + B1*x_ij + μ_i + ε_ij

or equivalently, folding the subject effect into the intercept (B0_i = B0 + μ_i):

y_ij = B0_i + B1*x_ij + ε_ij

What this equation does is model Y (whatever your response is) per subject, so each subject gets their OWN regression line. The intercept is specific to the i-th subject while the slope B1 is shared, so the lines are parallel and differ only in how far they are shifted along the Y-axis. That shift reflects the i-th subject's own variation, adjusted by the variance shared among all subjects.

In layman's terms, the mixed-model approach adjusts for the random, subject-specific variation in your data. For instance, some individuals have hereditary traits that influence their weight gain, so in a clinical trial for weight loss their genetics could confound your treatment effect. This modelling approach may at least be able to identify and adjust for that.

Full-blown random regression models are similar to random-intercept models, BUT you also treat the slopes as random effects; that is, the model estimates the subject-specific variance of each covariate's coefficient, typically using restricted maximum likelihood (REML). You can fit this with PROC MIXED in SAS. Note that PROC MIXED assumes normally distributed errors; for Poisson-type counts the mixed-model equivalent is PROC GLIMMIX.
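To make the difference between the two models concrete, here is a minimal sketch of how they might be written in PROC MIXED. The dataset `trial` and the variables `subject`, `week`, and `weight` are made-up names for a long-format weight-loss dataset (one row per subject per visit), not anything from the paper.

```sas
/* Assumed data: trial, one row per subject per visit.
   subject, week, and weight are hypothetical variable names. */

/* Random intercept only: each subject gets their own B0_i,
   the slope on week is shared by everyone. */
proc mixed data=trial method=reml;
  class subject;
  model weight = week / solution;
  random intercept / subject=subject;
run;

/* Random regression: subject-specific intercept AND slope on week.
   type=un lets the intercept and slope effects be correlated. */
proc mixed data=trial method=reml;
  class subject;
  model weight = week / solution;
  random intercept week / subject=subject type=un;
run;
```

The only change between the two calls is the RANDOM statement; comparing their covariance parameter estimates shows how much of the between-subject variation sits in the slopes rather than the intercepts.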

I'll read the paper but at this point if you're unfamiliar with this procedure then I doubt it's what the authors are using. My bet is Cox regression.

EDIT: they just use a Poisson-type count regression with a negative binomial distribution. Think of it this way. They have count data: the discharge reports for cases >35 years of age with ICD codes for Acute Myocardial Infarction. They also have the population number(s) for the area they're studying, so they can create a time series of incident cases. All the negative binomial regression does is model the mean count of each time series on the log scale, so that differences between series, or over time, come out as rate ratios. Fit that to the series and you'll get your trend analysis.

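proc genmod data=mydata;  /* mydata: placeholder dataset name */
<insert class statement(s)>;
/* dist=negbin as in the paper; log_pop = log(population) is an assumed offset variable */
model y = stuff / dist=negbin link=log offset=log_pop;
run;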

Y is essentially your rate for each time series (in practice, the case count, with the log of the population as an offset). Usually the model would also separate the variance within each time series from the variance between series, but tbh I don't use SAS so I don't know how to script that. Since the smoking study isn't a repeated series you can't really do that, but the negbin distribution should account for overdispersion.
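As a rough illustration of the rate model described above, here is a sketch with city-level demographics entered as percentages, which is one common way to carry individual-level characteristics into an area-level model. Everything named here (`city_month_raw`, `cases`, `pop`, `pct_age65`, `pct_female`, `pct_black`, `month`) is an assumption for the example, not the paper's actual data or specification.

```sas
/* Assumed data: one row per city-month with a case count, the
   population at risk, and city-level demographic percentages.
   All dataset and variable names are hypothetical. */
data city_month;
  set city_month_raw;
  log_pop = log(pop);   /* offset: log of the population at risk */
run;

proc genmod data=city_month;
  /* Count outcome plus a log(pop) offset is a model for the crude rate.
     dist=negbin allows for overdispersion relative to Poisson. */
  model cases = pct_age65 pct_female pct_black month
        / dist=negbin link=log offset=log_pop;
run;
```

With this setup, exponentiating the coefficient on `pct_age65` gives the rate ratio associated with a one-percentage-point increase in the share of residents in that age group, holding the other covariates fixed.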

Also, I reread your OP. They state specifically that they modeled crude rates, but those rates WERE age-adjusted for each time series (remember, they got the crude rate by using census data to age-adjust their population at risk). With regard to demographics: age, gender, and smoking status are all "effects", and when you model them you're implicitly assuming their effects are the same across all observations. That's what modelling does; it finds the common variance among all observations due to the effect(s). Individual versus city level is a semantic issue (actually it's not, but heuristically you can model it this way and still identify heterogeneous effects with a more robust model).

PROC GLIMMIX is just a procedure for fitting generalized linear mixed models, so yes, in a sense it will estimate a random intercept, not between repeated measures on a subject but based on the variance within each time series. That resolves some of the variance and gives a better rate estimate (and thus a better trend analysis).
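A minimal sketch of that GLIMMIX idea, with a random intercept for each city's time series, could look like the following. As before, the dataset and variable names (`city_month_raw`, `city`, `cases`, `pop`, `pct_age65`, `pct_female`, `pct_black`, `month`) are hypothetical, and this is not necessarily what the authors ran.

```sas
/* Assumed data: one row per city-month; all names are hypothetical. */
data city_month;
  set city_month_raw;
  log_pop = log(pop);   /* offset: log of the population at risk */
run;

proc glimmix data=city_month;
  class city;
  /* Negative binomial rate model with city-level covariates */
  model cases = pct_age65 pct_female pct_black month
        / dist=negbin link=log offset=log_pop solution;
  /* One random intercept per city, i.e. per time series */
  random intercept / subject=city;
run;
```

The fixed effects are read the same way as in a plain negative binomial model; the random intercept just lets each city's baseline rate deviate from the overall baseline.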
Thanks for the helpful explanation! So is time series data that involves counts typically modeled with Poisson regression, without mixed models?