# Analysis of case-control

#### helpwithstats

##### Member
I am working on a project that involves a case-control design. Participants with and without a particular intervention (receiving treatment X or control) are matched on sex, diagnosis, and age (+/- 7 years), and self-reported questionnaires are used to collect data on various health outcomes.

The outcome variables are count data (ranging from 1-7), with most participants having 0, 1, or 2 events (Poisson distribution). Because the data are matched, I need to account for the correlation, and I am wondering what an appropriate analysis would be if I am interested in models for both the conditional and marginal distributions. I have looked at Generalized Estimating Equations (GEE) and Generalized Linear Mixed Models (GLMM) but need some guidance on which approach may be most appropriate.

Thank you

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I historically get confused by the terms "conditional" and "marginal". I think it is because they are never defined in complex settings. I was under the impression that conditional estimates come from a model controlling for covariates and marginal estimates come from an empty model like yours. What are you looking for in particular?

Also, given your likely outcome distribution, you may need to examine dispersion and see whether Poisson is a better fit than, say, negative binomial or zero-inflated regression. I also imagine that whichever count approach is used, there is a "conditional" version, meaning conditional on the matching. I am not sure mixed modeling (I prefer the term "multilevel modeling" because it causes less confusion) is necessary beyond controlling for the strata (matching) in a more basic conditional approach.
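
To make the two ideas above concrete, here is a rough sketch using only the Python standard library. Everything in it is an assumption for illustration: the `pairs` data are made up, the matching is assumed to be 1:1, and the function names are hypothetical. The dispersion index (variance divided by mean) is only a crude first look because it ignores covariates and the matching. The second function uses the fact that, for 1:1 pairs with treatment as the only covariate, conditioning a Poisson model on each pair's total leaves a binomial likelihood whose maximizer is the ratio of the treated total to the control total.

```python
from statistics import mean, variance

# Hypothetical 1:1 matched pairs: (count for treated participant, count for matched control).
pairs = [(0, 1), (2, 0), (1, 1), (0, 0), (3, 1), (1, 0), (0, 2), (5, 1)]


def dispersion_index(counts):
    """Sample variance divided by sample mean; roughly 1 under a Poisson model."""
    return variance(counts) / mean(counts)


def conditional_rate_ratio(pairs):
    """Conditional ML estimate of the treated/control rate ratio for 1:1 pairs.

    Conditioning on each pair's total removes the pair-specific baseline rate,
    so the estimate reduces to total treated events over total control events.
    """
    treated_total = sum(t for t, _ in pairs)
    control_total = sum(c for _, c in pairs)
    if control_total == 0:
        raise ValueError("no events among controls; the ratio is undefined")
    return treated_total / control_total


# Pool all counts for the crude dispersion check.
all_counts = [y for pair in pairs for y in pair]
print("dispersion index:", round(dispersion_index(all_counts), 2))
print("conditional rate ratio:", round(conditional_rate_ratio(pairs), 2))
```

A dispersion index well above 1 would be a hint to look at negative binomial regression, though a proper check should be based on residuals from the fitted model. With additional covariates or more than one control per case, the conditional estimate no longer has this closed form and a fitted conditional model would be needed.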

P.S. I have not done a C-C study with a count outcome measure, but I believe you may need to adjust the intercept to account for the artificial balance between outcome groups, which is unlikely to hold in the super-population.

#### jamesmartinn

##### Member
What are your data sources? Are you analyzing population-level data? If so, the GEE approach (i.e., marginal) is probably the best choice.

The coefficients would then have a 'population-averaged' interpretation: they are averaged over the distribution of the random effects. In a conditional model, by contrast, the interpretation is conditional on a given level of the random effect.
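
As a worked illustration of that distinction, consider a sketch under stated assumptions: a Poisson model with a log link, a single covariate $x_{ij}$, and a normally distributed random intercept $u_i$ for matched set $i$ that is independent of the covariate. The symbols $\beta_0$, $\beta_1$, and $\sigma^2$ are introduced here for illustration and do not come from the thread.

```latex
\begin{aligned}
\text{Conditional:}\quad
  & E[Y_{ij} \mid u_i, x_{ij}] = \exp(\beta_0 + \beta_1 x_{ij} + u_i),
  \qquad u_i \sim N(0, \sigma^2) \\
\text{Marginal:}\quad
  & E[Y_{ij} \mid x_{ij}]
    = E_{u}\!\left[\exp(\beta_0 + \beta_1 x_{ij} + u_i)\right]
    = \exp\!\left(\beta_0 + \tfrac{\sigma^2}{2} + \beta_1 x_{ij}\right)
\end{aligned}
```

The second line uses $E[e^{u_i}] = e^{\sigma^2/2}$ for a normal $u_i$. Under these particular assumptions only the intercept shifts when averaging over the random effect, so the rate ratio $\exp(\beta_1)$ has the same value in both formulations. That equivalence is specific to the log link with a random intercept; with a logit link, or with random slopes, the conditional and population-averaged coefficients generally differ.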