Hi Everyone,
I have a question about the best model to use for this analysis.
The question I am trying to answer is: what is the association between Indicator 1 and age, sex, and group?
- Data are measured at the facility/site level, and site is treated as a random effect. Within each site, the data are aggregated by group, age, and sex.
- Indicator 1 is defined as Indicator A at a certain time point / Indicator B at a time point 6 months prior. Both Indicators A and B are counts. Therefore Indicator 1 is a proportion.
However, as it stands, once the data are processed all values of A and B are >0. If a value is missing in A but not in B, or vice versa, the record is excluded, since that means Indicator 1 isn't estimable. It's messy data, for sure.
So, my initial thought was to model this as a truncated negative binomial. In terms of R code:
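library(glmmTMB)
# Zero-truncated NB2 for A, with log(B) as the offset and a random intercept for site
f1 <- glmmTMB(IndA ~ agecat*sex*group + offset(log(IndB)) + (1 | site),
              ziformula = ~ 0,                          # no zero-inflation component
              dispformula = ~ agecat + sex + group,     # dispersion varies by covariates
              family = truncated_nbinom2, data = dta)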
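To spell out what the offset is doing here (the notation is mine, not from the data: i indexes the age/sex/group cells, j the sites, x the covariate vector, u the site random intercept), the log link with offset(log(IndB)) is a model for the ratio on the log scale:

```latex
\log \mathbb{E}[A_{ij}] = \log B_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j
\quad\Longleftrightarrow\quad
\log \frac{\mathbb{E}[A_{ij}]}{B_{ij}} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

So the coefficients already describe the expected ratio A/B, with B treated as a fixed exposure. As I understand it, with the truncated family this mean refers to the underlying untruncated count, not to the mean conditional on A > 0.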
But then I wondered if it would be reasonable to assume all the missing values are 0, impute them as 0, and run the same model as a zero-inflated one. I don't love this because it requires too many naïve assumptions. But it was a thought.
And then I began wondering if I should be modeling Indicator 1 (the ratio of A and B) directly, rather than modeling A with B as the offset. A and B are extremely right-skewed with a lot of low counts. The ratio has a funky but more centralized shape:
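For concreteness, a rough sketch of what the zero-imputation idea would involve. The helper impute_zero_A and the toy data frame are hypothetical, made up purely for illustration. It assumes only a missing IndA can be set to 0, since a missing or zero IndB would break the offset log(IndB). The model call is left commented out because it needs the real dta.

```r
# Hypothetical helper: treat a missing IndA as a true zero,
# but only in rows where IndB was observed and positive.
impute_zero_A <- function(dta) {
  keep <- !is.na(dta$IndB) & dta$IndB > 0   # offset log(IndB) requires IndB > 0
  out <- dta[keep, , drop = FALSE]
  out$IndA[is.na(out$IndA)] <- 0            # the naive assumption: missing means zero
  out
}

# Toy data (made up) to show which rows survive
toy <- data.frame(IndA = c(3, NA, 5, NA), IndB = c(10, 4, NA, 0))
imputed <- impute_zero_A(toy)

# Zero-inflated, untruncated counterpart of f1 (not run here):
# library(glmmTMB)
# f2 <- glmmTMB(IndA ~ agecat*sex*group + offset(log(IndB)) + (1 | site),
#               ziformula = ~ 1,
#               dispformula = ~ agecat + sex + group,
#               family = nbinom2, data = impute_zero_A(dta))
```

Rows with a missing or zero IndB still have to be dropped, so this only rescues the cases where A is missing and B is observed.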
Both A and B look like this: [image not shown]
A/B looks like this: [image not shown]
So, I've got myself turned inside out. Really, I think the decision comes down to leaving the data as truncated and then either modeling A as NB with B as the offset, or modeling A/B with a truncated linear regression.
I thought I'd ask people's opinions. I can provide more info, if necessary.
I appreciate any responses I get!