I have a question about the best model to use for this analysis.

The question I am trying to answer is: what is the association between Indicator 1 and age, sex, and group?

- Data are measured at the facility (site) level, and site is treated as a random effect. Within each site, the data are aggregated by group, age, and sex.
- Indicator 1 is defined as Indicator A at a given time point divided by Indicator B at a time point 6 months earlier. Both A and B are counts, so Indicator 1 is a ratio of counts (strictly a proportion only if A counts a subset of B).
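Written out, modeling A with log(B) as an offset is already a model for the ratio itself. This is a sketch that assumes a log link and a site random intercept, as in the model further down, with $i$ indexing the aggregated age × sex × group cells within site $j$:

$$
\log \mathrm{E}[A_{ij} \mid u_j] = \log B_{ij} + \mathbf{x}_{ij}^\top \boldsymbol{\beta} + u_j
\quad\Longleftrightarrow\quad
\frac{\mathrm{E}[A_{ij} \mid u_j]}{B_{ij}} = \exp\!\left(\mathbf{x}_{ij}^\top \boldsymbol{\beta} + u_j\right),
\qquad u_j \sim N(0, \sigma_u^2).
$$

So each $\exp(\beta_k)$ is a multiplicative effect on the expected value of Indicator 1, even though the response in the model is the count $A$.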

As the data currently stand, after processing all values of A and B are > 0. If a value is missing in A but not in B, or vice versa, the record is excluded, because Indicator 1 cannot be computed. It's messy data, for sure.
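A minimal base-R sketch of that exclusion rule, assuming the aggregated data sit in a data frame `dta` with count columns `IndA` and `IndB` (the names used in the model below). The toy rows are made up purely for illustration:

```r
# Toy aggregated data (hypothetical values, for illustration only)
dta <- data.frame(
  site = c("s1", "s1", "s2", "s2", "s3"),
  IndA = c(12, NA, 3, 7, 1),
  IndB = c(20, 15, NA, 9, 2)
)

# Keep only rows where both counts are observed and positive,
# i.e. rows where Indicator 1 = A/B can actually be computed
keep <- !is.na(dta$IndA) & !is.na(dta$IndB) & dta$IndA > 0 & dta$IndB > 0
dta_cc <- dta[keep, ]
dta_cc$ind1 <- dta_cc$IndA / dta_cc$IndB
```

Writing the rule out this way makes it explicit that the analysis sample is conditional on both counts being observed, which is exactly what the truncated model has to live with.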

So my initial thought was to model A as a zero-truncated negative binomial, with log(B) as an offset. In R:

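```r
library(glmmTMB)
# Zero-truncated NB2 for A; offset log(B) puts coefficients on the A/B scale
f1 <- glmmTMB(IndA ~ agecat*sex*group + offset(log(IndB)) + (1 | site),
              ziformula = ~0, dispformula = ~agecat + sex + group,
              family = truncated_nbinom2, data = dta)
```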
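For intuition about what the truncated family does, here is a base-R sketch of the zero-truncated NB2 probability mass function: the ordinary NB2 mass renormalised by $1 - P(A = 0)$. It assumes the standard zero-truncated parameterisation (`mu` is the mean of the untruncated NB2, `theta` its size parameter); it is illustrative only, not glmmTMB's internal code, and `dtnb2` is a hypothetical helper name.

```r
# Zero-truncated NB2 pmf: NB2(mu, theta) conditioned on A > 0
dtnb2 <- function(a, mu, theta, log = FALSE) {
  lp <- dnbinom(a, size = theta, mu = mu, log = TRUE) -
    log1p(-dnbinom(0, size = theta, mu = mu))  # divide by P(A > 0)
  lp[a < 1] <- -Inf                            # zero has no mass after truncation
  if (log) lp else exp(lp)
}

# Probabilities over a = 1, 2, ... should sum to (almost) 1
p <- dtnb2(1:2000, mu = 3, theta = 0.8)
sum(p)
```

The renormalisation also shifts the mean up to `mu / (1 - P(A = 0))`, which is why a plain NB fitted to zero-free data would be biased; and since there are no zeros left, a zero-inflation component has nothing to model.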

Then I wondered whether it would be reasonable to assume that all missing values are 0, impute them as 0, and fit the same model as a zero-inflated one. I don't love this because it requires too many naïve assumptions, but it was a thought.
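If the imputation route were tried, only missing A values could be set to 0: a zero B would make the offset log(B) equal to -Inf (and A/B undefined). A minimal sketch, assuming the same hypothetical `dta` layout with columns `IndA` and `IndB`. `impute_zero_A` and `fit_zinb` are hypothetical helper names, and the glmmTMB call swaps in the untruncated `nbinom2` family with a constant zero-inflation probability (`ziformula = ~ 1`):

```r
# Hypothetical helper: set missing A to 0 where B is observed and positive;
# rows with missing or zero B are still dropped, since log(0) = -Inf
impute_zero_A <- function(d) {
  d <- d[!is.na(d$IndB) & d$IndB > 0, ]
  d$IndA[is.na(d$IndA)] <- 0
  d
}

# Hypothetical helper: zero-inflated NB2 on the imputed data
# (defined but not run here; requires the glmmTMB package)
fit_zinb <- function(d) {
  glmmTMB::glmmTMB(
    IndA ~ agecat * sex * group + offset(log(IndB)) + (1 | site),
    ziformula = ~ 1, dispformula = ~ agecat + sex + group,
    family = glmmTMB::nbinom2, data = d
  )
}

# Toy data (made up): A missing, B missing, and B = 0 cases
dta <- data.frame(IndA = c(12, NA, 3, NA), IndB = c(20, 15, NA, 0))
dta_imp <- impute_zero_A(dta)
```

Even under this route the "missing in B" records are still lost, so imputation only partly undoes the truncation.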

I then began wondering whether I should model Indicator 1 (the ratio A/B) directly, rather than modeling A with log(B) as the offset. A and B are both extremely right-skewed, with many low counts. The ratio has a funky, but more centralized, shape:

Both A and B look like this: [figure not available]

A/B looks like this: [figure not available]
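One way to see how the two options relate: with a log link, the count model with a log(B) offset already describes A/B, because its coefficients are log ratios. A minimal base-R simulation, with a Poisson GLM swapped in for the NB mixed model purely to keep the example dependency-free, and no site effect; the group labels and rates are made up:

```r
set.seed(1)
n <- 400
group <- factor(rep(c("g1", "g2"), each = n / 2))
IndB <- rpois(n, 30) + 1                 # denominators (always > 0)
true_ratio <- ifelse(group == "g1", 0.5, 0.8)
IndA <- rpois(n, true_ratio * IndB)      # counts with E[A] = ratio * B

# Count model for A with log(B) offset: exp(coef) estimates the A/B ratio per group
fit <- glm(IndA ~ 0 + group + offset(log(IndB)), family = poisson)
exp(coef(fit))
```

Here the estimate within each group is exactly sum(A) / sum(B), so cells with large denominators carry more weight; an unweighted model of the per-cell ratios A/B treats a ratio built on B = 2 the same as one built on B = 200.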

So I've got myself turned inside out. I think the real decision is, keeping the data truncated (i.e., excluding the incomplete records), whether to model A as a negative binomial with log(B) as an offset, or to model A/B with a truncated linear regression.
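For the A/B option, here is a minimal sketch of what a left-truncated-at-zero linear regression does, written as a hand-coded likelihood in base R so the mechanics are visible. It assumes normal errors and, unlike the glmmTMB model, has no site random effect; `negll_trunc` and `fit_trunc_lm` are hypothetical helpers and the simulated data are made up. (The truncreg package offers a tested fixed-effects implementation.)

```r
# Negative log-likelihood of y ~ N(X b, s^2) truncated to y > 0:
# each term is log f(y) - log P(Y > 0)
negll_trunc <- function(par, X, y) {
  k <- ncol(X)
  mu <- drop(X %*% par[1:k])
  sigma <- exp(par[k + 1])                  # log-sigma keeps sigma > 0
  -sum(dnorm(y, mu, sigma, log = TRUE) -
         pnorm(mu / sigma, log.p = TRUE))   # P(Y > 0) = Phi(mu / sigma)
}

fit_trunc_lm <- function(X, y) {
  start <- c(qr.solve(X, y), log(sd(y)))    # OLS start values
  opt <- optim(start, negll_trunc, X = X, y = y, method = "BFGS")
  list(beta = opt$par[1:ncol(X)], sigma = exp(opt$par[ncol(X) + 1]),
       convergence = opt$convergence)
}

# Simulated example: keep only positive outcomes, as with A/B > 0
set.seed(2)
x <- runif(5000)
y_all <- 1 + 1.5 * x + rnorm(5000)
keep <- y_all > 0
X <- cbind(1, x[keep])
y <- y_all[keep]
fit <- fit_trunc_lm(X, y)
fit$beta
```

On the real data the outcome would be `IndA / IndB` and `X` a model matrix such as `model.matrix(~ agecat * sex * group, data = dta)`.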

I thought I'd ask for people's opinions. I can provide more information if necessary.

I appreciate any responses I get!