Poisson Regression

#1
Hello!

I have a year's worth of count data that looks like this. It shows the count of events per day for two different user experiences and the total count of users per day.

[Attached image: dat1.PNG]

I want to analyze the daily counts of groups A and B with the total user count as an additional covariate in the model. I reorganized the data like this:

[Attached image: 1630781064168.png]
I don't expect seasonal effects, so I would like to analyze this as a cross-sectional Poisson model, i.e.:

count = grp + user_count
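As a sketch of what `count = grp + user_count` means as a Poisson GLM with a log link, the block below fits it by iteratively reweighted least squares (IRLS) in NumPy. Assumptions: the data are synthetic, `user_count` enters on the log scale (a choice made here, not stated in the post), and `fit_poisson_irls` is a hypothetical helper; in practice a library routine such as statsmodels' `GLM` with a `Poisson` family would do this.

```python
import numpy as np

def fit_poisson_irls(X, y, max_iter=100, tol=1e-8):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())        # start from the intercept-only fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu        # working response for the log link
        XtW = X.T * mu                 # IRLS weights are mu for Poisson/log
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Synthetic, illustrative data only: 365 days x 2 groups in long format.
rng = np.random.default_rng(0)
n_days = 365
user_count = np.repeat(rng.integers(5000, 15001, size=n_days), 2)  # same value for A and B
grp_b = np.tile([0.0, 1.0], n_days)                                 # 0 = A, 1 = B
X = np.column_stack([np.ones(2 * n_days), grp_b, np.log(user_count)])
true_beta = np.array([-6.0, 0.3, 1.0])                              # hypothetical values
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_irls(X, y)
print(beta_hat)  # (intercept, group B vs A, log user_count)
```

Note that this fit treats all 730 rows as independent, which is exactly the assumption questioned in the post.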

Am I going crazy, or does the fact that the count of users is repeated for each day introduce "non-independence" between observations? I almost want to drop the date from the data because it's confusing me. Thank you.
 

fed2

Active Member
#2
I think the fact that you have time here suggests that the baseline assumption should be "non-independence", i.e., you are not crazy (with respect to this). GEE model?
 
#3
Thanks for the insight. I'll take a look at this approach.

Edit: Is this similar to a GLMM?

To be clear, the counts in group A and group B are mutually exclusive: the same user cannot be in both A and B simultaneously. The event of interest is not likely to happen for the same user within a short period of time (it's filing an insurance claim). The days are repeated simply because I am comparing the two groups (I changed the data from wide to long format). The count of users is repeated because it's a number aggregated for that particular day.

Would it make sense to include a random effect for the group? How is a GEE different from a GLMM?
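A minimal pandas sketch of that wide-to-long reshape, assuming a hypothetical wide table with columns `day`, `A`, `B` and `user_count` (the names and values are placeholders, not the poster's data). It shows why `user_count` ends up repeated: it is a per-day value carried along as an identifier column.

```python
import pandas as pd

# Hypothetical wide layout: one row per day, one column per group.
wide = pd.DataFrame({
    "day": [1, 2, 3],
    "A": [12, 15, 9],
    "B": [20, 18, 25],
    "user_count": [10400, 11250, 9800],
})

# Long layout: one row per (day, group). id_vars are copied onto every row,
# so each day's user_count appears once for A and once for B.
long = wide.melt(
    id_vars=["day", "user_count"],
    value_vars=["A", "B"],
    var_name="grp",
    value_name="count",
)
long = long.sort_values(["day", "grp"]).reset_index(drop=True)
print(long)
```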
 

fed2

Active Member
#4
Yes, GEE models are closely related to GLMMs (both handle correlated observations within clusters), but the modeling is much simpler, which is the main attraction. Rather than specifying actual distributions for the random effects, you just specify a 'working correlation structure'. This greatly reduces the statistical issues with model fitting. In a GLMM, most software has to do a numerical integration of the likelihood in each iteration, which is complicated and, in my experience, frequently sucks.
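To make the contrast concrete, here are the standard textbook forms for this setting, with days i = 1..N as clusters and groups j = A, B (the notation is an assumption, not taken from the thread). GEE only needs a working correlation R(α) inside the estimating equations, while the GLMM likelihood contains an integral over the random effect:

```latex
% GEE: solve the estimating equations
\sum_{i=1}^{N} D_i^{\top} V_i^{-1} (y_i - \mu_i) = 0,
\qquad D_i = \frac{\partial \mu_i}{\partial \beta},
\qquad V_i = \phi \, A_i^{1/2} R(\alpha) \, A_i^{1/2},
\qquad A_i = \operatorname{diag}(\mu_{iA}, \mu_{iB})

% GLMM with a day-level random intercept b \sim N(0, \sigma^2)
L(\beta, \sigma) = \prod_{i=1}^{N} \int_{-\infty}^{\infty}
  \prod_{j \in \{A, B\}} \frac{\mu_{ij}(b)^{y_{ij}} \, e^{-\mu_{ij}(b)}}{y_{ij}!}
  \cdot \frac{1}{\sigma \sqrt{2\pi}} \, e^{-b^{2} / (2\sigma^{2})} \, db,
\qquad \log \mu_{ij}(b) = x_{ij}^{\top} \beta + b
```

The integral over b has no closed form for a Poisson-normal model, so software approximates it (for example with a Laplace approximation or Gauss-Hermite quadrature) during fitting. For a log link with only a random intercept, the marginal mean is exp(x'β + σ²/2), so the two approaches target the same slope coefficients and differ only in the intercept.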

Quote: "To be clear, the counts in group A and group B are mutually exclusive."
Hmmm. If you do not have 'count of users' as a fixed effect, I reckon A and B will be correlated, just because they will both seem to scale with the number of users. If you have 'count of users' in the model, the correlation may disappear or even reverse. You can probably tell by just scatter-plotting the variables and looking at the partial correlation between A and B controlling for total users.
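A minimal NumPy sketch of that partial-correlation check: regress A and B each on total users, then correlate the residuals. The data are synthetic and built so that A and B depend on users but not on each other, so this only illustrates the mechanics; `residualize` and `partial_corr` are hypothetical helper names.

```python
import numpy as np

def residualize(v, z):
    """Residuals of v after an OLS fit on an intercept and z."""
    Z = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

def partial_corr(a, b, z):
    """Correlation of a and b after removing the linear effect of z from each."""
    return np.corrcoef(residualize(a, z), residualize(b, z))[0, 1]

# Synthetic, illustrative data: both groups scale with total users only.
rng = np.random.default_rng(0)
users = rng.integers(5000, 15001, size=365).astype(float)
a = rng.poisson(0.0025 * users).astype(float)   # hypothetical per-user rates
b = rng.poisson(0.0033 * users).astype(float)

raw = np.corrcoef(a, b)[0, 1]
partial = partial_corr(a, b, users)
print(f"raw corr(A, B) = {raw:.2f}, partial corr given users = {partial:.2f}")
```

Here the raw correlation is driven entirely by the shared dependence on users, so it largely vanishes once users is controlled for.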

I guess the easy way would be to just set the 'working correlation' to be unstructured, and let the computer decide if there is a correlation.
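With only two rows per day (groups A and B), an unstructured 2×2 working correlation has a single free correlation parameter, so it coincides with an exchangeable one. The sketch below is a hand-rolled GEE for a Poisson model with log link: clusters are days, the exchangeable correlation and scale are moment estimates from Pearson residuals, and standard errors use the sandwich (robust) estimator. It is written in plain NumPy to stay self-contained; the function names and the synthetic day-level effect are assumptions for illustration, and in practice statsmodels' `GEE` class with an `Exchangeable` covariance structure does the same job.

```python
import numpy as np

def _moment_estimates(X, y, beta):
    """Pearson-residual moment estimates of the scale phi and exchangeable alpha."""
    n_clusters, m, p = X.shape
    mu = np.exp(X @ beta)
    r = (y - mu) / np.sqrt(mu)
    phi = np.sum(r ** 2) / (n_clusters * m - p)
    # Sum over pairs j < k of r_j * r_k within each cluster.
    pair_sums = (np.sum(r, axis=1) ** 2 - np.sum(r ** 2, axis=1)) / 2
    n_pairs = n_clusters * m * (m - 1) / 2
    return np.sum(pair_sums) / (n_pairs * phi), phi

def _gee_pieces(X, y, beta, alpha, phi):
    """Cluster sums of D'V^-1 D (bread), D'V^-1 (y - mu) (score) and score outer products (meat)."""
    n_clusters, m, p = X.shape
    mu = np.exp(X @ beta)
    R_inv = np.linalg.inv((1 - alpha) * np.eye(m) + alpha * np.ones((m, m)))
    bread, score, meat = np.zeros((p, p)), np.zeros(p), np.zeros((p, p))
    for i in range(n_clusters):
        D = mu[i][:, None] * X[i]                # d mu / d beta under the log link
        s = np.sqrt(mu[i])
        V_inv = R_inv / np.outer(s, s) / phi     # inverse of phi * A^1/2 R A^1/2
        u = D.T @ V_inv @ (y[i] - mu[i])
        bread += D.T @ V_inv @ D
        score += u
        meat += np.outer(u, u)
    return bread, score, meat

def fit_gee_poisson_exchangeable(X, y, max_iter=100, tol=1e-8):
    """X: (clusters, m, p), y: (clusters, m); written with m = 2 in mind."""
    beta = np.zeros(X.shape[2])
    beta[0] = np.log(y.mean())
    for _ in range(max_iter):
        alpha, phi = _moment_estimates(X, y, beta)
        bread, score, _ = _gee_pieces(X, y, beta, alpha, phi)
        step = np.linalg.solve(bread, score)     # Fisher-scoring update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    alpha, phi = _moment_estimates(X, y, beta)
    bread, _, meat = _gee_pieces(X, y, beta, alpha, phi)
    bread_inv = np.linalg.inv(bread)
    robust_se = np.sqrt(np.diag(bread_inv @ meat @ bread_inv))  # sandwich estimator
    return beta, alpha, robust_se

# Synthetic, illustrative data: one cluster per day, rows = groups A and B,
# plus a hypothetical day-level effect shared by both groups.
rng = np.random.default_rng(1)
n_days = 365
users = rng.integers(5000, 15001, size=n_days)
day_effect = rng.normal(0.0, 0.2, size=n_days)
X = np.stack([np.column_stack([np.ones(2), [0.0, 1.0], np.full(2, np.log(u))]) for u in users])
y = rng.poisson(np.exp(X @ np.array([-6.0, 0.3, 1.0]) + day_effect[:, None]))

beta_hat, alpha_hat, se = fit_gee_poisson_exchangeable(X, y)
print("beta:", beta_hat.round(3), "alpha:", round(alpha_hat, 3), "robust SE:", se.round(3))
```

A clearly positive `alpha` means the two groups' counts on the same day move together beyond what the covariates explain; the sandwich standard errors stay valid even if the working correlation is misspecified, provided the mean model is right.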
 

hlsmith

Less is more. Stay pure. Stay poor.
#5
Quote: "count = grp + user_count"

What is user_count? The number of claims filed within a person? So you have two groups, and a person can only be in one group, but they can be in that group more than once? It may help to just describe the actual setting, so we will understand any dependencies.