I'm using a Poisson generalized linear model (log link function) to model count data with one continuous and one categorical predictor. For one of the values of the categorical predictor, all the responses are zero, so the sample mean and sample standard deviation are zero, and the log of the response is undefined. This messes up the model parameters badly, making the parameter estimates related to that predictor level extremely large in magnitude, and giving p-values (from a Wald test) near 1 when they should clearly be near 0.
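A minimal sketch of why the estimates blow up, assuming a simplified, hypothetical model with only a strain indicator (no day term) and made-up counts. It uses iteratively reweighted least squares (IRLS), the standard way Poisson GLMs are fitted. With one parameter per group, each IRLS step just averages the working response within the group. For an all-zero group the working response is always eta - 1, so that group's linear predictor drops by exactly 1 per iteration and never converges. Its Wald standard error grows like exp(-eta/2) at the same time, so the Wald z shrinks toward 0 and the p-value toward 1. The names `strain_A` and `strain_B` are illustrative only.

```python
import math

def irls_group_log_means(groups, n_iter):
    """IRLS for a Poisson GLM (log link) with one indicator per group.

    With one parameter per group, the weighted least-squares step reduces to
    averaging the working response z = eta + (y - mu) / mu within each group,
    because every observation in a group has the same IRLS weight mu.
    """
    eta = {g: 0.0 for g in groups}  # start each group's log mean at 0
    for _ in range(n_iter):
        for g, ys in groups.items():
            mu = math.exp(eta[g])
            z = [eta[g] + (y - mu) / mu for y in ys]
            eta[g] = sum(z) / len(z)
    return eta

def wald_test(groups, eta, a, b):
    """Wald z and two-sided p-value for the log rate ratio of group a vs group b."""
    diff = eta[a] - eta[b]
    # Inverse Fisher information for each group's log mean is 1 / (n * mu).
    se = math.sqrt(1 / (len(groups[a]) * math.exp(eta[a]))
                   + 1 / (len(groups[b]) * math.exp(eta[b])))
    z = diff / se
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: one strain with all zeros, one with a few events.
data = {"strain_A": [0, 0, 0, 0, 0, 0], "strain_B": [1, 2, 0, 3, 1, 2]}
for n_iter in (5, 10, 20):
    eta = irls_group_log_means(data, n_iter)
    z, p = wald_test(data, eta, "strain_A", "strain_B")
    print(n_iter, eta["strain_A"], round(z, 4), round(p, 4))
```

Real software stops once the change in deviance is tiny, which leaves a large negative coefficient with an even larger standard error rather than an error message.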
If I add a tiny offset to ALL the data points (like 0.01, for count data in the 0 through 3 range) then the parameters and p-values (from a Wald test) generally behave and reflect what's expected from examining the graph. I know this isn't the right way to go; I just wanted to verify that my problems probably have to do with log(0) not being defined.
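A sketch of why the shift appears to work, under a simplifying assumption: in a hypothetical Poisson model with only a strain indicator, the maximum-likelihood estimate of each strain's log mean is the log of that strain's sample mean. Shifting every response by a constant c turns the all-zero strain's estimate from minus infinity into exactly log(c), so that fitted value is set by the arbitrary choice of c rather than by the data. The counts below are made up.

```python
import math

def group_log_mean_mles(groups, shift=0.0):
    """Poisson MLE of each group's log mean when the model has one indicator per group.

    For that design the MLE is log(sample mean); it does not exist (it is -inf)
    when a group's counts are all zero.
    """
    estimates = {}
    for g, ys in groups.items():
        mean = sum(y + shift for y in ys) / len(ys)
        estimates[g] = math.log(mean) if mean > 0 else -math.inf
    return estimates

# Hypothetical counts in the 0-3 range, one strain with no events at all.
data = {"strain_A": [0, 0, 0, 0, 0, 0], "strain_B": [1, 2, 0, 3, 1, 2]}
for shift in (0.0, 0.01, 0.001):
    est = group_log_mean_mles(data, shift)
    print(shift, est["strain_A"], est["strain_B"])
```

Going from c = 0.01 to c = 0.001 moves the all-zero strain's log mean from about -4.6 to about -6.9, while the other strain's estimate barely changes.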
I see no reason to think a zero-inflated Poisson model is appropriate for these data, because nothing about the process should be forcing the response to zero before the natural "count" mechanism takes over. I think the rate of incidence of the response is really low for this one treatment, so it wasn't observed at all.
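If the rate for that strain is simply very low, an all-zero result still says something: it bounds how large the rate could plausibly be. The sum of n independent Poisson(lambda) counts is Poisson(n * lambda), so the probability that all n counts are zero is exp(-n * lambda). Setting that equal to alpha gives the one-sided upper bound lambda <= -ln(alpha) / n, which is about 3/n at alpha = 0.05 (the "rule of three"). This sketch assumes a constant per-cell rate for the strain, i.e. it ignores the day effect, and the number of cells is hypothetical.

```python
import math

def zero_count_upper_bound(n_cells, alpha=0.05):
    """One-sided (1 - alpha) upper confidence bound on a Poisson mean per cell
    when every one of n_cells observed counts is zero.

    Solves exp(-n_cells * lam) = alpha for lam.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return -math.log(alpha) / n_cells

# Hypothetical example: 100 cells scored, no aberrations seen.
print(zero_count_upper_bound(100))  # roughly 0.030 aberrations per cell
```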
Is there anything I can do to my model to gracefully deal with this situation? Maybe some good way to use an offset in the link function?
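One established alternative to an arbitrary shift, offered here as an option rather than as the thread's answer, is Firth's bias-reduced (penalized) likelihood. For canonical-link GLMs such as the Poisson log-linear model, Firth's adjusted score is equivalent to adding half of each observation's leverage h_i to its count. In a hypothetical model with only a strain indicator, every leverage within a strain equals 1/n, so the adjustment adds 0.5 to each strain's total and always gives a finite estimate. With the continuous day term included the leverages vary, so this closed form no longer applies and a dedicated implementation would be needed; the function name below is illustrative.

```python
import math

def firth_group_log_means(groups):
    """Firth-adjusted log mean per group for a Poisson model with one indicator per group.

    The adjusted score for group g is sum(y_i + h_i / 2 - mu_g) = 0, and with this
    design each leverage h_i is 1 / n_g, so the solution is log((total + 0.5) / n_g).
    """
    return {g: math.log((sum(ys) + 0.5) / len(ys)) for g, ys in groups.items()}

# Hypothetical counts: one strain with all zeros, one with a few events.
data = {"strain_A": [0, 0, 0, 0, 0, 0], "strain_B": [1, 2, 0, 3, 1, 2]}
print(firth_group_log_means(data))
```

Unlike a fixed 0.01, the pseudo-count here comes from the model's own leverages, so it does not depend on an arbitrary choice.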
What are they counts of?
Could you use a negative binomial distribution?
The earth is round: P<0.05
They are counts of the number of morphological aberrations visible in certain cells on various days of an experiment (3, 5, 7, 9, 11) in three different strains of animal. It doesn't seem that the negative binomial distribution is appropriate for this process, and its link function has the same problem with zeros as the Poisson.
I didn't read too carefully, but a few things caught my eye.
1) The link function is applied to the expected value of the response - not the response itself. So having 0s doesn't mean that you can't use a log link.
2) How exactly are you adding small amounts of noise to the response when you're fitting a model that explicitly states that the response will be an integer?
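Point 1 can be made concrete with a minimal sketch (the function names are illustrative, not from any library). Under a log link the Poisson log-likelihood is the sum of y*eta - exp(eta) - log(y!), and the deviance's y*log(y/mu) term is defined as 0 when y = 0, so zero counts never require log(0). The trouble with an all-zero group comes from its fitted mean being pushed toward 0, not from taking the log of the data.

```python
import math

def poisson_loglik(ys, etas):
    """Poisson log-likelihood when log E[y] = eta.

    Only exp(eta) and lgamma(y + 1) = log(y!) appear; log(y) is never evaluated.
    """
    return sum(y * eta - math.exp(eta) - math.lgamma(y + 1)
               for y, eta in zip(ys, etas))

def poisson_deviance(ys, mus):
    """Residual deviance, using the convention that y * log(y / mu) is 0 when y == 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2 * (term - (y - mu))
    return total

zeros = [0, 0, 0, 0]
print(poisson_loglik(zeros, [-1.0] * 4))   # finite: -4 * exp(-1)
print(poisson_deviance(zeros, [0.5] * 4))  # finite: 4.0
```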
I don't have emotions and sometimes that makes me very sad.
I think my problem is that the expected value of the response is zero for one of the treatments, because all the data were zero.
I'm adding small amounts of noise directly to the data (a "count" of 2.001 instead of 2, etc.). I know it's not sensible or appropriate for the model, which is why I'm asking about alternatives. I just did it to verify that the behavior of log() around zero is causing the difficulty.
Note, though, that my point is that when fitting this kind of model you never actually take the log of the data.
Having a group that is all 0s will cause issues, but without seeing your data it's hard for me to comment more.
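The Wald p-value is where the all-zero group causes the most visible damage, because both the coefficient and its standard error run off toward infinity. A likelihood-ratio test compares maximized log-likelihoods instead, and those stay finite (the all-zero group contributes 0 in the limit). This sketch assumes a hypothetical two-strain model with no day term and made-up counts; with the full model, the analogous step would be comparing the deviances of nested fits with and without the strain term.

```python
import math

def max_loglik(total, n):
    """Maximized Poisson log-likelihood (log(y!) terms dropped) for n counts that
    share one mean. The MLE of the mean is total / n; when total == 0 the
    log-likelihood -n * mean increases toward its supremum 0 as the mean goes to 0."""
    if total == 0:
        return 0.0
    mean = total / n
    return total * math.log(mean) - total

def lr_test_two_groups(ys_a, ys_b):
    """Likelihood-ratio test of equal Poisson means in two groups (1 df)."""
    n_a, n_b = len(ys_a), len(ys_b)
    s_a, s_b = sum(ys_a), sum(ys_b)
    ll_separate = max_loglik(s_a, n_a) + max_loglik(s_b, n_b)
    ll_common = max_loglik(s_a + s_b, n_a + n_b)
    stat = max(2 * (ll_separate - ll_common), 0.0)  # guard against rounding below 0
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

strain_a = [0, 0, 0, 0, 0, 0]  # hypothetical all-zero strain
strain_b = [1, 2, 0, 3, 1, 2]  # hypothetical comparison strain
print(lr_test_two_groups(strain_a, strain_b))
```

With these made-up counts the statistic works out to 18 ln 2 (about 12.5), giving p < 0.001, whereas the Wald test on the same comparison drifts toward p = 1.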
Here are histograms for all the data I'm trying to model:
Each color represents cells from a different strain of animal. The x-axis for all the histograms is the count of a particular type of cellular aberration per cell. Day is being treated as a continuous variable, although it's measured at discrete intervals.
I'm actually modeling several different types of aberration, and the model I'm using (a Poisson generalized linear model with day as a continuous predictor and strain as a categorical one) works very well for most of them. I have problems with this set, where all the responses are zero for one of the strains, and with a couple of other sets that have just one positive response per strain.
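For the model described here, the trouble with the all-zero strain can be written down directly. In the notation below (my own, not from the thread: $y_{ij}$ is the count for cell $i$ of strain $j$, and $\gamma_j$ is the strain effect with $\gamma_1 = 0$ for a reference strain), the score for a strain's coefficient involves only that strain's observations:

```latex
\begin{aligned}
y_{ij} &\sim \operatorname{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \beta_0 + \beta_1\,\mathrm{day}_{ij} + \gamma_j, \\
\frac{\partial \ell}{\partial \gamma_k} &= \sum_{i} \bigl(y_{ik} - \mu_{ik}\bigr)
  = -\sum_{i} \mu_{ik} < 0 \quad \text{if all } y_{ik} = 0 .
\end{aligned}
```

Because this derivative is negative at every finite parameter value, the log-likelihood keeps increasing as $\gamma_k \to -\infty$ whatever values $\beta_0$ and $\beta_1$ take, so no finite maximum-likelihood estimate exists for that strain's coefficient.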
What's the appropriate way to deal with this?
Now that I've shown the data and explained the process that generates it, does anyone have any suggestions on how to model it more effectively?