Poisson and ANOVA?

#1
I have three groups of data, consisting of rates measured during three time frames. The rates are not integers, but decimals. There are over 1000 observations for each time frame.

I would like to compare the three groups, which is what an ANOVA would normally do. However, the data are distributed much more like Poisson than Normal (i.e. heavily weighted on slow rates but with a fair number of large rates - occasionally things go quite quickly). Clearly, these are not count data, and it seems my only option is to use a non-parametric Kruskal-Wallis (KW) test.

As much as possible, I would prefer to use some kind of regression analysis and account for the lack of normality by matching the proper error distribution, which will provide more information (p-values for each group). Is there any way that I can use a regression to compare the groups but incorporate a Poisson distribution, for example?

Thanks a lot for any comments on this.
 

JohnM

TS Contributor
#2
I would still use a regular ANOVA - remember, as n approaches infinity, the sampling distribution of the means will approach normality, and sample sizes of 1000 are certainly large enough for the CLT (central limit theorem) to be in effect.
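A quick way to see the CLT argument is to simulate it. The sketch below is only an illustration: the log-normal distribution and its parameters are assumptions standing in for the skewed rates, not the poster's data.

```python
import random
import statistics

def skewness(values):
    """Sample skewness (third standardized moment)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the rates: log-normal, heavily right-skewed
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]

# Means of 500 simulated samples of n = 1000 observations each
means = [
    statistics.fmean(rng.lognormvariate(0.0, 1.0) for _ in range(1000))
    for _ in range(500)
]

raw_skew = skewness(raw)
mean_skew = skewness(means)
print(f"skewness of raw data:     {raw_skew:.2f}")
print(f"skewness of sample means: {mean_skew:.2f}")
```

The raw values stay strongly skewed, while the sample means should be close to symmetric, which is the property ANOVA relies on.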
 
#3
Hmm.. could do, but I really believe that the assumptions for ANOVA (despite its robustness) are violated. The Q-Q plots, Shapiro-Wilk test, and visual summaries of the data (boxplots etc.) all show data that are very skewed and far from what I would consider "good" for an ANOVA. That said, I will go with KW, I suppose.

Now, trying to find a good post-hoc procedure for KW... I have read a little on this forum, but haven't had good luck with non-integer data (in R, using the npmc package) or in Stata... trying Mann-Whitney U (MWU) comparisons with a Bonferroni correction, but I'm not convinced that's ideal.

Thanks for your comments!!
 

JohnM

TS Contributor
#4
Hmm.. could do, but I really believe that the assumptions for ANOVA (despite its robustness) are violated. The Q-Q plots, Shapiro-Wilk test, and visual summaries of the data (boxplots etc.) all show data that are very skewed and far from what I would consider "good" for an ANOVA. That said, I will go with KW, I suppose.

Yes, but with a large sample size a rejection of normality is entirely expected and may be meaningless - it is very easy to get a "significant" result with a large sample size for a departure that is not important from a practical perspective.

ANOVA is extremely robust to violations of the normality assumption, as long as the other assumptions hold (homogeneity of variance, etc.).

Now, trying to find a good post-hoc procedure for KW... I have read a little on this forum, but haven't had good luck with non-integer data (in R, using the npmc package) or in Stata... trying Mann-Whitney U (MWU) comparisons with a Bonferroni correction, but I'm not convinced that's ideal.

Your problems may be due to the fact that you need to transform your decimal data into ranks in order to perform KW and/or the post-hocs.
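To make the rank step concrete, here is a minimal pure-Python sketch of KW computed on ranks of decimal data. The function names and the simulated groups are hypothetical, no tie correction is applied, and the p-value shortcut only holds for three groups (2 degrees of freedom); in practice a library routine such as scipy.stats.kruskal would normally be used.

```python
import math
import random

def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical decimal rates for three time frames (assumed, not real data)
rng = random.Random(1)
frames = [[rng.lognormvariate(mu, 1.0) for _ in range(1000)]
          for mu in (0.0, 0.1, 0.3)]

h = kruskal_wallis_h(frames)
# With 3 groups H is compared to chi-square with 2 df,
# whose upper-tail probability is exactly exp(-H / 2)
p_value = math.exp(-h / 2)
print(f"H = {h:.2f}, p = {p_value:.4g}")
```

The point is that the decimals never enter the test directly: only their ranks in the pooled sample do.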

Just my opinion, but I think you're making this a lot more complicated than it needs to be....
 
#5
Yeah, I likely am making this more complicated than necessary, but is ANOVA really robust to data that are this non-normal (see the Q-Q plot)?

I have no problem with the main statistical tests, just the post-hocs are a bit troublesome. I've decided to use the MWU and Bonferroni; seems Nemenyi or Steel-Dwass would be better, but I'm not sure these data are fit for such analyses (yet).
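For reference, the MWU-plus-Bonferroni approach can be sketched as follows. This is a rough illustration using the large-sample normal approximation without a tie correction; the function names and simulated groups are hypothetical, and a library routine such as scipy.stats.mannwhitneyu would normally do the test itself.

```python
import math
import random
from bisect import bisect_left, bisect_right
from itertools import combinations

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test, normal approximation, no tie correction."""
    sorted_b = sorted(b)
    # U counts pairs (x, y) with x > y; tied pairs count one half
    u = sum((bisect_left(sorted_b, x) + bisect_right(sorted_b, x)) / 2 for x in a)
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p

def pairwise_bonferroni(groups):
    """Run every pairwise MWU test and multiply each p-value by the number of tests."""
    pairs = list(combinations(sorted(groups), 2))
    adjusted = {}
    for first, second in pairs:
        _, p = mann_whitney_u(groups[first], groups[second])
        adjusted[(first, second)] = min(1.0, p * len(pairs))
    return adjusted

# Hypothetical decimal rates for three time frames (assumed, not real data)
rng = random.Random(3)
frames = {
    "frame1": [rng.lognormvariate(0.0, 1.0) for _ in range(1000)],
    "frame2": [rng.lognormvariate(0.1, 1.0) for _ in range(1000)],
    "frame3": [rng.lognormvariate(0.3, 1.0) for _ in range(1000)],
}

adjusted_p = pairwise_bonferroni(frames)
for pair, p in adjusted_p.items():
    print(pair, f"Bonferroni-adjusted p = {p:.4g}")
```

With only three comparisons the Bonferroni penalty is a factor of 3, so it costs little power here; it becomes conservative mainly when there are many groups.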

Thanks for your suggestions.
 

JohnM

TS Contributor
#6
Yes, it's robust to data like your picture. Remember, ANOVA analyzes means, not individual data points....

I distinctly remember my SAS class in grad school where the prof had us analyze a data set containing patients' PSA levels (a prostate cancer screening test). The data were extremely skewed to the right, but he insisted that the CLT applied, and not to worry about it.
 
#7
OK - thanks! That sounds good!

Further reading also confirms that lack of normality is really only a problem for ANOVA models when it results in variance heterogeneity (unequal variances, particularly with unequal sample sizes)! And since plots of residuals versus the means are the most informative diagnostic for ANOVA, the graph below confirms everything you've said.
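A numeric companion to the residual plot is to compare the group spreads directly. The sketch below reports the ratio of largest to smallest group variance and a Brown-Forsythe style F statistic (a one-way ANOVA on absolute deviations from each group median). The function name and simulated data are hypothetical, and the p-value is left out because the F distribution is not in the standard library.

```python
import random
import statistics

def brown_forsythe_f(groups):
    """F statistic of a one-way ANOVA on absolute deviations from group medians."""
    devs = []
    for g in groups:
        med = statistics.median(g)
        devs.append([abs(v - med) for v in g])
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    dev_means = [statistics.fmean(d) for d in devs]
    grand = sum(sum(d) for d in devs) / n_total
    between = sum(len(d) * (m - grand) ** 2
                  for d, m in zip(devs, dev_means)) / (k - 1)
    within = sum((x - m) ** 2
                 for d, m in zip(devs, dev_means) for x in d) / (n_total - k)
    return between / within

# Hypothetical decimal rates for three time frames (assumed, not real data)
rng = random.Random(2)
frames = {
    "frame1": [rng.lognormvariate(0.0, 1.0) for _ in range(1000)],
    "frame2": [rng.lognormvariate(0.1, 1.0) for _ in range(1000)],
    "frame3": [rng.lognormvariate(0.3, 1.0) for _ in range(1000)],
}

for name, values in frames.items():
    print(f"{name}: mean = {statistics.fmean(values):.3f}, "
          f"sd = {statistics.stdev(values):.3f}")

variances = [statistics.variance(v) for v in frames.values()]
ratio = max(variances) / min(variances)
f_stat = brown_forsythe_f(list(frames.values()))
print(f"largest / smallest variance = {ratio:.2f}")
print(f"Brown-Forsythe F = {f_stat:.2f}")
```

A variance ratio near 1 and a small F support the equal-variance assumption; with equal group sizes, as here, ANOVA tolerates moderate departures anyway.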

Thanks for your help!

(Although I am still curious about fitting a GLM that would allow for a proper error distribution instead of trying to fit everything into this ANOVA)
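On the GLM question: Poisson is meant for counts, so for positive, right-skewed decimal rates one commonly used alternative is a Gamma GLM with a log link. That choice is an assumption here, not something established in this thread, and it requires all rates to be strictly positive. With only group indicators as predictors, the fitted means equal the group means, so the coefficients and Wald tests have a closed form. The sketch below uses that shortcut with hypothetical names and simulated data; a package such as statsmodels would fit the same model iteratively.

```python
import math
import random
import statistics

def gamma_glm_group_contrasts(groups):
    """Gamma GLM, log link, one categorical predictor (first group = reference).

    Returns, for each non-reference group, the ratio of mean rates,
    the standard error of the log ratio, and a two-sided Wald p-value.
    """
    means = [statistics.fmean(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    # Pearson estimate of the dispersion: sum(((y - mu) / mu)^2) / (N - k)
    pearson = sum(((y - m) / m) ** 2 for g, m in zip(groups, means) for y in g)
    phi = pearson / (n_total - k)
    results = []
    for j in range(1, k):
        beta = math.log(means[j] / means[0])  # coefficient = log ratio of means
        se = math.sqrt(phi * (1 / len(groups[0]) + 1 / len(groups[j])))
        z = beta / se
        p = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
        results.append({"group": j, "ratio": math.exp(beta), "se": se, "p": p})
    return results

# Hypothetical positive rates for three time frames (assumed, not real data)
rng = random.Random(4)
frames = [[rng.gammavariate(0.8, scale) for _ in range(1000)]
          for scale in (1.0, 1.1, 1.4)]

contrasts = gamma_glm_group_contrasts(frames)
for row in contrasts:
    print(f"frame {row['group'] + 1} vs frame 1: "
          f"ratio = {row['ratio']:.3f}, p = {row['p']:.4g}")
```

This gives a p-value per group relative to the reference time frame, which is the extra information a regression was wanted for, and the ratios read directly as "x times the reference rate".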