Robustness analysis of one-factor ANOVA in R with negative binomial distributed data

Renoir Pulitz

New Member
Hello everyone,
I'd like to examine the robustness of the ANOVA when the dependent variable is not normally distributed but follows a negative binomial distribution. To do so, I use the following code:
Code:
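nsim = 100000                    # number of simulated data sets
pvalue = rep(NA, nsim)
n = c(352, 198, 170)             # group sizes
group = factor(rep(1:3, n))
for (i in 1:nsim) {
  # all three groups share mu = 1.85, so the null hypothesis of equal means is true
  x = c(rnbinom(n[1], size = 0.83, mu = 1.85),
        rnbinom(n[2], size = 0.78, mu = 1.85),
        rnbinom(n[3], size = 0.95, mu = 1.85))
  data = data.frame(x = x, group = group)
  fit = lm(x ~ group, data)
  tmp = anova(fit)               # ANOVA table; row 1 is the group effect
  pvalue[i] = tmp[1, "Pr(>F)"]   # p-value of the F test
}
sum(pvalue <= 0.05)              # number of p-values at or below 5%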
Thus, I draw 100,000 sets of samples from three different negative binomial distributions, run an ANOVA on each set, extract the p-value, and then count how many of the 100,000 p-values are <= 5%. If I had three normal distributions instead of three negative binomial ones, I would expect about 5,000 p-values to be at or below 5%, since all means are equal. However, contrary to what I expected, my code for the negative binomial case also gives roughly 5,000 p-values at or below 5%, which in my opinion means that the ANOVA is robust against a violation of the normality assumption. At first I thought the result was due to the large sample size, but I get the same result when I lower the sample size to N=10 per group.
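To judge whether "roughly 5,000" is really compatible with a 5% level, and to rerun the comparison for N=10 per group, the simulation could be wrapped in a function along these lines. This is only a sketch: the names sim_reject_rate and mc_se are my own hypothetical helpers, the run count is reduced to 2,000 to keep it fast, and the seed is arbitrary.

```r
# Hypothetical helper (not part of the code above): rejection rate of the
# one-way ANOVA F test when all groups share the same negative binomial mean.
sim_reject_rate <- function(n, size, mu = 1.85, nsim = 2000, alpha = 0.05) {
  group <- factor(rep(seq_along(n), n))
  pvalue <- replicate(nsim, {
    # rnbinom() is vectorised over size, so each group gets its own dispersion
    x <- rnbinom(sum(n), size = rep(size, n), mu = mu)
    anova(lm(x ~ group))[1, "Pr(>F)"]
  })
  mean(pvalue <= alpha)
}

# Monte Carlo standard error of the rejection rate if the true level is alpha
mc_se <- function(nsim, alpha = 0.05) sqrt(alpha * (1 - alpha) / nsim)

set.seed(1)  # arbitrary seed, only for reproducibility
size <- c(0.83, 0.78, 0.95)
rate_large <- sim_reject_rate(n = c(352, 198, 170), size = size)
rate_small <- sim_reject_rate(n = c(10, 10, 10), size = size)

# Range of rejection rates compatible with a true 5% level (about +/- 2 SE)
band <- 0.05 + c(-2, 2) * mc_se(2000)
print(c(large = rate_large, small = rate_small))
print(band)
```

With the full 100,000 runs, mc_se(100000) corresponds to about 69 counts, so a count between roughly 4,860 and 5,140 would still be compatible with a true 5% level.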
However, browsing through the forum, I understood that, contrary to what I had been taught in my statistics lectures, the ANOVA only requires the residuals to be normally distributed, not the dependent variable itself. Therefore, I have three questions and would really appreciate it if one of you could give me a hint on any of them!
1. Is there an error in the code that leads to this (in my opinion) unexpected result?
2. Is it a misunderstanding on my part that an ANOVA with three negative binomial distributions should lead to a different number of p-values <= 5% than one with three normal distributions?
3. Is an ANOVA with three negative binomial distributed dependent variables feasible anyway, as long as the residuals are normally distributed?
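To make question 3 concrete, this is a sketch of how I could look at the residuals of a single simulated data set. The seed is arbitrary and the skewness formula is the plain moment estimator, written out by hand rather than taken from a package. In a one-way layout each residual is just the observation minus its group mean, so the residuals inherit the shape of the data within each group.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- c(352, 198, 170)
size <- c(0.83, 0.78, 0.95)
group <- factor(rep(1:3, n))
x <- rnbinom(sum(n), size = rep(size, n), mu = 1.85)
fit <- lm(x ~ group)
res <- residuals(fit)

# Check: each residual equals the observation minus its group mean
stopifnot(isTRUE(all.equal(unname(res), x - ave(x, group))))

# Sample skewness of the residuals (0 for a symmetric distribution)
skew <- mean((res - mean(res))^3) / mean((res - mean(res))^2)^1.5
print(skew)

# Shapiro-Wilk test of normality (accepts between 3 and 5000 observations)
print(shapiro.test(res))
```

A normal Q-Q plot via qqnorm(res) and qqline(res) would show the same thing graphically.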

I would be glad if one of you could give me a hint!
