Hello everyone,

I'd like to examine the robustness of the ANOVA when the dependent variable follows a negative binomial rather than a normal distribution. To do so, I use the code given further down in this post (under "Code:").

That is, 100,000 times I draw samples from three negative binomial distributions with equal means, run an ANOVA on them, have R return the p-value, and then count how many of the 100,000 p-values are <=5%. With three normal distributions instead of three negative binomial ones, I would expect about 5,000 p-values to be at or below 5%, since all means are equal. However, contrary to what I expected, my code for the negative binomial case also gives roughly 5,000 p-values at or below 5%, which in my opinion means that the ANOVA is robust against a violation of the normality requirement. At first I attributed the result to the large sample size, but I get the same result when I lower the sample size to N=10 per group.
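To make the comparison with the normal case concrete, here is a minimal sketch of the baseline I have in mind. The helper name `rejection_rate`, the reduced number of replications (2,000 instead of 100,000) and the standard deviation of 2 for the normal data are my own choices for illustration only; the negative binomial parameters are the ones from my simulation.

```r
# Hypothetical helper: share of simulated one-way ANOVAs with p <= alpha.
# `draw` is a function that takes the group sizes and returns one data vector.
rejection_rate <- function(draw, n = c(352, 198, 170), n_sim = 2000, alpha = 0.05) {
  group <- factor(rep(seq_along(n), n))
  pvalue <- replicate(n_sim, {
    x <- draw(n)
    anova(lm(x ~ group))[1, "Pr(>F)"]   # p-value of the overall F test
  })
  mean(pvalue <= alpha)
}

# Normal data, equal means in all groups (sd = 2 is an arbitrary choice)
draw_normal <- function(n) rnorm(sum(n), mean = 1.85, sd = 2)

# Negative binomial data: equal means, group-specific dispersion (size)
draw_negbinom <- function(n) {
  rnbinom(sum(n), size = rep(c(0.83, 0.78, 0.95), n), mu = 1.85)
}

set.seed(1)
rate_normal   <- rejection_rate(draw_normal)
rate_negbinom <- rejection_rate(draw_negbinom)
c(normal = rate_normal, negbinom = rate_negbinom)
```

If the ANOVA holds its nominal level, both rates should be close to 0.05; with only 2,000 replications they will fluctuate by roughly half a percentage point.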

However, browsing through the forum, I gathered that, contrary to what I was taught in my statistics lectures, the ANOVA only requires the residuals to be normally distributed, not the dependent variable itself. I therefore have three questions and would really appreciate a hint on any of them!

1. Is there an error in the code that leads to this (in my opinion) unexpected result?

2. Am I mistaken in thinking that an ANOVA on three negative binomial distributions should yield a different number of p-values <=5% than one on three normal distributions?

3. Is an ANOVA with three negative-binomially distributed dependent variables feasible anyway, as long as the residuals are normally distributed?

I would be glad if one of you could give me a hint!

Code:

```
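# Simulate how often a one-way ANOVA rejects at the 5% level
# when all three groups are negative binomial with the same mean
n_sim  <- 100000
n      <- c(352, 198, 170)             # group sizes
group  <- factor(rep(1:3, n))
pvalue <- rep(NA_real_, n_sim)
for (i in seq_len(n_sim)) {
  # equal means (mu = 1.85), but a different dispersion (size) per group
  x <- c(rnbinom(n[1], size = 0.83, mu = 1.85),
         rnbinom(n[2], size = 0.78, mu = 1.85),
         rnbinom(n[3], size = 0.95, mu = 1.85))
  data <- data.frame(x = x, group = group)
  fit  <- lm(x ~ group, data)
  pvalue[i] <- anova(fit)[1, "Pr(>F)"]   # p-value of the overall F test
}
sum(pvalue <= 0.05)                    # number of p-values at or below 5%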
```
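Regarding the residuals mentioned in question 3, the following is a minimal sketch of how they could be inspected for a single simulated data set. The seed and the hand-computed skewness measure are my own illustrative choices and not part of the simulation above. As far as I can tell, in a one-way layout each residual is simply the observation minus its group mean, so the residuals should keep the skewed shape of the negative binomial data.

```r
set.seed(1)
n     <- c(352, 198, 170)
group <- factor(rep(1:3, n))
x     <- rnbinom(sum(n), size = rep(c(0.83, 0.78, 0.95), n), mu = 1.85)

fit <- lm(x ~ group)
res <- residuals(fit)

# Check: residual = observation minus the mean of its own group
same_as_centered <- isTRUE(all.equal(unname(res), x - ave(x, group)))

# Sample skewness of the residuals; normal residuals would give roughly 0
skewness <- mean((res - mean(res))^3) / mean((res - mean(res))^2)^1.5

same_as_centered
skewness
```

A normal Q-Q plot of the residuals (`qqnorm(res); qqline(res)`) would show the same thing graphically.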

