I am analyzing a dataset on offspring production in fruit flies that have mated with individuals from either their own population or another population. For each experiment I have data on offspring produced for four crosses: F1xM1, F2xM2, F1xM2, and F2xM1 (N is approximately 30 for each cross). I would ideally like to use a factorial ANOVA, where I am interested in the two main effects (source population of the female and source population of the male) and the interaction between them (which may indicate incompatibilities between flies from different populations).

I am unable to normalize the residuals through transformation, mainly because of zeros in the dataset: typically 5-20% of females do not produce any offspring, while those that do produce offspring typically have 25-40 on average. One solution I have tried is to break the analysis into two parts: (1) test whether the four crosses differ in the likelihood of failing to produce offspring (using Fisher's exact test), and (2) for the females that did produce offspring, run a factorial ANOVA on square-root-transformed counts, since that transformation does normalize the residuals.

However, I have just come across the possibility of using negative binomial regression to analyze all the data, including the zeros, and I'm not sure whether this is a better solution. I am unclear on how to determine that a negative binomial model fits my data well (I am using SPSS), or more specifically, whether it is a better solution than breaking the analysis into two parts. I understand that I could use criteria like BIC or AIC to compare different models fitted to the same data (e.g. Poisson vs. negative binomial), but that doesn't seem applicable here, because the two-part approach is not a single model fitted to the same response.

The deviance/df estimate for the negative binomial model is about 2, and my understanding is that it should ideally be close to 1. Is a value of 2 large enough to indicate a bad model fit? Or are there other ways I should be checking that the model fits well? Any thoughts or advice would be much appreciated, thanks!
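
To make the deviance/df quantity concrete, here is a minimal sketch in Python (not SPSS) of how it could be computed by hand. It rests on several assumptions that are not part of my actual analysis: the counts are simulated rather than real, the model is a Poisson rather than a negative binomial, and the full factorial model (both main effects plus the interaction) is treated as fitting one mean per cross. The names `simulate_cross` and `poisson_fit_summary` are made up for illustration. The sketch also compares the observed number of zeros with the number the fitted model would expect, which is one simple check of how well a count model handles the zeros.

```python
import math
import random


def simulate_cross(rng, n=30, p_zero=0.15, low=25, high=40):
    """Hypothetical data: a share of females produce nothing, the rest 25-40 offspring."""
    return [0 if rng.random() < p_zero else rng.randint(low, high) for _ in range(n)]


def poisson_fit_summary(counts_by_cross):
    """Deviance/df and Pearson chi-square/df for a Poisson model with one mean per cross.

    With a 2x2 factorial design, main effects plus interaction reproduce the
    four cell means, so the fitted value for each female is her cross mean.
    """
    n_obs = sum(len(counts) for counts in counts_by_cross.values())
    df = n_obs - len(counts_by_cross)  # observations minus fitted cell means
    deviance = 0.0
    pearson = 0.0
    observed_zeros = 0
    expected_zeros = 0.0
    for cross, counts in counts_by_cross.items():
        mu = sum(counts) / len(counts)
        if mu == 0:
            raise ValueError(f"cross {cross!r} has no offspring at all")
        for y in counts:
            # Poisson unit deviance: 2 * [y*log(y/mu) - (y - mu)], log term is 0 at y = 0
            log_term = y * math.log(y / mu) if y > 0 else 0.0
            deviance += 2.0 * (log_term - (y - mu))
            pearson += (y - mu) ** 2 / mu
        observed_zeros += sum(1 for y in counts if y == 0)
        # A Poisson with mean mu puts probability exp(-mu) on zero
        expected_zeros += len(counts) * math.exp(-mu)
    return {
        "df": df,
        "deviance_per_df": deviance / df,
        "pearson_per_df": pearson / df,
        "observed_zeros": observed_zeros,
        "expected_zeros": expected_zeros,
    }


rng = random.Random(1)  # fixed seed so the simulated example is repeatable
simulated = {cross: simulate_cross(rng) for cross in ("F1xM1", "F2xM2", "F1xM2", "F2xM1")}
summary = poisson_fit_summary(simulated)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same ratios for a negative binomial model would use that distribution's deviance and its estimated dispersion parameter, so this sketch only illustrates what the statistic measures and is not a replacement for the SPSS output.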