Data will not normalize & need to run ANOVA

#1
Hello, I have a set of data that I need to run a 2-way ANOVA on. My data is very highly skewed to the right, so it does not follow a normal distribution. I have tried just about every transformation I can find, and cannot get my data to pass the K-S normality test. I wish I could just run a non-parametric test for my data, but my advisor wants me to do the ANOVA and isn't being much help.

I normally use SigmaPlot 12, but I also tried Minitab. Minitab actually told me that my data followed a normal distribution (also using the K-S test). I did not change anything; the data was indexed the same way in both programs.

Does anyone have any advice? My data has a lot of zeros, which make up around 80% of the whole set. The only advice my advisor gave me was to add 0.1 to each number and then run the transformations. However, that is not working either. Thank you for any help!
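One way to see why adding 0.1 and then transforming stalls: any monotone transformation sends all the zeros to one shared value, so a spike of 80% stays a spike of 80%. A minimal sketch with made-up counts (the `counts` list and the set of transformations are illustrative assumptions, not the real field data):

```python
import math
from collections import Counter

# Hypothetical counts: 80% zeros plus a few positive values (illustrative only).
counts = [0] * 16 + [1, 2, 5, 12]

def largest_tie_share(values):
    """Fraction of observations sitting on the single most common value."""
    top_count = Counter(values).most_common(1)[0][1]
    return top_count / len(values)

# Add the 0.1 offset first, then apply a few common transformations.
shifted = [c + 0.1 for c in counts]
transforms = {
    "raw": counts,
    "log(x + 0.1)": [math.log(v) for v in shifted],
    "sqrt(x + 0.1)": [math.sqrt(v) for v in shifted],
    "1 / (x + 0.1)": [1.0 / v for v in shifted],
}

for name, values in transforms.items():
    print(f"{name:>14}: {largest_tie_share(values):.0%} of values are identical")
```

Every row reports 80%: the zeros move to a new location but remain a single tied block, which is consistent with the normality test continuing to reject.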
 

hlsmith

Not a robit
#2
It's not whether your data is skewed or non-normal that matters; it is whether the residuals from the ANOVA are normal.

If you can't rectify that, explore generalized linear models (GLMs), but check the residuals first. You don't have to use ANOVA; if you don't meet the assumptions, the conclusion may be wrong and the analysis is pointless.
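To make the residual check concrete: in a two-way ANOVA with interaction, the fitted value for each observation is its cell mean, so the residuals are the observations minus their location-by-month cell means. A minimal sketch with invented counts (the `data` dictionary, its labels, and the helper names are hypothetical):

```python
from statistics import mean

# Hypothetical counts keyed by (location, month); values are replicate samples.
data = {
    ("site_A", "May"):  [0, 0, 0, 4],
    ("site_A", "June"): [0, 0, 1, 9],
    ("site_B", "May"):  [0, 0, 0, 0],
    ("site_B", "June"): [0, 0, 0, 15],
}

def cell_mean_residuals(cells):
    """Residuals of the full two-way model (main effects plus interaction)."""
    residuals = []
    for observations in cells.values():
        fitted = mean(observations)  # fitted value = cell mean
        residuals.extend(y - fitted for y in observations)
    return residuals

def skewness(values):
    """Moment-based sample skewness; roughly 0 for symmetric data."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

residuals = cell_mean_residuals(data)
print("residual skewness:", round(skewness(residuals), 2))
```

These residuals, not the raw counts, are what would go into a normality test or a Q-Q plot. With this many zeros they tend to stay right-skewed, which is one reason to look at a GLM.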
 

noetsi

Fortran must die
#4
I believe that non-normality is less of an issue in ANOVA than regression because of the Central Limit Theorem.

Not surprisingly, people vary significantly on this issue.

The following authors say flat out to use non-parametric tests, not ANOVA, if you have non-normal data.

http://www.emis.de/journals/HOA/ADS/Volume7_4/206.pdf

And the following work says....

Like other parametric tests, the analysis of variance assumes that the data fit the normal distribution. If your measurement variable is not normally distributed, you may be increasing your chance of a false positive result if you analyze the data with an anova or other test that assumes normality. Fortunately, an anova is not very sensitive to moderate deviations from normality; simulation studies, using a variety of non-normal distributions, have shown that the false positive rate is not affected very much by this violation of the assumption (Glass et al. 1972, Harwell et al. 1992, Lix et al. 1996). This is because when you take a large number of random samples from a population, the means of those samples are approximately normally distributed even when the population is not normal.
http://udel.edu/~mcdonald/statnormal.html
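The last sentence of that quote can be checked with a small simulation. This is only a sketch under assumptions: the exponential population, the sample size of 30, and the seed are arbitrary choices for illustration, and data with 80% zeros may need larger samples before the approximation is any good.

```python
import random
from statistics import mean

def skewness(values):
    """Moment-based sample skewness; roughly 0 for symmetric data."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Right-skewed "population": exponential draws (an assumption for illustration).
raw = [rng.expovariate(1.0) for _ in range(20000)]

# Means of 2000 independent samples of size 30 from the same population.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(30)])
    for _ in range(2000)
]

print("skewness of raw values:  ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
```

The sample means come out far less skewed than the raw values, which is the Central Limit Theorem effect the quote describes.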
 
#5
"Can you tell us more about your data? Is the response actually a count? Is it always positive?"
My data is counts of specimens collected in the field, so it is always positive. There are zeros where no specimens have been collected. The two factors are location and month. I need to run an ANOVA looking for any significant difference in these two factors, as well as any interaction.
 

noetsi

Fortran must die
#6
One thing that often leads to non-normal residuals is a floor: you can't have negative values, as in your case. This often occurs when the data is measured as a percentage (you can't go below 0). That can be dealt with, I think, by log-transforming the data, but it's been a long time since I read that literature.

You might look up non-normal residuals with percentages, or perhaps restriction of range, and see if they offer a solution you can use.
 

Dason

Ambassador to the humans
#7
"I believe that non-normality is less of an issue in ANOVA than regression because of the Central Limit Theorem."
The CLT would apply to regression as much as to ANOVA. After all, ANOVA is just a special case of regression.
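A small illustration of the "special case of regression" point, using one factor with two levels and invented counts (the site names and values are hypothetical): regressing the response on a 0/1 dummy variable reproduces the group means that ANOVA compares.

```python
from statistics import mean

# Hypothetical counts at two locations (illustrative only).
site_a = [0, 0, 3, 5]
site_b = [0, 1, 8, 11]

# Dummy-code the factor: 0 for site_a, 1 for site_b.
x = [0] * len(site_a) + [1] * len(site_b)
y = site_a + site_b

# Ordinary least squares for y = intercept + slope * x.
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

print("intercept:", intercept, "| mean of site_a:", mean(site_a))
print("slope:    ", slope, "| difference in means:", mean(site_b) - mean(site_a))
```

The intercept equals the mean of the reference group and the slope equals the difference between the group means, so the regression and the ANOVA are fitting the same model and share the same residuals.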

"My data is counts of specimens collected in the field, so it is always positive. There are zeros where no specimens have been collected. The two factors are location and month. I need to run an ANOVA looking for any significant difference in these two factors, as well as any interaction."
It sounds like some sort of zero-inflated Poisson (ZIP) model would be fairly appropriate for your data.
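For anyone unfamiliar with ZIP: it mixes "structural" zeros (probability pi) with an ordinary Poisson(lam) count, so P(0) = pi + (1 - pi) * exp(-lam). Below is a rough, intercept-only sketch fitted by EM on simulated data. All names and parameter values here are assumptions for illustration. A real analysis would let the parameters depend on location, month, and their interaction, which needs a proper ZIP regression routine rather than this toy fit.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_zip(rng, n, pi, lam):
    """Draw n counts: structural zero with probability pi, else Poisson(lam)."""
    return [0 if rng.random() < pi else sample_poisson(rng, lam) for _ in range(n)]

def fit_zip(counts, iterations=200):
    """EM estimates of (pi, lam) for an intercept-only ZIP model."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for c in counts if c == 0)
    pi, lam = 0.5, max(total / n, 1e-6)  # crude starting values
    for _ in range(iterations):
        # E-step: probability that an observed zero is a structural zero.
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        expected_structural = n_zero * z
        # M-step: update the mixing weight and the Poisson mean.
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam

rng = random.Random(1)
# pi = 0.75 and lam = 2.0 give roughly 78% zeros, close to the 80% described.
counts = simulate_zip(rng, 5000, pi=0.75, lam=2.0)
pi_hat, lam_hat = fit_zip(counts)
print("share of zeros:", sum(c == 0 for c in counts) / len(counts))
print("estimated pi:", round(pi_hat, 3), "estimated lam:", round(lam_hat, 3))
```

The estimates should land near the values used in the simulation, which shows that the model can separate "no specimens possible" zeros from "none happened to be collected" zeros instead of forcing the counts toward normality.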