Unbalanced, non-normal, heteroscedastic ANOVA alternatives.

I want to compare the means of 4 groups, each with a different number of observations. None of the group residuals are normally distributed (nor are their log-transforms), and they are also heteroscedastic: at worst, the SD of one group is about twice that of another.

ANOVA, including Welch's, is out of the question. Is Kruskal-Wallis the only option? Are there any other tests or options I haven't considered?



Can you describe the measurement process? Often, if we know something about how the data came to have this distribution, we can think of a better response distribution to specify.

Thanks for the interest.

The data is from a Raman spectroscopy dataset. Basically, we get a spectral graph like the one in attachment 'RawSpectra_Baseline'. Most of this spectrum is due to fluorescence rather than Raman scattering, and the red curve is an estimate of this fluorescence.

There are nearly two thousand such spectra, obtained from biological samples prepared with 4 different fixation methods. We are investigating whether the fixation method has an effect on the estimated fluorescence, which we measure as the area under the estimated curve.

It is these areas which are unbalanced and heteroscedastic. The histograms are in the attachment 'Fixation_Hists'.
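The area measure described above can be sketched as follows. This is only an illustration: the thread does not say how the area was actually computed, so the trapezoidal rule, the function name `area_under_curve`, and the toy numbers are all assumptions.

```python
def area_under_curve(x, y):
    """Trapezoidal-rule area under y(x).

    Assumption: x is increasing and y is the estimated fluorescence
    baseline sampled at those x positions.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    area = 0.0
    for i in range(1, len(x)):
        # Each strip is a trapezoid: mean height times width.
        area += 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1])
    return area


# Toy baseline (made-up values, not from the attached spectra).
toy_x = [0.0, 1.0, 2.0, 3.0]
toy_baseline = [0.0, 1.0, 2.0, 3.0]
toy_area = area_under_curve(toy_x, toy_baseline)
```

Each spectrum would contribute one such area, giving one observation per spectrum in the comparison between fixation groups.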

I'm doing this in Matlab and can share the data if people are interested. Cheers.

An area is often the square of something, which suggests that a square root transformation could work well (a lambda of 0.5 in the Box-Cox transformation), although you have said that the logarithm (a lambda of 0) does not make the data normal.
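For reference, the Box-Cox family mentioned above can be written in a few lines. This is a minimal sketch in plain Python rather than Matlab; the function name `box_cox` is just a label chosen here, and it assumes all areas are strictly positive.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: (v**lam - 1) / lam, or log(v) when lam == 0.

    Assumption: every value is strictly positive.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# lam = 0.5 is the square root up to a shift and scale: 2 * (sqrt(v) - 1).
sqrt_like = box_cox([1.0, 4.0, 9.0], 0.5)
# lam = 0 is the natural logarithm.
log_like = box_cox([1.0, math.e], 0)
```

Because lambda = 0.5 differs from a plain square root only by a linear rescaling, either version gives the same conclusions in a test comparing group means.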

But there seems to be a lot of data, so a t-test (on the transformed scale) could work well.

In the histograms, it looks as if there is an extra "hump" (a bimodal density) at larger values in the first two diagrams. Maybe the curve fitting and area computation just did not work for these data points, so that they are "wrong" and need to be redone.
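A pairwise comparison on the square-root scale might look like the sketch below. The reply only says "t-test"; using Welch's unequal-variance form is an assumption made here because the groups are heteroscedastic, and the function names are illustrative. It returns the t statistic and the Welch-Satterthwaite degrees of freedom, leaving the p-value lookup to whatever statistics package is at hand.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two group means.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


def welch_t_on_sqrt_scale(areas_a, areas_b):
    """Apply the square-root transform to both groups, then compare means."""
    return welch_t([math.sqrt(v) for v in areas_a],
                   [math.sqrt(v) for v in areas_b])


# Toy areas (made up): square roots are 1..4 and 3..6.
t_stat, dof = welch_t_on_sqrt_scale([1.0, 4.0, 9.0, 16.0],
                                    [9.0, 16.0, 25.0, 36.0])
```

With 4 groups there are 6 such pairwise comparisons, so some multiple-comparison adjustment would be needed if this route is taken.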

Why not attach the data as a text file or something?

Didn't think of a square root transformation, will try it. Also didn't know about the Box-Cox family of transforms; will play with them a little.

Attached the data. There are two columns: the first is the area under the estimated curve and the second is the group to which the row belongs (1-4). There are no headers.
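For anyone not using Matlab, a file laid out as described above could be read and summarized roughly like this. It is a sketch under assumptions: the delimiter is not stated in the thread, so both whitespace and commas are accepted, and the file name in the closing comment is hypothetical.

```python
import statistics
from collections import defaultdict


def parse_area_group_lines(lines):
    """Collect areas by group from 'area group' rows with no header."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        area = float(fields[0])
        group = int(float(fields[1]))  # tolerates '1' as well as '1.0'
        groups[group].append(area)
    return dict(groups)


def summarize(groups):
    """Per-group (n, mean, SD); SD is NaN for a single observation."""
    summary = {}
    for group, areas in sorted(groups.items()):
        sd = statistics.stdev(areas) if len(areas) > 1 else float("nan")
        summary[group] = (len(areas), statistics.fmean(areas), sd)
    return summary


# Toy rows (made up) standing in for the attached file.
toy_rows = ["10.0 1", "14.0 1", "3.0,2", "", "5.0 2"]
toy_summary = summarize(parse_area_group_lines(toy_rows))

# With a real file (hypothetical name):
# with open("areas.txt") as handle:
#     print(summarize(parse_area_group_lines(handle)))
```

The per-group n and SD make the imbalance and the roughly twofold spread in SD easy to check before choosing a test.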