Repeated-measures ANOVA? And how to achieve normality?


I am analysing a dataset in which I have the following data for 50 participants:

Intervention: yes/no
T0: bacterial counts at T0 in cfu/ml (baseline value)
T12: bacterial counts at T12 in cfu/ml (after 12 months)

The data for bacterial counts are for Total bacterial counts (TBC), as well as for specific bacteria within the TBC.
I'm trying to find out the effect of the intervention after 12 months (is it effective?).
The TBC ranges from 0 cfu/ml (no bacteria detected) to a maximum of 400000 cfu/ml.

I am thinking of using a Two-way mixed ANOVA, with the intervention as 'between-groups-variable', and the time at which the sample is taken as 'repeated-measures-variable'. Is this the right test to answer the question of this research?

In order to normalise the raw data (which are heavily skewed), I am transforming them using log10. However, the '0 cfu/ml' values cannot be transformed (log10(0) is undefined), and if I add them manually (entered as 0, or as 1 before transforming, since log10(1) = 0) they are either outliers (in variables where I have only a few 0 cfu/ml values), or the data are again heavily skewed (in variables where there are a lot of 0-values). So how do I deal with these 0-values in achieving normality?

Please ask for any additional information you might need in order to help me!

Thank you in advance,



TS Contributor
You do not need normality of the "data" (at most of the residuals, but with n = 50 I suppose that the central
limit theorem makes the statistical test robust). Taking the logarithm (or log(x+1), which is also defined
for counts of 0) may be a good idea anyway, due to the shape of the bacterial growth curve (but admittedly,
I don't know much about bacteria).
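
To illustrate the log(x+1) idea, here is a minimal sketch in Python. The counts are made-up values, not data from this study, and the variable names are only for illustration. It shows that log10 cannot handle a count of 0, while log10(x+1) maps it to 0.

```python
import math

# Hypothetical TBC values in cfu/ml (illustration only, not real data)
counts = [0, 0, 12, 350, 4800, 400000]

# Plain log10 is undefined for 0, so those entries are left as None here
log_plain = [math.log10(c) if c > 0 else None for c in counts]

# log10(x + 1) is defined for every non-negative count; 0 cfu/ml maps to 0.0
log_shifted = [math.log10(c + 1) for c in counts]

for c, lp, ls in zip(counts, log_plain, log_shifted):
    print(c, lp, round(ls, 3))
```

Note that this only makes the zeros computable. It does not remove the pile-up of values at 0 when many samples have no detectable bacteria.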

If your groups show the same baseline counts of bacteria, you could alternatively calculate the pre-post
differences and compare them (or ln(difference) ) between groups.
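
A rough sketch of this pre-post comparison in Python, under some assumptions: the (T0, T12) pairs are invented, the change is computed as the difference of log10(x+1) values (a swapped-in choice, since the logarithm of a raw difference is undefined when the difference is zero or negative), and the groups are compared with a Welch t statistic. The function names `log_change` and `welch_t` are hypothetical helpers.

```python
import math
import statistics

# Hypothetical (T0, T12) pairs in cfu/ml per participant (illustration only)
control = [(1200, 900), (0, 50), (40000, 35000), (300, 400)]
intervention = [(1500, 100), (0, 0), (52000, 800), (700, 20)]

def log_change(pairs):
    """Change from T0 to T12 on the log10(x + 1) scale, one value per participant."""
    return [math.log10(t12 + 1) - math.log10(t0 + 1) for t0, t12 in pairs]

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

change_control = log_change(control)
change_intervention = log_change(intervention)

# A negative t means the intervention group dropped more than the control group
t_stat, df = welch_t(change_intervention, change_control)
print(t_stat, df)
```

The p-value would come from a t distribution with `df` degrees of freedom, which any statistics package provides.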

With kind regards

So how do I deal with these 0-values in achieving normality?
You don't!

If the normal distribution does not fit the data, then skip that distribution and use another one instead.

Use the Poisson distribution in a standard generalized linear model with a log link (most statistical software can do this). CFU counts are often considered to be Poisson distributed.
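
As a minimal sketch of what such a model does, the following plain-Python code fits a Poisson regression with a log link by Newton's method. Everything here is an assumption for illustration: the counts are invented, the design (intercept, group, time, group x time) is one possible choice, and `solve` and `fit_poisson` are hypothetical helpers rather than any library's API. It also treats all observations as independent, so it ignores that T0 and T12 come from the same participant.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood fit of log(mu) = X beta for Poisson counts y."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the overall mean
    for _ in range(n_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score vector X'(y - mu) and information matrix X' diag(mu) X
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                for j in range(p)]
        info = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical counts per cell: (group, time) -> counts; group 1 = intervention, time 1 = T12
cells = {
    (0, 0): [120, 80, 0, 200],
    (0, 1): [110, 90, 10, 150],
    (1, 0): [100, 0, 150, 150],
    (1, 1): [20, 0, 30, 10],
}

X, y = [], []
for (g, t), values in cells.items():
    for v in values:
        X.append([1.0, float(g), float(t), float(g * t)])  # intercept, group, time, interaction
        y.append(v)

beta = fit_poisson(X, y)

# exp(interaction) = (T12/T0 ratio under intervention) / (T12/T0 ratio in controls)
rate_ratio = math.exp(beta[3])
print(beta, rate_ratio)
```

Zeros need no special handling here, because the log link is applied to the expected count and not to the observed values. For this fully categorical design the fitted cell means equal the observed cell means, so `rate_ratio` equals (15/100)/(90/100) for the made-up data above.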