When are non-parametric tests more desirable than a scale transformation when working with non-normal data?

unbiased95

New Member
When, during an exploratory analysis, should one use a robust test instead of a logarithmic or Box-Cox transformation of a variable if the Shapiro-Wilk test for normality of that variable is significant? Also, if I apply a logarithmic or Box-Cox transformation and the Shapiro-Wilk test then no longer rejects normality, should I use non-parametric tests anyway?

Karabiner

TS Contributor
Why would you care whether some variable is normally distributed (in the population from which your sample was drawn)?
There is hardly any procedure requiring this.

Please describe your research questions, your sample size, and the variables and their measurements.

With kind regards

Karabiner

unbiased95

New Member
Hi Karabiner, thanks for the answer. This is actually a general procedure our teacher taught us, since we're learning how to perform exploratory analysis for a university statistics course, so it isn't related to any particular dataset. Basically, she told us that if a variable isn't normally distributed (judging from a Q-Q plot plus the Shapiro test), we can't perform parametric tests and should use robust methods based on ranks. The same goes for when the sample is too small or there are strong outliers.
My doubt is this: a scale transformation can normalize the distribution of a non-Gaussian variable, but it can also alter the interpretation of the results (since the scale is converted from one to another). Is transforming better than just using a robust test to verify the hypotheses?
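
The point about interpretation can be made concrete with a minimal sketch. The sample below is made up for illustration and does not come from any dataset in this thread. It shows that averaging on the log scale and transforming back gives the geometric mean rather than the arithmetic mean, so a test on log-transformed data addresses a different quantity than the same test on the raw data.

```python
import numpy as np

# Hypothetical right-skewed sample, made up for illustration only.
x = np.array([1.0, 2.0, 4.0, 8.0, 100.0])

arithmetic_mean = x.mean()                  # mean on the original scale
log_scale_mean = np.log(x).mean()           # mean on the log scale
back_transformed = np.exp(log_scale_mean)   # mapped back to the original scale

# The back-transformed value equals the geometric mean of x,
# which for skewed positive data is smaller than the arithmetic mean.
geometric_mean = x.prod() ** (1.0 / len(x))

print(arithmetic_mean, back_transformed, geometric_mean)
```

A t-test on log(x) therefore compares means on the log scale, which corresponds to comparing geometric means (a ratio) on the original scale.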

Karabiner

TS Contributor
There is no need for dependent variables to be normally distributed.
For some procedures, the prediction errors of the respective model should be normally distributed.
For regression, the residuals should be normally distributed, not the dependent variable.
The same holds for analysis of variance, but one could express this alternatively by saying
that the dependent variable should be normally distributed in each cell in ANOVA. This is
also true for t-tests (distribution in each group, not the total). And all this can be neglected
if the sample size is large enough (total n > 30 or so). But a combination of small sample size
and outliers could make it useful to consider rank-based procedures. Keep in mind that they
do not compare means. But if you transform data, you analyse something different
from your original variable.
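
A rough sketch of the difference between checking the dependent variable and checking the residuals, using simulated data. The group means, sample sizes, and seed are arbitrary assumptions chosen for illustration. Each group is normal, but the pooled variable is bimodal, so a normality test on the pooled variable says little about whether a two-group comparison is appropriate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Two hypothetical groups: normal within each group, but far apart in mean.
group_a = rng.normal(loc=0.0, scale=1.0, size=100)
group_b = rng.normal(loc=6.0, scale=1.0, size=100)

# Pooled dependent variable: bimodal, so clearly not normal overall.
pooled = np.concatenate([group_a, group_b])

# Residuals of the "group mean" model: each value minus its own group mean.
residuals = np.concatenate([group_a - group_a.mean(),
                            group_b - group_b.mean()])

w_pooled, p_pooled = stats.shapiro(pooled)
w_resid, p_resid = stats.shapiro(residuals)

print(f"pooled DV: W={w_pooled:.3f}, p={p_pooled:.3g}")
print(f"residuals: W={w_resid:.3f}, p={p_resid:.3g}")
```

In this setup the Shapiro-Wilk test rejects normality for the pooled variable, while the residuals look far closer to normal, which is the distribution that matters for the t-test or ANOVA.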

With kind regards

Karabiner

fed2

Active Member
So basically just run a Wilcoxon test, or you are doomed to endless round-the-mulberry-bush discussions about the normality of whatever needs to be normal or not.
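
A minimal sketch of that suggestion, assuming the two-independent-samples case, where the Wilcoxon rank-sum test is equivalent to the Mann-Whitney U test (`scipy.stats.mannwhitneyu`). The data are simulated and the parameters are arbitrary. Because the test uses only ranks, a strictly increasing transformation such as the logarithm leaves the result unchanged, so the transform-or-not question does not arise for it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only

# Hypothetical skewed, strictly positive samples (log-normal draws).
x = rng.lognormal(mean=0.0, sigma=1.0, size=20)
y = rng.lognormal(mean=0.8, sigma=1.0, size=20)

# Rank-based test on the raw data and on the log-transformed data.
raw = stats.mannwhitneyu(x, y, alternative="two-sided")
logged = stats.mannwhitneyu(np.log(x), np.log(y), alternative="two-sided")

# The log preserves the ordering of the values, hence the ranks,
# so both calls give the same U statistic and p-value.
print(raw.statistic, raw.pvalue)
print(logged.statistic, logged.pvalue)
```

As noted earlier in the thread, this test does not compare means, so it answers a somewhat different question than a t-test.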