Does the normality assumption apply to the original data and the residuals, or only to the residuals?

For example, here is some discussion.

I remember being taught that when the data are not normally distributed, I should use the Wilcoxon rank test rather than the t-test, and that for regression and ANOVA I should transform the data first (log, square root, square, etc.) to bring it closer to a normal distribution.

But now I have read that what matters more is that the residuals are normally distributed.

My data are heavily positively skewed, and after a log transformation the distribution looks much better. I guess I will use the transformed data for regression and ANOVA, but right now I am really confused by this question.
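To make the question concrete, here is a minimal sketch of the comparison I have in mind. It uses simulated data, not my real data, and every name in it (`skewness`, `ols_residuals`, `x`, `y`) is made up for illustration. It assumes a positively skewed response generated from a model that is linear on the log scale, then measures the skewness of the raw response, of the residuals from a regression on the raw response, and of the residuals from a regression on the log-transformed response.

```python
import numpy as np


def skewness(values):
    """Sample skewness: third standardized moment (near 0 for symmetric data)."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


def ols_residuals(x, y):
    """Residuals from a least-squares fit of y on x with an intercept."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


# Simulated example (an assumption, not real data): the response is
# linear in x on the log scale, with normal errors on that scale.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=500)
y = np.exp(1.0 + 0.8 * x + rng.normal(0.0, 0.5, size=500))

skew_raw_y = skewness(y)                                 # the response itself
skew_resid_raw = skewness(ols_residuals(x, y))           # residuals, untransformed fit
skew_resid_log = skewness(ols_residuals(x, np.log(y)))   # residuals, log-scale fit

print(f"skewness of raw response:          {skew_raw_y:.2f}")
print(f"skewness of residuals (raw fit):   {skew_resid_raw:.2f}")
print(f"skewness of residuals (log fit):   {skew_resid_log:.2f}")
```

Skewness is only a crude summary here; a Q-Q plot or a formal test on the residuals would be the more usual check. The sketch is just meant to show the two different things ("the data" versus "the residuals") that the assumption could be referring to.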