# Thread: Should you transform skewed data when the distribution is expected?

1. ## Should you transform skewed data when the distribution is expected?

Hi all,

I have been discussing this topic all week with people in my department and I can't seem to get a straight answer. I was taught long ago that you should not necessarily transform a variable (for use in regression analysis) if the population distribution is expected to be skewed. For example, symptom measures (e.g., posttraumatic stress symptoms, depression symptoms) are expected to have a positively skewed distribution in the general population, and this is typically what we see in sample data. So, if the population distribution is supposed to be positively skewed and your sample data have the expected positive skew, should you transform that variable? Any clarification on this issue would be greatly appreciated!

2. ## Re: Should you transform skewed data when the distribution is expected?

It's true that you don't always need to transform your variables just because they have funny distributions, but this has nothing whatsoever to do with whether or not you expected the data to have a funny distribution beforehand. Why would it???

3. ## Re: Should you transform skewed data when the distribution is expected?

I think the response to the question would be another question: why would you alter the distribution of scores to look like a distribution that they do not follow in the general population? Isn't the idea of the normal distribution that, in the general population, scores for most things will follow it? Well, if you have a variable or measurement that does not follow the normal distribution in the general population, why would you alter it in a smaller sample when the sample is correctly representing the population distribution?

4. ## Re: Should you transform skewed data when the distribution is expected?

Because a lot of our most common statistical techniques depend on the assumption that the error term is normally distributed, and violating this assumption can give inaccurate confidence intervals, particularly in small samples. Transforming the raw variables to be more normal can help with this, because it may make the errors more normal as well.
http://en.wikipedia.org/wiki/Ordinar...ming_normality
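
To illustrate how a transformation can make the errors more normal, here is a minimal Python sketch. It is only a simulation under assumed conditions: the multiplicative model, the coefficients, the sample size, and the seed are all made up for illustration, and `skewness` and `ols_residuals` are hypothetical helper names, not from any library. It fits a straight line to the raw outcome and to its log, then compares the skewness of the two sets of residuals.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
x = [rng.uniform(0, 2) for _ in range(5000)]
# Assumed model: errors are normal on the log scale, so y itself is right-skewed.
y = [math.exp(1 + 0.5 * xi + rng.gauss(0, 0.5)) for xi in x]

skew_raw = skewness(ols_residuals(x, y))
skew_log = skewness(ols_residuals(x, [math.log(yi) for yi in y]))
print(f"residual skewness, raw y: {skew_raw:.2f}")
print(f"residual skewness, log y: {skew_log:.2f}")
```

In this setup the log is the "right" transformation by construction, so the residuals from the log fit should be close to symmetric. With real data there is no guarantee that a log (or any other transformation) has this effect, which is why the residuals need to be checked after transforming.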

5. ## Re: Should you transform skewed data when the distribution is expected?

My answer would be that methods like regression assume non-skewed data and that the results will not be accurate if your data are skewed. One example of how this works is that an outlier test that assumes normal data (such as Tukey's boxplot) will flag an incorrect number of outliers if the data are highly skewed.

As far as I know the fact that the data really reflects a skewed population (as data often does) has absolutely nothing to do with this. It has to do with the assumptions the method is making in doing its analysis.
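
The boxplot point can be checked with a quick simulation. This is a rough Python sketch, assuming a standard normal sample and a lognormal sample as the "highly skewed" case; the distributions, sample size, and seed are arbitrary choices, and `tukey_flag_rate` is a hypothetical helper name. It applies the usual 1.5 × IQR fences to both samples and reports the share of points flagged.

```python
import random
import statistics

def tukey_flag_rate(values):
    """Fraction of points outside the 1.5 * IQR fences used by Tukey's boxplot."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(v < low or v > high for v in values) / len(values)

rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable
normal_sample = [rng.gauss(0, 1) for _ in range(10000)]
skewed_sample = [rng.lognormvariate(0, 1) for _ in range(10000)]  # right-skewed

normal_rate = tukey_flag_rate(normal_sample)
skewed_rate = tukey_flag_rate(skewed_sample)
print(f"flagged in normal sample: {normal_rate:.1%}")
print(f"flagged in skewed sample: {skewed_rate:.1%}")
```

For normal data the fences flag well under one percent of points. In the skewed sample every value is an ordinary draw from the same distribution, yet a much larger share lands beyond the upper fence, so the rule labels part of the long tail as "outliers".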

6. ## Re: Should you transform skewed data when the distribution is expected?

I received the following response to my original question from an old stats professor of mine:

"Regarding transformations, it depends upon what analyses you plan to carry out on the variables. For example, when doing ANOVA with large (greater than 40 per group) sample sizes, there's no need to transform a skewed DV, as the Central Limit Theorem ensures normality of the mean values. Similarly, for regression, the assumption is that the *residuals* are normally distributed, not necessarily the variables. This may result in an analysis where a number of variables are skewed, but there's no need for transformation because the residuals have a normal distribution. Really, transformation is so dependent upon the particulars of your situation, that it's difficult to formulate generalizations."
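
To make the residuals point concrete, here is a minimal Python sketch. It is not from the professor's reply; the linear model, coefficients, sample size, and seed are made up for illustration, and `skewness` is a hypothetical helper. It simulates a strongly skewed predictor with normally distributed errors, fits a least-squares line, and compares the skewness of the variables with the skewness of the residuals.

```python
import random

def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(2)  # arbitrary fixed seed so the run is repeatable
# Assumed setup: exponential (right-skewed) predictor, normal error term.
x = [rng.expovariate(1.0) for _ in range(5000)]
y = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]

# Simple least-squares fit of y on x.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

skew_x, skew_y, skew_resid = skewness(x), skewness(y), skewness(residuals)
print(f"skewness of x: {skew_x:.2f}")
print(f"skewness of y: {skew_y:.2f}")
print(f"skewness of residuals: {skew_resid:.2f}")
```

Here both the predictor and the outcome are clearly skewed, while the residuals are close to symmetric because the error term was generated as normal. This only shows that skewed variables and normal residuals can coexist; it says nothing about whether that holds for any particular data set.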

While I appreciate his response, it doesn't necessarily get at my original question.

7. ## Re: Should you transform skewed data when the distribution is expected?

It is also a response other statisticians disagree with. One Stanford prof wrote a book suggesting outliers could totally invalidate ANOVA despite the CLT, even with very large sample sizes (although this is somewhat different from skewness per se). And testing for skewness in the regression data, not in the residuals, is commonly recommended in works on statistics and in statistics classes. I posted a while back to well-known statistical experts who argued that normality of the data was critical to regression, not simply normality of the residuals.

8. ## Re: Should you transform skewed data when the distribution is expected?

Sorry to open up this thread again; I have this exact problem. Is a non-parametric method the answer?