Thread: Should you transform skewed data when the distribution is expected?

  1. #1

    Hi all,

    I have been discussing this topic all week with people in my department and I can't seem to get a straight answer. I was taught long ago that you should not necessarily transform a variable (for use in regression analysis) if its population distribution is expected to be skewed. For example, symptom measures (e.g., posttraumatic stress symptoms, depression symptoms) are expected to have a positively skewed distribution in the general population, and this is typically what we see in sample data. So, if the population distribution is supposed to be positively skewed and your sample data have the expected positive skew, should you transform that variable? Any clarification on this issue would be greatly appreciated!

  2. #2
    Cookie Scientist

    It's true that you don't always need to transform your variables just because they have funny distributions, but this has nothing whatsoever to do with whether or not you expected the data to have a funny distribution beforehand. Why would it???
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  3. #3

    I think the response to the question would be another question. Why would you alter the distribution of scores to represent a distribution that they are not representative of in the general population? Isn't that part of the idea of the normal distribution? That in the general population, scores for most things will follow a normal distribution. Well, if you have a variable or measurement that does not follow the normal distribution in the general population, why would you alter it in a smaller sample when it is actually correctly representing the distribution in the general population?

  4. #4
    Cookie Scientist

    Because a lot of our most common statistical techniques depend on the assumption that the error term is normally distributed, and violating this assumption can result in biased confidence intervals. Transforming the raw variables to be more normal can help with this because it may make the errors more normal as well.
    http://en.wikipedia.org/wiki/Ordinar...ming_normality
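
A minimal sketch of that point, using made-up data rather than anything from this thread: log(y) is simulated as linear in x with normal errors, so a straight-line fit to the raw y leaves strongly right-skewed residuals, while the same fit to log(y) leaves roughly symmetric ones. The coefficients, sample size, and helper names (`skewness`, `ols_residuals`, `simulate`) are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=mean)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def ols_residuals(x, y):
    """Residuals from a one-predictor least-squares fit of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def simulate(n=5000, seed=1):
    """Hypothetical data: log(y) is linear in x with normal errors."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [math.exp(0.5 + 0.3 * xi + rng.gauss(0, 0.8)) for xi in x]
    return x, y


x, y = simulate()
raw_skew = skewness(ols_residuals(x, y))
log_skew = skewness(ols_residuals(x, [math.log(v) for v in y]))
print(f"residual skewness, raw y: {raw_skew:.2f}")
print(f"residual skewness, log y: {log_skew:.2f}")
```

The transformation helps here only because the simulated errors really are normal on the log scale. With real data that is something to check on the residuals, not to assume.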

  5. #5
    IBM Rules

    My answer would be that methods like regression assume non-skewed data and that the results will not be accurate if your data are skewed. One example of how this works is that an outlier test (such as Tukey's boxplot) that assumes normal data will flag an incorrect number of outliers if the data are highly skewed.

    As far as I know the fact that the data really reflects a skewed population (as data often does) has absolutely nothing to do with this. It has to do with the assumptions the method is making in doing its analysis.
    "Facts are stubborn things, but statistics are more pliable." Mark Twain
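
The boxplot example can be sketched with a quick simulation. The usual 1.5 x IQR fences flag a bit under 1% of points when the data are normal, but a much larger share when the data are right-skewed, even though nothing is "wrong" with those points. The lognormal distribution, the sample size, and the function name `tukey_flag_rate` are assumptions picked for illustration.

```python
import random
import statistics


def tukey_flag_rate(values):
    """Share of points outside Tukey's fences (1.5 * IQR beyond the quartiles)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = sum(1 for v in values if v < low or v > high)
    return flagged / len(values)


rng = random.Random(2)
n = 20000
normal_rate = tukey_flag_rate([rng.gauss(0, 1) for _ in range(n)])
skewed_rate = tukey_flag_rate([rng.lognormvariate(0, 1) for _ in range(n)])
print(f"flagged in normal data:    {normal_rate:.1%}")
print(f"flagged in lognormal data: {skewed_rate:.1%}")
```

Note that this shows the rule over-flagging in skewed data. By itself it says nothing about whether a regression on such data is inaccurate.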

  6. #6

    I received the following response to my original question from an old stats professor of mine:

    "Regarding transformations, it depends upon what analyses you plan to carry out on the variables. For example, when doing ANOVA with large (greater than 40 per group) sample sizes, there's no need to transform a skewed DV, as the Central Limit Theorem ensures normality of the mean values. Similarly, for regression, the assumption is that the *residuals* are normally distributed, not necessarily the variables. This may result in an analysis where a number of variables are skewed, but there's no need for transformation because the residuals have a normal distribution. Really, transformation is so dependent upon the particulars of your situation, that it's difficult to formulate generalizations."

    While I appreciate his response, it doesn't necessarily get at my original question.
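
The professor's regression point can be sketched as follows, under made-up assumptions: the predictor is drawn from an exponential distribution (so it is skewed), the outcome is a straight line in that predictor plus normal noise (so it inherits the skew), and yet the residuals come out roughly symmetric. The coefficients and helper names are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=mean)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def ols_residuals(x, y):
    """Residuals from a one-predictor least-squares fit of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


rng = random.Random(4)
n = 5000
x = [rng.expovariate(1.0) for _ in range(n)]        # skewed predictor
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]  # skewed outcome, normal errors

x_skew = skewness(x)
y_skew = skewness(y)
resid_skew = skewness(ols_residuals(x, y))
print(f"skewness of x:         {x_skew:.2f}")
print(f"skewness of y:         {y_skew:.2f}")
print(f"skewness of residuals: {resid_skew:.2f}")
```

In this setup a histogram of either variable looks badly skewed, while the quantity the normality assumption is actually about looks fine.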

  7. #7
    IBM Rules

    It is also a response other statisticians disagree with. One Stanford prof wrote a book suggesting outliers could totally invalidate ANOVA despite the CLT, even with very large sample sizes (although this is somewhat different from skewness per se). And testing for skewness in the regression data, not in the residuals, is commonly recommended in statistics textbooks and classes. I posted a while back about well-known statistical experts who argued that normality was critical to regression, not simply normality in the residuals.
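
How much the Central Limit Theorem buys you at a given sample size can be checked by simulation. For independent draws, the skewness of the sample mean is the population skewness divided by the square root of n, so at n = 40 an exponential population (skewness 2) leaves about 0.32, while a heavier-tailed lognormal population leaves considerably more. The sketch below estimates this empirically; the two distributions, the sample sizes, and the function names are assumptions for illustration, and the lognormal estimates in particular are noisy.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=mean)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def skew_of_sample_means(draw, n, reps=20000, seed=3):
    """Skewness of the simulated sampling distribution of the mean of n draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return skewness(means)


def exponential(rng):
    return rng.expovariate(1.0)


def lognormal(rng):
    return rng.lognormvariate(0.0, 1.0)


results = {
    (name, n): skew_of_sample_means(draw, n)
    for name, draw in (("exponential", exponential), ("lognormal", lognormal))
    for n in (5, 40)
}
for (name, n), value in results.items():
    print(f"{name:12s} n={n:3d}  skewness of the mean: {value:.2f}")
```

So "n greater than 40" is a rule of thumb whose adequacy depends on how skewed or heavy-tailed the population is, which is roughly where the two positions in this thread part ways.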

  8. #8

    Sorry to open up this thread again; I have this exact problem. Is non-parametric the answer?
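
One thing "non-parametric" could mean in a regression setting is a bootstrap: resample the cases with replacement, refit each time, and read a confidence interval for the slope off the percentiles, without leaning on normal errors. This is a sketch of one option under made-up assumptions, not a recommendation for any particular data set; the simulated data (with deliberately skewed errors) and the names `ols_slope` and `bootstrap_slope_ci` are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope for a one-predictor regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope_ci(x, y, reps=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the slope, resampling whole cases."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(reps):
        idx = rng.choices(range(n), k=n)  # sample row indices with replacement
        slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    slopes.sort()
    tail = (1 - level) / 2
    return slopes[round(tail * reps)], slopes[round((1 - tail) * reps) - 1]


# Hypothetical data: true slope 2, errors are skewed (centred exponential).
rng = random.Random(5)
x = [rng.uniform(0, 10) for _ in range(300)]
y = [1.0 + 2.0 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]

estimate = ols_slope(x, y)
lo, hi = bootstrap_slope_ci(x, y)
print(f"slope estimate: {estimate:.3f}")
print(f"95% bootstrap interval: ({lo:.3f}, {hi:.3f})")
```

Rank-based methods (for example, a Spearman correlation instead of a Pearson one) are another common reading of "non-parametric", but they answer a somewhat different question than a regression slope does.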
