
Thread: Standardizing non-normally distributed data?

  1. #1

    Standardizing non-normally distributed data?




    Hi everyone,

    I'm working on a project that aims to compare subjects based on several characteristics. Right now, I am looking for a way to standardize the scores for each characteristic.

    I'm still pretty new to data analysis, so please accept my apologies if these are very basic questions. My main problem is that the scores for each characteristic are not normally distributed, and the shape of the distribution varies between characteristics. I'll try to explain below:

    I have 100 subjects in my study, and I am measuring five characteristics for each of them. Let's say Characteristic A has a min of 40 and a max of 100, with a median of 87 and a mean of 85. Characteristic B has a min of 0.00 and a max of 0.07, with a median of 0.03 and a mean of 0.01. The other three characteristics also have their own, different distributions.

    Ideally, I would like to get them all on the same scale, e.g. scores between 0 and 100, so they are easy to compare. In the end, I want to compute a total score by weighting the individual characteristics. For that to make sense, I feel that they should first be on the same scale.
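    A minimal sketch of the weighted total described above, assuming every characteristic has already been rescaled to a common 0-100 scale. The names `weighted_total`, `subject` and `weights`, and all the numbers, are hypothetical illustration values, not data from this thread. Dividing by the sum of the weights keeps the total on the same 0-100 scale as the inputs.

```python
def weighted_total(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-characteristic scores (weights need not sum to 1)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same characteristics")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[k] * weights[k] for k in scores) / total_weight


# Hypothetical: one subject's scores on five characteristics, already on 0-100.
subject = {"A": 80.0, "B": 40.0, "C": 65.0, "D": 90.0, "E": 50.0}
# Hypothetical weights: A counts twice as much as B or C, D and E half as much.
weights = {"A": 2.0, "B": 1.0, "C": 1.0, "D": 0.5, "E": 0.5}

print(weighted_total(subject, weights))  # 67.0
```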

    My first instinct was to use z-scores, in order to keep the variability in each distribution but still get them on the same scale. However, the data is not normally distributed. E.g., Characteristic B is skewed to the left, while Characteristic A is heavily skewed to the right (it looks roughly like an exponential decay).
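    For reference, a z-score is just a linear rescaling, z = (x - mean) / SD, so computing it does not require normal data. It also does not change the shape of the distribution: a skewed characteristic stays exactly as skewed, and interpretations such as reading percentiles off a normal table are what rely on normality. The sketch below uses made-up right-skewed values and Python's standard `statistics` module to check that the moment-based skewness is the same before and after standardizing; `z_scores` and `skewness` are illustrative helper names.

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Standardize to mean 0 and standard deviation 1 (population SD)."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sd for v in values]


def skewness(values: list[float]) -> float:
    """Population (moment-based) skewness: mean of the cubed z-scores."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed sample: many small values, a few large ones.
raw = [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 6.0, 10.0, 20.0, 40.0]
z = z_scores(raw)

# Mean is (numerically) 0 and SD is 1, but the skewness is unchanged.
print(statistics.mean(z), statistics.pstdev(z))
print(skewness(raw), skewness(z))
```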

    So I am wondering whether I can still use z-scores for this exercise, or whether there is a better method for standardizing variables whose distributions have different shapes and are not normal.

    Apologies if I am using any of the terminology wrong. By standardization I really just mean that I want to put each characteristic on the same scale, e.g. 0 to 100 (or any other scale), in a way that preserves the differences between subjects in the original distributions but makes the characteristics comparable in magnitude.

    I hope this made sense and any help will be much appreciated!

    Thanks a lot,
    Londoner

  2. #2
    TS Contributor
    Karabiner
    Location
    FC Schalke 04, Germany

    Re: Standardizing non-normally distributed data?

    You could rank the individuals on each variable and take
    each individual's median rank across all variables, for
    comparison between subjects.
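    A rough sketch of the median-rank idea, under stated assumptions: a higher score counts as better (rank 1), tied values share their average rank, and the three characteristics and four subjects are hypothetical. A lower median rank then means a subject ranks well on most characteristics. `ranks` and `median_ranks` are illustrative names, not a library API.

```python
import statistics


def ranks(values: list[float]) -> list[float]:
    """Rank values so the largest gets rank 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def median_ranks(data: dict[str, list[float]]) -> list[float]:
    """Median of each subject's ranks across all characteristics."""
    per_char = [ranks(column) for column in data.values()]
    n_subjects = len(per_char[0])
    return [statistics.median([r[s] for r in per_char]) for s in range(n_subjects)]


# Hypothetical scores: 4 subjects (list positions) on 3 characteristics.
data = {
    "A": [95.0, 87.0, 60.0, 87.0],
    "B": [0.01, 0.05, 0.03, 0.02],
    "C": [12.0, 30.0, 25.0, 5.0],
}
print(ranks(data["A"]))    # [1.0, 2.5, 4.0, 2.5]
print(median_ranks(data))  # [3.0, 1.0, 2.0, 3.0]
```

    Because only the order of values matters, this approach is unaffected by skew or by the very different units of the characteristics.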

    Just my 2pence

    K.

  3. The Following User Says Thank You to Karabiner For This Useful Post:

    Londoner (08-13-2014)

  4. #3
    Devorador de queso
    Dason
    Location
    Tampa, FL

    Re: Standardizing non-normally distributed data?

    The transformation 100 * (x - min) / (max - min) will put the data in the range 0-100 (where min is the smallest data value and max is the largest). It's basically just rescaling the data so the min is 0 and the max is 100.
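    The same formula as a small Python function, applied to hypothetical values on two very different scales (loosely modelled on the ranges given for Characteristics A and B in the first post). `min_max_scale` and its `new_max` parameter are illustrative names, not an existing library API.

```python
def min_max_scale(values: list[float], new_max: float = 100.0) -> list[float]:
    """Rescale linearly so the smallest value maps to 0 and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    return [new_max * (v - lo) / (hi - lo) for v in values]


# Hypothetical values on two very different scales.
char_a = [40.0, 70.0, 85.0, 100.0]
char_b = [0.00, 0.01, 0.03, 0.07]

print(min_max_scale(char_a))  # [0.0, 50.0, 75.0, 100.0]
print(min_max_scale(char_b))  # also runs from 0 to 100
```

    Because the transformation is linear, the relative gaps between subjects are kept, but the result depends entirely on the two most extreme values: a single outlier will squeeze everyone else into a narrow band.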

  5. The Following User Says Thank You to Dason For This Useful Post:

    Londoner (08-13-2014)

  6. #4

    Re: Standardizing non-normally distributed data?


    Thanks to both of you!

    @Karabiner: That is actually what I have done for now, as a quick solution to see what it looks like. My only concern is that if, for example, the top three scores for a characteristic are 100, 90 and 89, then the ranking (1, 2, 3) loses the "relative distance" between those three subjects. Good to know I wasn't completely off the mark, though.
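    The concern about lost distances can be checked directly. The sketch below compares consecutive gaps for the top three scores 100, 90 and 89 in the raw data, in their ranks, and after the min-max rescaling 100 * (x - min) / (max - min), taking min = 40 and max = 100 from the Characteristic A example in the first post as an assumption. Ranks turn the 10-point and 1-point gaps into equal steps, while the linear rescaling keeps their 10:1 ratio. `simple_ranks` and `gaps` are hypothetical helper names.

```python
def simple_ranks(values: list[float]) -> list[int]:
    """Rank 1 = largest value. Assumes no ties, to keep the sketch short."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def gaps(values):
    """Absolute differences between consecutive entries, rounded for display."""
    return [round(abs(b - a), 6) for a, b in zip(values, values[1:])]


top3 = [100.0, 90.0, 89.0]
# Assumed range, borrowed from the Characteristic A example (min 40, max 100).
lo, hi = 40.0, 100.0
rescaled = [100 * (v - lo) / (hi - lo) for v in top3]

print(gaps(top3))                # [10.0, 1.0]
print(gaps(simple_ranks(top3)))  # [1, 1]
print(gaps(rescaled))            # gaps still in a 10:1 ratio
```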

    @Dason: Thanks for that suggestion, that sounds exactly like what I had in mind but couldn't quite express. I will implement that as a second option to see what it looks like.

    I love this community
