Hi everyone,
I'm working on a project at the moment that aims to compare subjects based on different characteristics. At the moment, I am looking for a way to standardize the scores for each of the characteristics.
I'm still pretty new to data analysis so please accept my apologies if these are very basic questions. My main problem is that the scores for each characteristic are not normally distributed and the distribution shapes vary between characteristics. I'll try to explain below:
I have 100 subjects in my study and I am measuring five characteristics for each of them. Let's say Characteristic A has a min of 40 and a max of 100, with a median of 87 and a mean of 85. Characteristic B has a min of 0.00 and a max of 0.07 with a median of 0.03 and a mean of 0.01. Then there are three other characteristics also with different distributions.
Ideally, what I am looking to do is get them all on the same scale, e.g. with scores between 0 and 100, to make them easy to compare. At the end, I want to compute a total score by weighting the individual characteristics. For that to make sense, I feel that they should initially be on the same scale.
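To make the "total score" idea concrete, here is a minimal sketch of a weighted composite. It assumes the five characteristics have already been rescaled to a common 0 to 100 range; the function name `weighted_total`, the example scores, and the weights are all made up for illustration.

```python
def weighted_total(scores, weights):
    """Combine one subject's already-rescaled scores into a single weighted score."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per characteristic")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Dividing by the sum of the weights keeps the result on the 0-100 scale.
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical subject: five characteristics, each already on a 0-100 scale.
subject_scores = [80.0, 50.0, 100.0, 20.0, 60.0]
weights = [2, 1, 1, 1, 1]  # made-up weights; the first characteristic counts double
print(weighted_total(subject_scores, weights))
```

Normalising by the sum of the weights is one design choice among several; it means the weights only need to express relative importance and do not have to add up to 1.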
My first instinct was to use z scores in order to keep the variability in each distribution but still get them on the same scale. However, the data is not normally distributed. E.g., Characteristic B is skewed to the left, while Characteristic A is heavily skewed to the right (looks approx. like exponential decrease).
So I am wondering if I can still use z scores for this exercise, or if there is a better method out there to standardize variables whose distributions have different shapes and are not normal.
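For reference, a z score is just a linear rescaling, (x - mean) / standard deviation, so it can be computed for any variable with non-zero spread, normal or not. The sketch below uses only Python's standard library; the helper name `z_scores` and the sample values are invented for illustration. It shows that the result has mean 0 and standard deviation 1, while the skew of the data and the lack of a fixed 0 to 100 range remain.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (x - mean) / sd for each value, using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    if s == 0:
        raise ValueError("all values are equal; z scores are undefined")
    return [(v - m) / s for v in values]


# Made-up, clearly skewed sample: one value sits far from the rest.
skewed = [1, 1, 2, 2, 3, 20]
z = z_scores(skewed)

print([round(v, 2) for v in z])
print(round(mean(z), 6), round(stdev(z), 6))  # centre and spread after rescaling
```

Because the transformation is linear, the outlying value is still outlying after rescaling, and the z scores are not confined to any fixed interval.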
Apologies if I am using any of the terminology wrong. By standardization I really just mean that I want to have each characteristic on the same scale, e.g. 0 to 100 (or any other scale), in a way that preserves the differences between subjects as in the original distributions but makes the characteristics more comparable in magnitude.
I hope this made sense and any help will be much appreciated!
Thanks a lot,
Londoner
You could rank the individuals on each variable and take
each individual's median rank across all variables, for
comparison between subjects.
Just my 2pence
K.
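The median-rank suggestion above could be sketched roughly as follows. This is only an illustration using the standard library: the helper names `average_ranks` and `median_ranks`, the tie handling (tied values share their mean rank), the rank direction (1 = smallest value), and the tiny data set are all assumptions, not something specified in the thread.

```python
from statistics import median


def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the tie block
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def median_ranks(table):
    """table maps a characteristic name to a list of scores, one per subject."""
    per_variable = [average_ranks(column) for column in table.values()]
    # zip(*...) regroups the ranks so each tuple holds one subject's ranks.
    return [median(subject) for subject in zip(*per_variable)]


# Made-up data: four subjects, three characteristics on very different scales.
data = {
    "A": [40, 87, 100, 85],
    "B": [0.00, 0.03, 0.07, 0.01],
    "C": [5, 5, 9, 1],
}
print(median_ranks(data))
```

If a high score should receive rank 1 instead, the ranks can be flipped with `n + 1 - rank` for each variable before taking the median.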
Londoner (08-13-2014)
The transformation 100 * (x - min) / (max - min) will put the data in the range 0-100 (where min is the smallest data value and max is the largest). It's basically just rescaling the data so that the min is 0 and the max is 100.
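As a minimal sketch of that formula in Python (the function name `min_max_0_100` and the sample values are made up; the guard for a constant variable is an added assumption, since the formula divides by zero when max equals min):

```python
def min_max_0_100(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; rescaling is undefined")
    return [100 * (v - lo) / (hi - lo) for v in values]


# Made-up scores spanning the 40-100 range described for Characteristic A.
char_a = [40, 85, 87, 100]
print(min_max_0_100(char_a))
```

The same function can be applied to each characteristic in turn, each with its own min and max.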
I don't have emotions and sometimes that makes me very sad.
Thanks to both of you!
@Karabiner: That is actually what I have done for now as a quick solution to see what it looks like. The only concern I had with that is that if, for example, the top 3 scores for a characteristic are 100, 90, 89, then the ranking (1, 2, 3) loses the "relative distance" between the three subjects. Good to know I wasn't completely off the mark though.
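The "relative distance" concern can be seen with a few lines of Python. The top three scores are the ones from the example above; the fourth score of 40 is an assumed minimum added so that a 0 to 100 rescaling has something to anchor on.

```python
scores = [100, 90, 89, 40]  # 100, 90, 89 from the example; 40 is an assumed minimum

# Rank 1 = highest score (no ties in this small example).
by_score = sorted(scores, reverse=True)
ranks = [by_score.index(s) + 1 for s in scores]

# Linear 0-100 rescaling: 100 * (x - min) / (max - min).
lo, hi = min(scores), max(scores)
rescaled = [100 * (s - lo) / (hi - lo) for s in scores]

# Gaps between neighbouring subjects under each scheme.
rank_gaps = [ranks[i + 1] - ranks[i] for i in range(len(ranks) - 1)]
score_gaps = [rescaled[i] - rescaled[i + 1] for i in range(len(rescaled) - 1)]

print(rank_gaps)                           # every neighbouring pair is one rank apart
print([round(g, 1) for g in score_gaps])   # rescaled gaps differ from pair to pair
```

Ranks treat the step from 100 to 90 the same as the step from 90 to 89, whereas the linear rescaling keeps the first gap ten times as large as the second.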
@Dason: Thanks for that suggestion, that sounds exactly like what I had in mind but couldn't quite express. I will implement that as a second option to see what it looks like.
I love this community