Measuring data homogeneity

#1
Dear all,

I am trying to measure the specialization of workers contributing to different tasks of a common project. So, I am looking for a way to provide a quantitative measure of how specialized vs polyvalent every worker is.

I calculate the contribution of every worker to a given task as follows: 1 / (rank of the worker in the task / number of workers contributing to the task). For example, the worker with the largest contribution to a task is ranked 1, and the one with the smallest contribution has the largest rank (I have an objective way to calculate the contributions). I ended up with a table like the one below (just a sample, as I have hundreds of workers). Ideally, if a worker contributes significantly to one single task but does not contribute to any other task, this worker should have the highest specialization score. On the other hand, a worker who contributes equally to all the tasks should have the lowest score.
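To make the formula concrete, here is a minimal Python sketch of the score as described. The function name and the argument names are only illustrative, not part of my actual data pipeline.

```python
def contribution_score(rank: int, n_workers: int) -> float:
    """Contribution of a worker to one task.

    Implements 1 / (rank / n_workers), which is algebraically n_workers / rank.
    rank = 1 is the largest contributor; rank = n_workers is the smallest.
    """
    if not 1 <= rank <= n_workers:
        raise ValueError("rank must be between 1 and n_workers")
    return n_workers / rank


# Example: a task with 5 contributing workers.
scores = [contribution_score(r, 5) for r in range(1, 6)]
```

Note that with this definition the top-ranked worker scores `n_workers` and the last-ranked worker scores 1, so the range of scores depends on how many workers contributed to the task.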

This concept of specialization is close to entropy, except that my workers' contributions are not probabilities, so I am not sure it can be used here. The other idea that came to my mind is the standard deviation of a worker's contributions across tasks. In this case, the bigger the standard deviation, the more specialized the worker is.
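Both ideas can be sketched in a few lines of Python. This is only a rough illustration under some assumptions: a task the worker did not contribute to gets a score of 0, the scores are turned into shares by dividing by the worker's total (which is one possible way around the "not probabilities" issue, not an established method for this problem), and the function names are made up for the example.

```python
import math
from statistics import pstdev


def entropy_specialization(contribs):
    """1 - normalized Shannon entropy of a worker's contribution shares.

    Returns 1.0 when everything goes to a single task and 0.0 when the
    contributions are spread equally over all tasks.
    """
    total = sum(contribs)
    k = len(contribs)
    if k < 2 or total <= 0:
        raise ValueError("need at least two tasks and a positive total")
    shares = [c / total for c in contribs]
    # Terms with a zero share contribute nothing to the entropy.
    h = -sum(p * math.log(p) for p in shares if p > 0)
    return 1.0 - h / math.log(k)


def sd_specialization(contribs):
    """Population standard deviation of the raw contributions."""
    return pstdev(contribs)


specialist = [10, 0, 0, 0]   # all effort on one task
generalist = [3, 3, 3, 3]    # equal effort on every task
```

With these two hypothetical workers, both measures rank the specialist above the generalist. The entropy version is bounded between 0 and 1, while the standard deviation depends on the scale of the scores.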

I am wondering if there is a better way to measure the specialization. Any feedback on my way of measuring the contribution is also welcome.

[Attachment: workers contributions.jpg]

Many thanks in advance for your help.
 
#3
"if i told you i had found a better way to measure specialization, how would you know that you/I was right?"
Thank you for your answer/question. To me, standard deviation looks good. Since I am planning to share the results of my work, I don't want to find myself not using something that is commonly known in statistics. The reviewers will say I should have used this or that. I had a paper rejected for a reason like that. Because I didn't study statistics in a systematic way, I just don't know if there is a better statistical tool.
 

gianmarco

TS Contributor
#4
Hello,

in my earlier field of study (archaeology/anthropology), researchers dealing with pottery production were "measuring" standardization of pottery shapes using the Coefficient of Variation of certain vessels' measurements. Your idea of using the Standard Deviation would seem ok at face value, but you have to also consider that it is difficult to compare the SD across variables since it depends on the unit of measure. So, using a dimensionless statistic such as the CV (sample SD divided by the sample mean) would perhaps be better.
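As a rough sketch of what I mean (the function name and the example numbers are made up for illustration), the CV can be computed in Python like this:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Hypothetical contribution scores of two workers across four tasks.
specialist = [10, 0, 0, 0]
generalist = [3, 3, 3, 3]

cv_specialist = coefficient_of_variation(specialist)
cv_generalist = coefficient_of_variation(generalist)

# Rescaling every score (e.g. a different unit) leaves the CV unchanged,
# whereas the SD would double.
cv_rescaled = coefficient_of_variation([2 * v for v in specialist])
```

Here the worker with equal contributions gets a CV of 0, and the specialist gets the same CV whether or not the scores are rescaled.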

Just my two cents.
Apologies if I am missing something, but I am a bit tired after a long day.

Gm