Measure of overlap between two sets of ratings (Mahalanobis?)

#1
Hi all,

I would like to compute a measure of overlap between two sets of ratings made by the same individual. To illustrate, let’s assume the following:

A sample of individuals (N = 20) are asked to rate their own personality on nine different dimensions (e.g., emotional stability, extraversion, self-confidence). Then they are asked to do the exact same ratings of a person they have just met. Hence, for each individual we have one set of nine ratings pertaining to themselves (self1, self2, self3, etc.) and one set of nine ratings pertaining to the other person (other1, other2, other3, etc.). The other person is the same for all individuals in the sample.

Now, for each individual I would like to compute a measure of similarity (or dissimilarity) representing the degree of overlap between the self and the other person (in terms of the rated variables). My first idea was to simply compute the Euclidean distance:

$$d_{\text{Euclidean}} = \sqrt{\sum_{i=1}^{9} (\text{self}_i - \text{other}_i)^2}$$
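A minimal Python sketch of this per-individual Euclidean distance may make the computation concrete. The function name `euclidean_distance` and the rating values are made up for illustration; they are not data from the study.

```python
import math

def euclidean_distance(self_ratings, other_ratings):
    """Euclidean distance between two equal-length rating vectors."""
    if len(self_ratings) != len(other_ratings):
        raise ValueError("rating vectors must have the same length")
    return math.sqrt(
        sum((s - o) ** 2 for s, o in zip(self_ratings, other_ratings))
    )

# Hypothetical ratings for ONE individual on the nine dimensions.
self_ratings = [5, 3, 4, 6, 2, 5, 4, 3, 6]
other_ratings = [4, 3, 6, 6, 1, 5, 2, 3, 5]

distance = euclidean_distance(self_ratings, other_ratings)
print(round(distance, 3))
```

Each dimension contributes equally here, which is exactly why correlated dimensions end up being counted more than once.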

In my understanding, however, Euclidean distance can be misleading when two or more of the variables are correlated, which in fact they are here (e.g., extraversion and self-confidence). If that is the case, the recommendation seems to be to compute Mahalanobis distance instead, which essentially “decorrelates” the variables before computing the distance:

$$d_{\text{Mahalanobis}} = \sqrt{(\mathbf{self} - \mathbf{other})^{\top} S^{-1} (\mathbf{self} - \mathbf{other})}$$

My problem is which covariance matrix $S$ (inverted to $S^{-1}$) to use in the computation of Mahalanobis distance. Because the two sets of ratings (self and other) are two different distributions, I end up with two covariance matrices. Can I somehow compute a pooled covariance matrix and enter that in the formula? Or is Mahalanobis distance simply not applicable when the two sets of ratings come from different distributions?
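To show only the mechanics of the pooled-covariance idea, here is a hedged Python/NumPy sketch. It assumes the usual degrees-of-freedom-weighted pooling of two sample covariance matrices and treats the self and other ratings as two separate samples, which ignores that they come from the same raters. Whether that is appropriate is exactly the open question above. The helper names and the simulated data are hypothetical.

```python
import numpy as np

def pooled_covariance(a, b):
    """Pooled covariance of two samples (rows = individuals, cols = variables)."""
    n_a, n_b = a.shape[0], b.shape[0]
    s_a = np.cov(a, rowvar=False)  # 9 x 9 covariance of the first sample
    s_b = np.cov(b, rowvar=False)  # 9 x 9 covariance of the second sample
    return ((n_a - 1) * s_a + (n_b - 1) * s_b) / (n_a + n_b - 2)

def mahalanobis_rows(a, b, cov):
    """Mahalanobis distance between paired rows of a and b, given cov."""
    diff = a - b
    # Solve cov @ x = diff.T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, solved))

# Simulated stand-in data: 20 individuals, 9 dimensions (NOT real ratings).
rng = np.random.default_rng(0)
self_ratings = rng.normal(4.0, 1.0, size=(20, 9))
other_ratings = rng.normal(4.0, 1.0, size=(20, 9))

s_pooled = pooled_covariance(self_ratings, other_ratings)
distances = mahalanobis_rows(self_ratings, other_ratings, s_pooled)
print(distances.shape)
```

With the identity matrix in place of `s_pooled`, `mahalanobis_rows` reduces to the plain Euclidean distance, which is a handy sanity check.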

If this is doable, then any suggestions on how to implement this in statistical software (SPSS, R, MATLAB) are most welcome.

Best,

Kalle

P.S. This is my first post, so I apologize for any rookie mistakes.
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
So you have 20 subjects who each rate themselves and then the comparison person. Your issue is that some of the questions are related, so summing the differences and dividing may be a problem. Can you see in your data that the questions are related, or is this just a presumption? What are the nine areas in particular?

This seems like it may be a factor analysis type of question, which I have limited knowledge of. Or you may just need to employ some weighting. How do people typically use these data in your area, or is this something you created?
 
#3
Thanks for your response!

That's correct. I know some of the questions are related (as indicated by correlation coefficients).

Your weighting suggestion makes sense. The computation of Mahalanobis distance accomplishes a kind of weighting by taking into account the correlation between variables. So I was hoping to somehow make my data compatible with the Mahalanobis formula.

I think I may have found a solution to the problem, though: I first compute the absolute value of the self-other difference for each variable separately (i.e., |self1 - other1|, |self2 - other2|, etc.). Then I enter those absolute difference scores into the computation of Mahalanobis distance. This distance measure tells me the size of the self-other overlap, relative to other individuals in the sample, taking into account the correlations between the different variables. :)

Again, thanks for your input!
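
For anyone wanting to try the difference-score approach described above, here is a rough Python/NumPy sketch. The post does not say which reference point the distance is taken from, so the sample centroid used by default is an assumption. The zero vector (perfect self-other agreement) is shown as an alternative. The function name and the simulated 1-7 ratings are hypothetical.

```python
import numpy as np

def mahalanobis_from_center(scores, center=None):
    """Mahalanobis distance of each row of `scores` from `center`.

    ASSUMPTION: if `center` is None, the sample centroid (column means)
    is used. The covariance is always estimated from `scores` itself.
    """
    scores = np.asarray(scores, dtype=float)
    if center is None:
        center = scores.mean(axis=0)
    cov = np.cov(scores, rowvar=False)
    diff = scores - center
    # Solve cov @ x = diff.T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, solved))

# Simulated stand-in data: 20 individuals, 9 ratings on a 1-7 scale.
rng = np.random.default_rng(1)
self_ratings = rng.integers(1, 8, size=(20, 9))
other_ratings = rng.integers(1, 8, size=(20, 9))

# Absolute self-other difference per variable, as described in the post.
abs_diff = np.abs(self_ratings - other_ratings)

d_centroid = mahalanobis_from_center(abs_diff)                    # vs. sample mean
d_zero = mahalanobis_from_center(abs_diff, center=np.zeros(9))   # vs. perfect overlap
print(d_centroid.round(2))
```

The two reference points answer different questions: distance from the centroid flags individuals whose discrepancy profile is unusual in either direction, while distance from zero grows with the overall discrepancy.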