Euclidean distance as a measure of homogeneity for K-means cluster analysis

#1
Hi,

I'm currently performing a K-means cluster analysis. To figure out how homogeneous the respondents within my clusters are, I calculate each respondent's Euclidean distance to its cluster centre (in SPSS, the option "save -> distance from cluster centre"). Because the Euclidean distance is the square root of the summed squared differences, I square it to get the squared Euclidean distance.

I notice that the more variables I include in the cluster analysis, the higher the Euclidean distance for the respondents becomes.
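A minimal sketch of this observation, under assumptions that are not part of my actual data: simulated standard-normal variables, the origin used as a stand-in cluster centre, and hypothetical helper names. It only illustrates that the squared Euclidean distance is a sum with one term per variable, so it tends to grow as variables are added.

```python
import random

def squared_euclidean(point, centre):
    """Sum of squared per-variable differences between a point and a centre."""
    return sum((x - c) ** 2 for x, c in zip(point, centre))

def mean_squared_distance(n_vars, n_points=2000, seed=0):
    """Average squared distance to the origin for simulated data.

    Assumption: every variable is standard normal and the cluster
    centre is the origin; real survey data will behave differently.
    """
    rng = random.Random(seed)
    centre = [0.0] * n_vars
    total = 0.0
    for _ in range(n_points):
        point = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
        total += squared_euclidean(point, centre)
    return total / n_points

d10 = mean_squared_distance(10)
d100 = mean_squared_distance(100)
print(d10, d100)
```

In this simulation each variable contributes about 1 to the squared distance on average, so the two printed values should land near 10 and near 100 respectively.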

Can anybody tell me why this could be the case? I need to know how to average this out so that squared Euclidean distances are comparable between a cluster solution formed with 10 variables and one formed with 100 variables.

The question is whether I'm allowed to divide a Euclidean distance formed from 100 variables by 100 before I square it, so that I get the average squared Euclidean distance per variable. (Or should I divide the Euclidean distance by 100 after I square it? Or is neither the correct approach?)
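To make the two options concrete, here is a small arithmetic sketch with made-up numbers (4 variables instead of 100; the point and centre values are hypothetical). It only shows what each order of operations computes and is not meant to settle which approach is statistically appropriate.

```python
import math

# Hypothetical respondent and cluster centre on 4 variables.
point = [2.0, 4.0, 6.0, 8.0]
centre = [1.0, 2.0, 3.0, 4.0]
p = len(point)

# Per-variable squared differences: 1, 4, 9, 16.
sq_diffs = [(x - c) ** 2 for x, c in zip(point, centre)]

# Euclidean distance as a package would report it (square root of the sum).
d = math.sqrt(sum(sq_diffs))

# Option A: square first, then divide by the number of variables.
divide_after = d ** 2 / p

# Option B: divide by the number of variables first, then square.
divide_before = (d / p) ** 2

# Reference value: mean squared difference per variable.
mean_sq_per_variable = sum(sq_diffs) / p

print(divide_after, divide_before, mean_sq_per_variable)
```

With these numbers, dividing after squaring reproduces the mean squared difference per variable, while dividing before squaring gives that value divided by the number of variables once more, because (d / p)² = d² / p².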

Help would be greatly appreciated, thanks!
Pieter


~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Here is my data set for further clarification (if need be) ~ ~ ~ ~ ~ ~ ~ ~ ~ ~