## Cluster Analysis

I am in the design stage of a project comparing how Polish and English participants cluster emotions. We will run two experiments: one asks Polish participants to sort about 150 Polish emotion terms into groups, and the other asks English participants to sort the English translations of the same 150 terms. Participants will sort the emotions into groups on the basis of how similar they are to each other, and they may use as many groups as they want. Once the emotions are in groups, they will rate the extent to which they think each emotion belongs in its group on a 9-point scale (1 to 9). They will perform this task on a computer.

My question concerns how many times we allow each emotion to be used in the sorting/grouping procedure. There are two possibilities:

1. Each emotion can be used only once in the sorting/grouping procedure. For each participant, a 150 X 150 co-occurrence matrix will be constructed, with 1 indicating that two terms were placed in the same category and 0 indicating that they were not. These matrices will be summed across all of the Polish subjects and, separately, across all of the English subjects to form two 150 X 150 matrices. Each cell entry can range from 0 to 100 (we aim for 100 subjects per language) and represents the number of subjects who placed that pair of words in the same category. The two matrices will be analysed separately with a cluster analysis program.

2. Subjects are free to use each emotion as many times as they want in the sorting/grouping procedure. The reason for this is that it might provide more descriptive information. However, it is not easy to decide what data to enter into the cluster analyses. There are two options:

a) Identify the highest rated instance of each emotion across the groups. The emotion would then be treated as a member of that group only, and we would proceed to construct the 150 X 150 matrices as in option 1 above. The problem is that there are likely to be ties in the ratings of different instances of the same emotion. Also, I am not sure whether using the highest rated instance of an emotion is equivalent to asking subjects to place an emotion in only one group. I feel that this is a crucial point and might invalidate the cluster analysis.

b) When constructing the 150 X 150 matrices, instead of allocating a “1” to indicate that two emotion terms were placed in the same category/group, we could allocate a number based on how many times the two emotion terms were placed in the same category/group by each subject. A possible problem with this is that there might be quite wide variation in the data. Some emotions might be placed with other emotions a great deal and others hardly at all. Also, there might be a great deal of between-subject variability, with some subjects often placing the same emotion in several categories/groups and others doing so rarely.
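
To make the three codings concrete, here is a minimal sketch in plain Python of how one participant's matrix could be built under option 1, option 2a, and option 2b. Everything in it is an assumption for illustration: the function names (`cooccurrence`, `sum_matrices`, `keep_highest_rated`) are hypothetical, terms are represented as integer indices, the toy data use 4 terms rather than 150 and are made up, and the rule that the first group wins a tie in option 2a is an arbitrary placeholder, not a recommendation.

```python
from itertools import combinations


def cooccurrence(groups, n_terms, binary=True):
    """Co-occurrence matrix for ONE participant.

    groups: list of groups, each a list of term indices (0 .. n_terms-1).
    binary=True  -> 1 if the pair shared at least one group (option 1 coding).
    binary=False -> number of groups the pair shared (option 2b coding).
    """
    m = [[0] * n_terms for _ in range(n_terms)]
    for group in groups:
        for i, j in combinations(sorted(set(group)), 2):
            m[i][j] += 1
            m[j][i] += 1
    if binary:
        m = [[min(v, 1) for v in row] for row in m]
    return m


def sum_matrices(matrices):
    """Element-wise sum of per-participant matrices (one language sample)."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


def keep_highest_rated(rated_groups):
    """Option 2a: keep each term only in its highest rated group.

    rated_groups: list of dicts mapping term index -> rating (1 to 9).
    Returns (groups, tied_terms). On a tie the first group is kept, which is
    an arbitrary assumption; tied terms are reported so they can be inspected.
    """
    best = {}  # term -> (best rating, list of groups achieving it)
    for g, ratings in enumerate(rated_groups):
        for term, rating in ratings.items():
            if term not in best or rating > best[term][0]:
                best[term] = (rating, [g])
            elif rating == best[term][0]:
                best[term][1].append(g)
    groups = [[] for _ in rated_groups]
    tied = []
    for term, (_, group_ids) in best.items():
        groups[group_ids[0]].append(term)
        if len(group_ids) > 1:
            tied.append(term)
    return groups, sorted(tied)


# Made-up toy data with 4 terms instead of 150.
single_use = [
    [[0, 1], [2, 3]],   # participant A, each term used once
    [[0, 1, 2], [3]],   # participant B, each term used once
]
total = sum_matrices([cooccurrence(g, 4) for g in single_use])  # option 1

# One participant who reused terms; each group maps term -> rating.
reused = [{0: 9, 1: 8}, {0: 9, 1: 6, 2: 5}, {3: 7}]
reduced, tied = keep_highest_rated(reused)                          # option 2a
counts = cooccurrence([list(g) for g in reused], 4, binary=False)   # option 2b
```

In this toy case term 0 is rated 9 in two groups, so `tied` flags it, which is exactly the situation raised under option 2a. Under option 2b the pair (0, 1) gets a count of 2 for this one participant, so a subject who reuses terms freely contributes more to the summed matrix than one who does not.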

