Interpreting a hierarchical cluster analysis using a dendrogram

#1
Can anyone help me figure out how to find the number of clusters in a hierarchical cluster analysis that I'm running in SPSS?

I heard that Ward's method is the best one to start with when looking at how many clusters there are in a data set. I have around 200 participants and am clustering 12 continuous variables. The aim is to find out how many clusters I have, if there are any at all?! Ideally, if there are a few, I'd like to run a regression to see whether cluster membership can be used to predict QoL. At the moment, I've been stuck for several days on the first step: interpreting the output from the Ward's analysis. :(

I ran the analysis by following a few handouts and YouTube videos I found, but I can't get the hang of interpreting the results, particularly as the data set is so large and the dendrogram seems to show about 30 clusters. I've attached the dendrogram below, and I can include the agglomeration schedule if anyone thinks it might help.

Thank you in advance!

[Attachment: 1515191577553.png (dendrogram)]
 
#2
The dendrogram lets you see the cluster assignments for any prespecified number of clusters. However, identifying the optimal number of clusters and the optimal clustering method (Ward vs. something else) is a more complex task. One approach is to think about what you will ultimately be using the clusters for. Once you have a good grasp of that, you can choose the specification empirically, by backtesting candidate specifications on the data with a model selection method such as cross-validation.
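One possible reading of the cross-validation idea, sketched in Python with NumPy and SciPy rather than SPSS. Everything here is an assumption for illustration: the data are synthetic (two groups of 100 cases on 12 variables, plus a made-up outcome standing in for QoL), held-out cases are assigned to the nearest training centroid, and each cluster's training mean is used as the prediction. It is not a definitive procedure, just one way to score each candidate number of clusters by how well it predicts the outcome.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in data (assumption): 2 groups x 100 cases, 12 variables.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 100)
X = rng.normal(0.0, 1.0, (200, 12)) + 4.0 * group[:, None]
# Hypothetical outcome standing in for QoL: depends on the group, plus noise.
y = 50.0 + 10.0 * group + rng.normal(0.0, 5.0, 200)


def cv_mse_for_k(X, y, k, n_folds=5, seed=0):
    """Cross-validated MSE when y is predicted by the mean of its cluster."""
    fold_rng = np.random.default_rng(seed)
    folds = np.array_split(fold_rng.permutation(len(X)), n_folds)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]

        # Ward clustering on the training cases only, cut into k clusters.
        Z = linkage(X_tr, method="ward")
        labels = fcluster(Z, t=k, criterion="maxclust")
        ids = np.unique(labels)
        centroids = np.array([X_tr[labels == c].mean(axis=0) for c in ids])
        cluster_means = np.array([y_tr[labels == c].mean() for c in ids])

        # Assign each held-out case to its nearest training centroid.
        diffs = X[test_idx][:, None, :] - centroids[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).argmin(axis=1)
        pred = cluster_means[nearest]
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))


# k = 1 is the "no clusters at all" baseline (predict the overall mean).
scores = {k: cv_mse_for_k(X, y, k) for k in range(1, 6)}
for k, mse in scores.items():
    print(f"k={k}: cross-validated MSE = {mse:.2f}")
```

A candidate number of clusters is only worth keeping if it clearly beats the k = 1 baseline; small differences between neighbouring values of k should not be over-interpreted.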

It is possible to use the dendrogram to get an idea of the plausible range for the number of clusters. You check how quickly the intra-cluster distances decrease with each new split: a large drop suggests the split is worthwhile, while a run of small drops suggests further splitting adds little. However, such calculations are quite preliminary and imprecise. In my opinion, they are good for building a researcher's intuition but still have to be verified formally.
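As a rough illustration of that check, here is a minimal Python sketch using SciPy's `linkage` on synthetic data (an assumption: two well-separated groups of 100 cases on 12 variables, not the poster's data). It looks at the merge heights, which play the same role as the coefficients in an SPSS agglomeration schedule, and finds where the height jumps the most between consecutive merges. The helper name `merge_height_jumps` is made up for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Synthetic stand-in data (assumption): 2 groups x 100 cases, 12 variables.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, (100, 12)),
    rng.normal(4.0, 1.0, (100, 12)),
])


def merge_height_jumps(X):
    """Return (k, jump) pairs for Ward linkage.

    'jump' is how much the merge height increases when the procedure
    goes on to merge k clusters into k - 1, compared with the merge
    just before it. A large jump suggests stopping at k clusters.
    """
    Z = linkage(X, method="ward")
    heights = Z[:, 2]              # one height per merge, in merge order
    jumps = np.diff(heights)       # jumps[i] = heights[i + 1] - heights[i]
    n = X.shape[0]
    # After merge i (0-based) there are n - i - 1 clusters left, and
    # jumps[i] is the extra cost of carrying out the next merge.
    ks = n - 1 - np.arange(len(jumps))
    return list(zip(ks.tolist(), jumps.tolist()))


jumps = merge_height_jumps(X)
best_k = max(jumps, key=lambda kj: kj[1])[0]

# Show the last few candidates, which are usually the ones of interest.
for k, jump in jumps[-6:]:
    print(f"k={k}: jump in merge height = {jump:.2f}")
print("largest jump at k =", best_k)
```

With real data the largest jump is often less clear-cut than in this synthetic case, which is why the result is better treated as a starting range than as an answer.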
 
#3
Thanks for your help! Previous research suggests that 2 or 3 clusters is typical, and 2 clusters seem to make sense in my research. Is there a way to check the quality of the clusters? Is this the cross-validation method you outlined, and can it be done in SPSS?

Thanks again.
 

Miner

TS Contributor
#4
Each clustering method has its own set of strengths, weaknesses, and tendencies. Ward's method tends to form clusters of similar size. This may or may not make sense in your particular situation. Regarding the number of clusters, this is a judgment call. You can visually cut the dendrogram. Alternatively, a method that I use is to plot a graph of similarity level versus the number of clusters. You will probably see multiple break points to evaluate. I usually find that more than one makes sense, and I settle on the one that is most applicable to my particular question.
[Attachment: TStats.jpg]
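A minimal Python sketch of both suggestions (tabulating the level at which clusters are joined against the number of clusters, and then cutting the tree), using SciPy on synthetic data. Two assumptions to flag: the data are made up (two groups of 100 cases on 12 variables), and SciPy reports a merge distance rather than a similarity level, so break points show up as sharp rises in distance rather than drops in similarity. The helper name `height_by_cluster_count` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in data (assumption): 2 groups x 100 cases, 12 variables.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, (100, 12)),
    rng.normal(4.0, 1.0, (100, 12)),
])

Z = linkage(X, method="ward")


def height_by_cluster_count(Z, max_k=10):
    """Map k to the height of the merge that reduces k clusters to k - 1."""
    n = Z.shape[0] + 1  # number of cases
    # Row n - k of Z is the merge that takes k clusters down to k - 1,
    # so the last row (k = 2) is the final merge into a single cluster.
    return {k: float(Z[n - k, 2]) for k in range(2, max_k + 1)}


heights = height_by_cluster_count(Z)
for k, h in heights.items():
    print(f"{k} clusters -> next merge at distance {h:.2f}")

# "Cutting" the dendrogram so that exactly 2 clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
sizes = np.bincount(labels)[1:]  # fcluster labels start at 1
print("cluster sizes:", sizes.tolist())
```

Plotting `heights` against k gives the graph described above; each value of k where the distance rises sharply is a break point worth evaluating, and the cluster sizes from `fcluster` help judge whether a given cut is usable.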