Dendrogram: how many clusters would you pick, and why?

#1
Hi, I'm having a little problem with this question, which I need to learn for my upcoming test. I have produced a dendrogram (in SAS Enterprise) covering 77 countries, with various information about each.

I need to discuss the graph and explain how many clusters I would pick and why (the question focuses mostly on the graphical output).

I will attach the graph to this thread. If anyone could help me, that would be great.

Many thanks
J
 

Miner

TS Contributor
#2
There is no exact science behind it. There are usually several ways that it could logically be cut, and sometimes you choose one over another for practical reasons.

My personal choice is to create a scatter plot (with connected dots) with the number of clusters on the x-axis and the similarity level on the y-axis. You should see a curve that ramps up quickly, then plateaus. There will also be several points where there are significant jumps (discontinuities) rather than a smooth curve.
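For anyone who wants to reproduce that kind of plot outside SAS, here is a rough Python sketch. It assumes you already have the merge levels of the tree, one per merge, in the order the merges happened (n objects give n - 1 merges). It uses merge distance rather than similarity on the y-axis, so the curve falls steeply and then flattens instead of ramping up. The heights are made-up numbers, not read off the attached graph, and a crude text bar stands in for the scatter plot.

```python
def clusters_vs_level(heights):
    """Return (number_of_clusters, merge_level) pairs, sorted by cluster count.

    heights: merge distances from an agglomerative clustering, in merge order.
    With n objects there are n - 1 merges, and after merge i (0-based)
    n - 1 - i clusters remain, so that merge's level is paired with that count.
    """
    n = len(heights) + 1
    points = [(n - 1 - i, h) for i, h in enumerate(heights)]
    return sorted(points)


# Hypothetical merge distances for 8 objects (illustrative only).
heights = [1.0, 1.2, 1.5, 2.0, 9.0, 9.5, 11.0]
for k, level in clusters_vs_level(heights):
    # Text bar in place of a real plot: a longer bar means less similar clusters.
    print(f"{k:2d} clusters | level {level:5.1f} | " + "#" * round(level))
```

In this made-up example the level jumps from 2.0 at 4 clusters to 9.0 at 3 clusters, which is the kind of discontinuity described above: merging any further forces quite dissimilar groups together.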

In the attached example, you could select 5 clusters as a balance between minimizing the number of clusters and maximizing the similarity. However, there might be a legitimate reason for selecting 7 or 9 clusters. For example, if these clusters were market segments, the 5 clusters would represent the mass market, while the 7 or 9 clusters would also include market niches.
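Because a hierarchical tree is nested, choosing 5, 7 or 9 clusters is only a question of where you cut the same tree: each finer cluster sits entirely inside one of the coarser ones, so niches split off from mass-market segments rather than cutting across them. A minimal Python sketch of "cutting at k clusters", assuming the tree is stored in a hypothetical format: a list of merges in the order they happened, where each merge names one member of each of the two clusters being joined.

```python
def cut_tree(n, merges, k):
    """Label each of n objects with its cluster when the tree is cut at k clusters.

    merges: hypothetical list of (a, b) object indices in merge order; each pair
    names one member of each of the two clusters joined at that step.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    parent = list(range(n))  # union-find forest, one root per current cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Replaying the first n - k merges leaves exactly k clusters.
    for a, b in merges[: n - k]:
        parent[find(a)] = find(b)

    # Renumber roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical tree for 6 objects: three pairs form first, then the pairs combine.
merges = [(0, 1), (2, 3), (4, 5), (0, 2), (0, 4)]
for k in (3, 2):
    print(k, cut_tree(6, merges, k))
```

Cutting at 3 gives `[0, 0, 1, 1, 2, 2]` and cutting at 2 gives `[0, 0, 0, 0, 1, 1]`: the coarser solution only glues together clusters from the finer one, which is why more and fewer clusters can both be defended from the same dendrogram.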
 
#3
Hi, thanks for that. I am just trying to learn it because I have an exam soon. The example I put up is actually part of the test, so I want to know as much as I can about it.

Thanks for the example. If you analysed my example, what would you say? This is just for my revision, as I am still a little confused.

If you'd rather not, don't worry :)
 

Miner

TS Contributor
#4
Slicing a dendrogram is a little tougher. I usually incorporate the reason for clustering into the decision whether to make more or fewer clusters. Just eyeballing the dendrogram, I might try 10 clusters or even 13, but that's a SWAG.
 
#5
Oh, is it? 10 or 13... What would be your reason for this, if you had to give one? I'm just trying to get my head around the diagram and understand how you pick the clusters.
 

Miner

TS Contributor
#6
It is difficult to say. As I said, part of the decision is based on what you are trying to accomplish with the clusters. You haven't said, so critical decision information is missing. In the absence of that, I am making a judgement based on looking for the natural gaps and balancing the number of clusters against the similarity/difference level. There is no specific rule for determining the number of clusters.
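One way to make "looking for the natural gaps" explicit is to cut where the vertical gap between consecutive merge levels is widest. This is just one common heuristic, not a rule, and the sketch below assumes the merge levels are given in merge order and never decrease (which holds for Ward's method). The `min_k` parameter is a hypothetical knob for when the purpose of the clustering rules out very coarse cuts; the heights are made-up numbers.

```python
def largest_gap_k(heights, min_k=2):
    """Suggest a number of clusters by cutting in the widest gap between merges.

    heights: merge distances in merge order, assumed non-decreasing.
    min_k: ignore cuts that would leave fewer than this many clusters.
    Returns (k, gap) for the widest qualifying gap, or (None, -1.0) if none.
    """
    n = len(heights) + 1
    best_k, best_gap = None, -1.0
    for i in range(len(heights) - 1):
        k = n - 1 - i  # clusters left if the tree is cut between merge i and i + 1
        gap = heights[i + 1] - heights[i]
        if k >= min_k and gap > best_gap:
            best_k, best_gap = k, gap
    return best_k, best_gap


heights = [1.0, 1.2, 1.5, 2.0, 9.0, 9.5, 11.0]  # hypothetical, 8 objects
print(largest_gap_k(heights))           # (4, 7.0)
print(largest_gap_k(heights, min_k=5))  # (5, 0.5)
```

The second call shows how the purpose of the analysis can override the biggest gap: if the bank needed at least 5 groups, the best cut among those is much less clear-cut, which is worth saying when you justify the choice.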
 
#7
The bank wishes to see if different districts have similar profiles, and has therefore asked that the district data be clustered and put into this graph.

That's what they are looking for.
 

Miner

TS Contributor
#8
So the bank will want to minimize the number of clusters so that they can treat as many districts the same as possible? What are the implications if they choose too few clusters?
 
#9
"Fully discuss the dendrogram and explain how many clusters you would pick. Fully explain why."

That is the question. It's asking how many clusters you would pick, using Ward's method.
The bank wishes to see if different districts have similar profiles, i.e. whether they are the same or different.
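For reference, Ward's method merges, at each step, the two clusters whose union increases the total within-cluster sum of squares the least. That increase, n_a * n_b / (n_a + n_b) times the squared distance between the two centroids, is what sets the height of each join in the dendrogram (software such as SAS may show it on a different scale, for example divided by the total sum of squares). Below is a naive Python sketch, not SAS's implementation, assuming the variables have already been standardised so that no single variable dominates the distances; the four points are made-up data.

```python
from itertools import combinations


def ward_merges(points):
    """Naive agglomerative clustering with Ward's criterion.

    points: list of equal-length numeric tuples (e.g. standardised profiles).
    Returns a list of (members_a, members_b, cost) in merge order, where cost
    is the increase in total within-cluster sum of squares caused by the merge.
    """
    # Each cluster id maps to (size, centroid, sorted member indices).
    clusters = {i: (1, list(p), [i]) for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            na, ca, _ = clusters[a]
            nb, cb, _ = clusters[b]
            sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
            cost = na * nb / (na + nb) * sq  # Ward's increase in within-cluster SS
            if best is None or cost < best[0]:
                best = (cost, a, b)
        cost, a, b = best
        na, ca, ma = clusters.pop(a)
        nb, cb, mb = clusters.pop(b)
        # Centroid of the merged cluster is the size-weighted mean of the two.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[next_id] = (na + nb, centroid, sorted(ma + mb))
        next_id += 1
        merges.append((ma, mb, cost))
    return merges


# Two obvious groups of two points each (made-up data).
for a, b, cost in ward_merges([(0, 0), (0, 1), (10, 0), (10, 1)]):
    print(a, "+", b, "cost", cost)
```

The two cheap merges cost 0.5 each and the final one costs 100.0; on a dendrogram that shows up as a long vertical stretch before the last join, which is exactly the kind of gap you would point to when arguing for 2 clusters here.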
 

gianmarco

TS Contributor
#10
Hi!
I think that Miner has already replied to your main issue.
Cutting the tree diagram, i.e. deciding how many clusters there actually are, is a difficult task, and different approaches exist. They attempt to provide not THE best number of clusters, but AN optimal one.
I list some of the methods (with references) on this web page: http://cainarchaeology.weebly.com/extension-clustering-rows-andor-columns.html. You could take a look at that.

Moreover, a useful guide to clustering (with an explanation of the mechanics of how to "cut" the tree) can be found in THIS and THIS video.

Many more online resources on the topic can be found on the web.

In general, the various methods to find AN optimal number of clusters seek to find a balance between the "within groups" variability (which has to be minimized) and the "between groups" variability (which has to be maximized).
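To make that concrete: for any candidate cut, the total sum of squares splits exactly into a within-groups part and a between-groups part. The within part always shrinks as you add clusters, so minimising it alone would just say "use as many clusters as you have districts". The Calinski-Harabasz index (pseudo F) divides each part by its degrees of freedom to balance the two; as far as I know, the pseudo F statistic that SAS can report for cluster analysis is built on the same ratio. A small Python sketch, assuming plain numeric tuples and the labels from whatever cut you are evaluating (the data are made up):

```python
def ss_decomposition(points, labels):
    """Split the total sum of squares into (within-groups, between-groups) parts."""
    dims = len(points[0])
    grand = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    within = between = 0.0
    for members in groups.values():
        c = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # Spread of members around their own cluster centroid.
        within += sum((p[d] - c[d]) ** 2 for p in members for d in range(dims))
        # Spread of the cluster centroid around the grand mean, weighted by size.
        between += len(members) * sum((c[d] - grand[d]) ** 2 for d in range(dims))
    return within, between


def pseudo_f(points, labels):
    """Calinski-Harabasz (pseudo F): higher means tighter, better-separated clusters."""
    n, k = len(points), len(set(labels))
    if not 1 < k < n:
        raise ValueError("need more than 1 and fewer than n clusters")
    within, between = ss_decomposition(points, labels)
    return (between / (k - 1)) / (within / (n - k))


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(pseudo_f(pts, [0, 0, 1, 1]))  # sensible grouping: high pseudo F
print(pseudo_f(pts, [0, 1, 0, 1]))  # poor grouping: low pseudo F
```

In both calls within + between adds up to the same total (101 here); only the split changes. Computing pseudo F for each candidate cut and looking for a local peak is one way to back up an "eyeballed" choice of the number of clusters.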


Hope this helps,
regards
Gm