I need to discuss the graph and explain how many clusters I would pick and why. (The focus is mainly on the graphical output.)

I will attach the graph to this thread, and I would be very grateful if anyone could help me.

Many thanks

J

- Thread starter joshparker
- Tags analysis analysis help data analysis graphs sas

My personal choice is to create a scatter plot (with connected dots) with the number of clusters on the x-axis and the similarity level on the y-axis. You should see a curve that ramps up quickly, then plateaus. There will also be several points where there are significant jumps (discontinuities) rather than a smooth curve.

In the attached example you could select 5 clusters as a balance of minimizing the number of clusters while maximizing the similarity. However, there might be a legitimate reason for selecting 7 or 9 clusters. For example, these clusters might be market segments. The 5 clusters would represent the mass market while the 7 or 9 clusters would include market niches.
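To make the plot described above a bit more concrete, here is a minimal sketch in plain Python of how the (number of clusters, similarity) pairs could be derived from the merge heights of a tree. The function names, the 0-to-100 similarity rescaling, and the sample heights are all assumptions for illustration; they are not taken from the attached graph or from any particular package's output.

```python
def similarity_curve(merge_heights):
    """Turn merge heights into (number of clusters, similarity) pairs.

    merge_heights: heights of successive merges, smallest first, as read
    off a dendrogram or a merge table (must contain a positive value).
    Similarity is an ASSUMED rescaling: 100 at height 0, 0 at the last merge.
    """
    n_obs = len(merge_heights) + 1
    top = max(merge_heights)
    curve = []
    for step, height in enumerate(merge_heights, start=1):
        clusters_left = n_obs - step          # clusters remaining after this merge
        similarity = 100.0 * (1.0 - height / top)
        curve.append((clusters_left, similarity))
    return curve


def suggest_cluster_count(curve):
    """Return the cluster count sitting just before the largest similarity drop."""
    best_k, best_drop = None, -1.0
    for (k, sim), (_, next_sim) in zip(curve, curve[1:]):
        drop = sim - next_sim                 # similarity lost by merging once more
        if drop > best_drop:
            best_k, best_drop = k, drop
    return best_k


# Made-up merge heights for 7 observations (illustration only).
heights = [1.0, 1.5, 2.0, 2.5, 9.0, 10.0]
curve = similarity_curve(heights)
for k, sim in curve:
    print(f"{k} clusters: similarity {sim:.1f}")
print("suggested number of clusters:", suggest_cluster_count(curve))
```

Plotting these pairs with the cluster count on the x-axis gives the ramp-then-plateau shape. With the made-up heights the biggest jump lies between 3 and 2 clusters, so the sketch points to 3; the "largest jump" rule is only one heuristic, and the business reason for clustering can still justify a different choice.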

Thanks for the example. If you analysed my example, what would you say? This is just to help me revise, as I am still a little confused.

If you'd rather not, don't worry.

Slicing a dendrogram is a little tougher. I usually let the reason for clustering guide the decision whether to make more or fewer clusters. Just eyeballing the dendrogram, I might try 10 clusters or even 13, but that's a SWAG.

That's what they are looking for...

So the bank will want to minimize the number of clusters so that they can treat as many districts the same as possible? What are the implications if they choose too few clusters?

That is the question. It's asking how many clusters you would pick, using Ward's method.

The bank wishes to see if different districts have similar profiles, so it is looking to see whether they are the same or different.
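For context, Ward's method merges, at each step, the two clusters whose union increases the within-cluster sum of squares the least. Below is a minimal plain-Python sketch of that idea, stopping once `k` clusters remain (which is what cutting the tree at `k` clusters amounts to). The district profiles are made-up toy values and the function names are hypothetical; this is not the bank data from the question.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    size_a, size_b = len(a), len(b)
    centroid_a = [sum(col) / size_a for col in zip(*a)]
    centroid_b = [sum(col) / size_b for col in zip(*b)]
    squared_gap = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * squared_gap


def ward_clusters(points, k):
    """Agglomerate points with Ward's criterion until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]          # start with one cluster per point
    merge_costs = []
    while len(clusters) > k:
        # Pick the pair whose merge adds the least within-cluster variation.
        cost, i, j = min(
            (ward_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        merge_costs.append(cost)
    return clusters, merge_costs


# Toy "district profiles" with two variables each (made up for illustration).
districts = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
             (5.0, 5.0), (5.1, 4.9),
             (9.0, 1.0), (9.2, 1.1)]
clusters, merge_costs = ward_clusters(districts, 3)
for members in clusters:
    print(members)
```

This brute-force loop is only meant to show the criterion. In practice a library routine (for example SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"`) would normally replace it, and its reported merge heights may be scaled differently from the raw costs used here.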

I think that Miner has already replied to your main issue.

Cutting the tree diagram, i.e. deciding how many clusters there actually are, is a difficult task, and different approaches exist. They attempt to provide not THE best number of clusters, but AN optimal one.

I quote some of the methods (with references) in this web page: http://cainarchaeology.weebly.com/extension-clustering-rows-andor-columns.html. You could take a look at that.

Moreover, a useful guide on clustering (with an explanation of the mechanics of how to "cut" the tree) can be found in THIS and THIS videos.

I think that many other online resources on the topic can be found on the web.

In general, the various methods to find AN optimal number of clusters seek to find a balance between the "within groups" variability (which has to be minimized) and the "between groups" variability (which has to be maximized).
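As a rough illustration of that balance, the sketch below computes the within-groups and between-groups sums of squares for a given partition, plus the Calinski-Harabasz ratio, which is one common way of combining the two (I am using it here as an example; it is not necessarily one of the methods on the page linked above). The function names and the toy points are assumptions made up for the illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def sum_sq(points, center):
    """Sum of squared Euclidean distances from each point to center."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)


def within_between(clusters):
    """Return (within-groups SS, between-groups SS) for a list of clusters."""
    all_points = [p for cluster in clusters for p in cluster]
    grand = centroid(all_points)
    within = sum(sum_sq(c, centroid(c)) for c in clusters)
    between = sum(len(c) * sum_sq([centroid(c)], grand) for c in clusters)
    return within, between


def calinski_harabasz(clusters):
    """(between / (k - 1)) / (within / (n - k)); larger is better.

    Needs at least 2 clusters, more points than clusters, and within > 0.
    """
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    within, between = within_between(clusters)
    return (between / (k - 1)) / (within / (n - k))


# Toy points (made up): three tight groups.
a = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
b = [(5.0, 5.0), (5.1, 4.9)]
c = [(9.0, 1.0), (9.2, 1.1)]

print("3 clusters:", calinski_harabasz([a, b, c]))
print("2 clusters:", calinski_harabasz([a, b + c]))   # b and c lumped together
```

On these toy points the three-cluster partition scores far higher than the lumped two-cluster one, because lumping inflates the within-groups term while shrinking the between-groups term. Comparing the score across candidate cluster counts is one way to pick AN optimal number.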

Hope this helps,

regards

Gm