__Making a Dendrogram__

Hierarchical cluster analysis (as we've been doing here) can be portrayed graphically by a dendrogram, a tree-like graph of the clustering process.

One axis (usually) represents an agglomeration coefficient. What this coefficient is depends on the clustering algorithm used, but it is usually the distance between the clusters joined at each stage. The individual cases are laid out along the other axis, which shows the relative size of each cluster.

Here's the dendrogram created when clustering the data using Ward's method (squared Euclidean distance, variables normalized using z-scores), along with the agglomeration schedule it is drawn from:

| Stage | Distance Between Cluster Centers | Total SSE at Each Stage |
|---|---|---|
| 1 | 0.5576 | 0.278802 |
| 2 | 0.91788 | 0.737743 |
| 3 | 1.09913 | 1.287309 |
| 4 | 1.14743 | 1.861023 |
| 5 | 1.15104 | 2.436542 |
| 6 | 1.23339 | 3.053236 |
| 7 | 1.56906 | 3.837767 |
| 8 | 1.73841 | 4.70697 |
| 9 | 2.06592 | 5.739929 |
| 10 | 2.12478 | 6.802316 |
| 11 | 2.52013 | 8.06238 |
| 12 | 2.96935 | 9.547052 |
| 13 | 3.08863 | 11.09137 |
| 14 | 3.56697 | 12.87485 |
| 15 | 4.14933 | 14.94952 |
| 16 | 5.39885 | 17.64894 |
| 17 | 5.4883 | 20.39309 |
| 18 | 5.5328 | 23.15949 |
| 19 | 6.30109 | 26.31003 |
| 20 | 6.38562 | 29.50284 |
| 21 | 7.20355 | 33.10462 |
| 22 | 7.31179 | 36.76051 |
| 23 | 9.05223 | 41.28663 |
| 24 | 18.0712 | 50.32223 |
| 25 | 18.90347 | 59.77396 |
| 26 | 23.43379 | 71.49086 |
| 27 | 38.75994 | 90.87082 |
| 28 | 101.16 | 141.4508 |
| 29 | 123.0984 | 203 |
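To see where a schedule like this comes from, here is a minimal from-scratch sketch of Ward's method in Python (standard library only). It assumes the setup described above (z-scores computed with the sample standard deviation, squared Euclidean distance) and assumes the distances are updated with the Lance-Williams formula for Ward's method. The function names and the toy data are hypothetical; this is not the program that produced the table.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each variable to mean 0 and sample standard deviation 1."""
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def sq_euclid(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points):
    """Sum of squared distances of the points from their centroid."""
    centroid = [mean(col) for col in zip(*points)]
    return sum(sq_euclid(p, centroid) for p in points)


def ward_schedule(rows):
    """Agglomeration schedule for Ward's method: [(stage, coefficient, total_sse)].

    The coefficient is the Lance-Williams Ward distance started from squared
    Euclidean distances; the total SSE is recomputed from the cluster members.
    """
    data = zscore_columns(rows)
    clusters = {i: [i] for i in range(len(data))}  # cluster id -> case indices
    dist = {(i, j): sq_euclid(data[i], data[j])
            for i in clusters for j in clusters if i < j}

    def d(a, b):
        return dist[(min(a, b), max(a, b))]

    schedule, next_id = [], len(data)
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest Ward distance.
        i, j = min(((a, b) for a in clusters for b in clusters if a < b),
                   key=lambda pair: d(*pair))
        d_ij = d(i, j)
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update: distance from each remaining cluster k to the new one.
        for k, members in clusters.items():
            n_k = len(members)
            dist[(k, next_id)] = ((n_i + n_k) * d(k, i) + (n_j + n_k) * d(k, j)
                                  - n_k * d_ij) / (n_i + n_j + n_k)
        clusters[next_id] = merged
        next_id += 1
        total = sum(sse([data[m] for m in members]) for members in clusters.values())
        schedule.append((len(schedule) + 1, d_ij, total))
    return schedule


if __name__ == "__main__":
    # Hypothetical toy data (6 cases, 2 variables), not the data behind the table.
    toy = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]]
    prev = 0.0
    for stage, coef, total in ward_schedule(toy):
        print(f"{stage} | {coef:.5f} | {total:.5f} | 2*dSSE = {2 * (total - prev):.5f}")
        prev = total
```

Each printed coefficient matches twice the increase in total SSE at that stage, which is the same pattern as the table. For comparison, SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` works on unsquared Euclidean distances, so its merge heights correspond to the square root of this coefficient.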

Along the horizontal axis of this graph is the distance between cluster centers (centroids). I'm not quite sure why Ward's method uses this distance rather than $\mathrm{SSE}_{\text{total}}$, but it is the same as the increase in SSE at each stage of clustering multiplied by 2. For instance, at the last stage (stage 29) the coefficient (the total SSE) is 203.000, and at the previous stage (stage 28) it is 141.451.

Meaning the increase in total SSE is:

$$\mathrm{SSE}_{29} - \mathrm{SSE}_{28} = \Delta\mathrm{SSE}_{28 \to 29}$$

203.000 - 141.451 = 61.549

This has to be multiplied by 2, for reasons that are explained here.
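One standard explanation, sketched here under the assumption that the software implements Ward's method with the Lance-Williams update applied to squared Euclidean distances (which the settings above suggest but do not confirm): merging clusters $A$ and $B$, with $n_A$ and $n_B$ cases and centroids $\bar{x}_A$ and $\bar{x}_B$, increases the total SSE by

$$\Delta\mathrm{SSE}(A,B) = \frac{n_A n_B}{n_A + n_B}\,\lVert \bar{x}_A - \bar{x}_B \rVert^2 .$$

For two single cases $a$ and $b$ this is $\tfrac{1}{2}\lVert a - b \rVert^2$, half of their squared Euclidean distance. The Lance-Williams update for Ward's method,

$$d(K, I \cup J) = \frac{(n_I + n_K)\,d(K, I) + (n_J + n_K)\,d(K, J) - n_K\,d(I, J)}{n_I + n_J + n_K},$$

started from squared Euclidean distances, keeps that factor of 2 at every stage:

$$d(A,B) = \frac{2\,n_A n_B}{n_A + n_B}\,\lVert \bar{x}_A - \bar{x}_B \rVert^2 = 2\,\Delta\mathrm{SSE}(A,B).$$

Stage 1 of the table fits this: $2 \times 0.278802 = 0.557604 \approx 0.5576$.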

61.549 * 2 = 123.098

As you'll see, this is where stage 29 is plotted on the horizontal axis.
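The same arithmetic can be repeated for every stage. The short Python sketch below copies the two columns from the table, doubles the stage-to-stage increase in total SSE (taking the SSE before stage 1 as 0, since every case starts as its own cluster), and flags any stage where the result differs from the reported distance by more than a rounding tolerance. The function name and the 0.001 tolerance are assumptions chosen for illustration.

```python
# (stage, distance between cluster centers, total SSE), copied from the table above.
SCHEDULE = [
    (1, 0.5576, 0.278802), (2, 0.91788, 0.737743), (3, 1.09913, 1.287309),
    (4, 1.14743, 1.861023), (5, 1.15104, 2.436542), (6, 1.23339, 3.053236),
    (7, 1.56906, 3.837767), (8, 1.73841, 4.70697), (9, 2.06592, 5.739929),
    (10, 2.12478, 6.802316), (11, 2.52013, 8.06238), (12, 2.96935, 9.547052),
    (13, 3.08863, 11.09137), (14, 3.56697, 12.87485), (15, 4.14933, 14.94952),
    (16, 5.39885, 17.64894), (17, 5.4883, 20.39309), (18, 5.5328, 23.15949),
    (19, 6.30109, 26.31003), (20, 6.38562, 29.50284), (21, 7.20355, 33.10462),
    (22, 7.31179, 36.76051), (23, 9.05223, 41.28663), (24, 18.0712, 50.32223),
    (25, 18.90347, 59.77396), (26, 23.43379, 71.49086), (27, 38.75994, 90.87082),
    (28, 101.16, 141.4508), (29, 123.0984, 203.0),
]


def check_schedule(schedule, tol=1e-3):
    """Return (stage, reported distance, 2 * delta SSE, within tolerance) per stage."""
    rows = []
    prev_sse = 0.0  # before stage 1 every case is its own cluster, so total SSE is 0
    for stage, distance, total_sse in schedule:
        doubled = 2 * (total_sse - prev_sse)
        rows.append((stage, distance, doubled, abs(distance - doubled) <= tol))
        prev_sse = total_sse
    return rows


if __name__ == "__main__":
    for stage, reported, doubled, ok in check_schedule(SCHEDULE):
        status = "ok" if ok else "MISMATCH"
        print(f"stage {stage:2d}: reported {reported:10.5f}  2*dSSE {doubled:10.5f}  {status}")
```

With the table's values every one of the 29 stages agrees to within 0.0001, which is consistent with the reported numbers simply being rounded.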

The next step is to determine the number of clusters you want to work with.

__How many clusters are there?__