Next: Some Objective Functions Up: Bottom Up Approach Previous: Agglomerative Clustering

Hierarchical Clustering

Generally speaking, the agglomerative methods are not very efficient. Finding the best pair of candidates for merging may require tex2html_wrap_inline5261 time, unless a list of the cluster merging costs is stored and updated ( which will likely require tex2html_wrap_inline5263 space ). So an alternative strategy is to find many good clusters to merge and then perform all the merges simultaneously. This is known as a hierarchical strategy for the sake of clustering. In general it is more efficient than the agglomerative kind of clustering approach.

Fig gif shows an example of both hierarchical and agglomerative clustering for 8 vertices.

   figure2126
Figure: An example of Hierarchical and Agglomerative Clustering

The various hierarchical schemes that were studied are described as below :

  1. Scheme 1

    This scheme is due to Bui[15]. It is known as Matching Based Compaction. In this paper, the authors have reported a scheme for the graph bisection problem with min-cut formulation. The scheme is based on an observation that the K-L method as well as Simulated Annealing method ( applied to the graph bisection problem ) work much better for graphs with high average degree ( tex2html_wrap_inline5265 ). So the scheme tries to improve the average degree of the graph by collapsing vertices of the graph together. The scheme is as follows:

    1. Given an input graph G, find out a maximum random matching M of the graph. A matching is a set of disjoint pairs of vertices.
    2. Once such a matching is found, then for each edge tex2html_wrap_inline5267 coalesce the two endpoints of V belonging to e together to form a new vertex. Thus a contracted graph G' is constructed.
    3. After that the bisection heuristics is run on the contracted graph to obtain the bisection.
    4. After such bisection (A',B') is obtained, the edges are uncompacted from the original graph and an initial bisection (A,B) is created for the original graph.
    5. Finally this original bisection is used as a starting configuration for the bisection procedure on the original graph.

    The method described by the authors to collapse the nodes will cause the average degree of the graph G' to be larger than the average degree of G. Thus if the bisection algorithm finds a good bisection of this contracted graph G', then hopefully the corresponding initial bisection of G will close to an optimal bisection of G. The authors have reported good results for graphs with low average degrees.

    Comments :

    1. The scheme seems to be powerful for tackling the case of partitioning problem for graph with smaller average degrees. The K&L method seems to work much better for graphs with higher average degrees. So the authors first randomly collapse the graph to generate a graph with higher average degree. After this they apply the K&L method to get a good bisection for the contracted graph. As the average degree for the contacted graph is higher than that of the original graph, the quality of the bisection thus obtained is much better. Also K&L method depends a lot on the initial solution. Thus the authors hope that an optimal bisection of the contracted graph followed by uncollpasing of the vertices in the contracted graph would provide an almost optimal solution for the K&L method for bisecting the original graph.
    2. However the scheme tries to find a random matching. It does not take into account additional information available. One of such information could be the edge weight between two vertices. Higher edge weight represents more correlation between the vertices, making them preferable candidates for the sake of merging. Finding a random matching and then merging the nodes may result in loss of such vital information.
    3. If the scheme could be enhanced to capture the above mentioned information, it could be very well suited for the sake of a multilevel formulation.
  2. Scheme 2

    This scheme is due to Feo et. al[16]. It is based on Maximal weight matching of a graph. In this paper, the authors have reported a scheme for partitioning the nodes of a weighted graph into k disjoint subsets of bounded size, such that the sum of the weights of the edges whose end vertices belong to the same subset is maximized.

    The scheme is as follows :

    1. Without the loss of generality assume |V| = k * m ( extra isolated vertices could be added if necessary ). Also assume that the graph is complete with zero edge weights added to ensure this condition.
    2. Once these conditions are ensured, find a maximum weight perfect matching M for G.
    3. Once this matching M is found,
      do i= 1,k
      1. Choose arbitrarily a set, tex2html_wrap_inline5299 , of m/2 nonselected edges of M.
      2. Mark the edges in tex2html_wrap_inline5299 as selected.
      3. Place in cluster i, the nodes incident to the edges in tex2html_wrap_inline5299 .

    A similar scheme is reported by Dai and Kuh[17]. Their method is based on clustering together those vertices that are highly interconnected. A complete graph is defined as above. In the first iteration of the scheme, the nodes of the graph are clustered into groups of 4-5 nodes, such that the sum of the weights of the edges in all clusters is maximized. In subsequent iterations the clusters themselves are combined into groups of 4-5. The iterations continue until there is only one cluster, thereby producing a hierarchical cluster tree.

    The approach taken by Roy and Sechen in [18] is also similar except that only those edges whose weight is higher than a given threshold are contracted. This approach differs from the matching based approaches in a sense that the pairs if the vertices to be collapsed together are selected more selectively.

    Comments :

    1. This scheme seems to be better than the previous one. It does take into account the edge weights, which represents the correlation between the vertices.
    2. However this method seems good only for the initial phase. If we try to apply a multilevel formulation using this maximal weight matching, then the scheme will work well during the earlier levels. However for the later levels, it will prefer the heavier nodes over the others for the sake of merging. This happens because as heavier node collapse together to form a single node, then all the edges incident to that node are going to be considerably heavy, making them preferable candidates for the sake of being considered for being in the Maximum weight perfect matching.
    3. Thus it will not be a good idea to have a multilevel formulation in this case. The matching criterion should not be based solely on the weights between the edges. One should consider the sizes of the vertices being connected by the edge as well, as the edge weight is also affected by the sizes of the clusters it joins in a multilevel formulation.
    4. The schemes developed in the report also try to find a matching ( not a maximal one ) based on satisfying the average connectivity criterion i.e., if tex2html_wrap_inline5053 is the clustering solution then tex2html_wrap_inline5313
  3. Scheme 3

    This scheme is due to Hauck And Borriello[19]. These authors use randomness and the merging criterion of Shin and Kim as described in [12]. To begin with, the scheme places all the vertices in their own clusters i.e. the clustering solution is tex2html_wrap_inline5081 . The clusters are then visited in a random order and each cluster is associated with a neighboring cluster, with which it has the highest connectivity. Thus a cluster that is connected to a neighboring cluster with N edges is N times likely to be clustered with that neighbor. A node that was previously the target of clustering is not used as a source of a clustering, but an unclustered node can choose to join a grouping with a node already clustered. Once a clustering is generated in the above mentioned fashion, it is fed as an initial solution ( after unclustering the nodes ) to any F&M kind of scheme to generate the final partitioning solution.

    Comments:

    1. As K&L or F&M methods work better with the graphs with higher average degrees, this scheme is useful for partitioning such kind of graphs. The vertices are visited randomly and are collapsed with the neighbors with which they have the highest connectivity. Thus the resulting graph has a higher average degree than the original graph. This contracted graph is then partitioned using the partitioning heuristics. Being of higher average degree it yields a better solution. This particular clustering solution is used as an initial solution for the partitioning heuristics after unclustering the nodes. As most of the partitioning heuristics depend largely on the goodness of an initial solution, this scheme produces better results than application of the partitioning heuristics to the original graph alone.
    2. Multilevel formulation will not be much useful if the criteria for collapsing is just to merge a node with its most highly connected neighbor. This scheme will prefer matching everything with the larger nodes as the edges incident on the larger nodes will be heavier, thereby gaining preference over the others for being collapsed.

   figure2156
Figure: Flow of a 2-Phase F&M based Partitioning Scheme

Most of the schemes described above are motivated from two observations :

  1. The overall quality of the partitioning usually depends on the quality of initial solution. Better the initial solution, better is the quality of the final partitioning generated by the partitioning heuristics.
  2. The most popular partitioning schemes like F&M seem to perform much better for graphs with higher average degrees ( tex2html_wrap_inline5321 The quality of the solution generated is much better if the average degree of the graph is high enough.
Thus as shown in Fig gif the flow of a partitioning scheme is as follows:
  1. Generate a clustering for the whole graph.
  2. Collapse the items within a cluster to form a single node. Thus the original graph is now contracted. As the graph is contracted, the average degree of vertices in this contracted graph is higher than that of the vertices in the original graph. Also the size of the contracted graph is comparatively smaller.
  3. Now run the partitioning heuristics on the contracted graph to generate a partitioning solution. This solution will be of much better quality because of the above mentioned points.
  4. After this, uncoarsen the vertices in the cluster to obtain the initial partitioning solution.
  5. After this apply the partitioning heuristics to the original graph with the original solution as above.

Thus this allows the scheme to obtain a better partitioning of the graph. This acts as a good optimization to the F&M scheme. The ways in which clustering improve the quality are twofold. First of all, F&M algorithm is a global algorithm, optimizing for the macroscopic properties of the graph. However it may overlook more local, microscopic concerns. So an intelligent clustering algorithm will often focus on local information, grouping together a few nodes based on local properties. Thus a smart clustering algorithm can perform good local optimization, complementing the global optimization properties of the F&M algorithm. Secondly, the F&M algorithm, as mentioned earlier, works better when the average degree if the input graph is high. Clustering should in general increase the average degree of the graph. Thus this results in an overall better partition.

Next: Some Objective Functions Up: Bottom Up Approach Previous: Agglomerative Clustering

Sushrut S Karanjkar
Tue Apr 21 17:00:32 CDT 1998

The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.