Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.ins.2014.02.062
Title: A general framework of hierarchical clustering and its applications
Authors: Cai, R.
Zhang, Z.
Tung, A.K.H. 
Dai, C.
Hao, Z.
Keywords: Clustering
Hierarchical
k-Means
k-Median
Streaming algorithm
Issue Date: 7-Oct-2014
Citation: Cai, R., Zhang, Z., Tung, A.K.H., Dai, C., Hao, Z. (2014-10-07). A general framework of hierarchical clustering and its applications. Information Sciences 272 : 29-48. ScholarBank@NUS Repository. https://doi.org/10.1016/j.ins.2014.02.062
Abstract: Hierarchical clustering problem is a traditional topic in computer science, which aims to discover a consistent hierarchy of clusters with different granularities. One of the most important open questions on hierarchical clustering is the identification of the meaningful clustering levels in the hierarchical structure. In this paper, we answer this question from algorithmic point of view. In particular, we derive a quantitative analysis on the impact of the low-level clustering costs on high level clusters, when agglomerative algorithms are run to construct the hierarchy. This analysis enables us to find meaningful clustering levels, which are independent of the clusters hierarchically beneath it. We thus propose a general agglomerative hierarchical clustering framework, which automatically constructs meaningful clustering levels. This framework is proven to be generally applicable to any k-clustering problem in any α-relaxed metric space, in which strict triangle inequality is relaxed within some constant factor α. To fully utilize the hierarchical clustering framework, we conduct some case studies on k-median and k-means clustering problems, in both of which our framework achieves better approximation factor than the state-of-the-art methods. We also extend our framework to handle the data stream clustering problem, which allows only one scan on the whole data set. By incorporating our framework into Guha's data stream clustering algorithm, the clustering quality is greatly enhanced with only small extra computation cost incurred. The extensive experiments show that our proposal is superior to the distance based agglomerative hierarchical clustering and data stream clustering algorithms on a variety of data sets. © 2014 Elsevier Inc. All rights reserved.
Source Title: Information Sciences
URI: http://scholarbank.nus.edu.sg/handle/10635/77800
ISSN: 00200255
DOI: 10.1016/j.ins.2014.02.062
Appears in Collections:Staff Publications

Show full item record
Files in This Item:
There are no files associated with this item.

Google ScholarTM

Check

Altmetric


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.