Efficient mining of distance-based subspace clusters

Please use this identifier to cite or link to this item: https://doi.org/10.1002/sam.10062

DC Field	Value
dc.title	Efficient mining of distance-based subspace clusters
dc.contributor.author	Liu, G.
dc.contributor.author	Sim, K.
dc.contributor.author	Li, J.
dc.contributor.author	Wong, L.
dc.date.accessioned	2013-07-04T07:47:05Z
dc.date.available	2013-07-04T07:47:05Z
dc.date.issued	2009
dc.identifier.citation	Liu, G., Sim, K., Li, J., Wong, L. (2009). Efficient mining of distance-based subspace clusters. Statistical Analysis and Data Mining 2 (5-6): 427-444. ScholarBank@NUS Repository. https://doi.org/10.1002/sam.10062
dc.identifier.issn	19321872
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/39679
dc.description.abstract	Traditional similarity measurements often become meaningless as the dimensionality of datasets increases. Subspace clustering has been proposed to find clusters embedded in subspaces of high-dimensional datasets. Many existing algorithms use a grid-based approach to partition the data space into nonoverlapping rectangular cells, and then identify connected dense cells as clusters. The rigid boundaries of the grid-based approach may cause a real cluster to be divided into several small clusters. In this paper, we propose a sliding-window approach to partitioning the dimensions that preserves significant clusters. We call this model the nCluster model. The sliding-window approach generates more bins than the grid-based approach and thus incurs a higher mining cost. We develop a deterministic algorithm, called MaxnCluster, to mine nClusters efficiently. MaxnCluster uses several techniques to speed up the mining, and it produces only maximal nClusters to reduce the result size. Non-maximal nClusters are pruned without the need to store the discovered nClusters in memory, which is key to the efficiency of MaxnCluster. Our experimental results show that (i) the nCluster model can indeed preserve clusters that are shattered by the grid-based approach on synthetic datasets; (ii) the nCluster model produces more significant clusters than the grid-based approach on two real gene expression datasets; and (iii) MaxnCluster is efficient in mining maximal nClusters. © 2009 Wiley Periodicals, Inc.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1002/sam.10062
dc.source	Scopus
dc.subject	Biclustering
dc.subject	Distance-based clustering
dc.subject	Subspace clustering
dc.type	Article
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1002/sam.10062
dc.description.sourcetitle	Statistical Analysis and Data Mining
dc.description.volume	2
dc.description.issue	5-6
dc.description.page	427-444
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications
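
The abstract contrasts fixed grid cells with a sliding window. The following is a minimal one-dimensional sketch of that contrast only, not the paper's nCluster model or the MaxnCluster algorithm. The function names (grid_bins, sliding_window_bins), the window rule (a window of the given width anchored at each data value), and the sample values are all assumptions made for illustration.

```python
from bisect import bisect_right


def grid_bins(values, width, origin=0):
    """Group values into fixed, nonoverlapping cells of the given width.

    Cell boundaries sit at origin + k * width (an illustrative choice).
    """
    cells = {}
    for v in values:
        cell = int((v - origin) // width)
        cells.setdefault(cell, []).append(v)
    return [sorted(members) for _, members in sorted(cells.items())]


def sliding_window_bins(values, width):
    """Group values with a window of the given width anchored at each value.

    Hypothetical rule for illustration: a window starting at value v covers
    every value in [v, v + width]. Only maximal groups are kept, i.e. a group
    is dropped when it is contained in a group already emitted.
    """
    vals = sorted(values)
    groups = []
    prev_end = 0  # exclusive end index of the last group kept
    for start, v in enumerate(vals):
        end = bisect_right(vals, v + width)
        # Window ends never move left as the start moves right, so the
        # current group is new only if it reaches past the last kept group.
        if end > prev_end:
            groups.append(vals[start:end])
            prev_end = end
    return groups


if __name__ == "__main__":
    data = [8, 9, 11, 12, 30]
    print(grid_bins(data, 10))            # the boundary at 10 splits 8..12
    print(sliding_window_bins(data, 10))  # 8..12 stay together
```

In this toy input the values 8, 9, 11 and 12 straddle the grid boundary at 10, so the grid separates them while the window anchored at 8 keeps them in one group. On denser data the windows overlap, so their number can exceed the number of grid cells, which is in line with the abstract's remark about higher mining cost.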

Files in This Item:

There are no files associated with this item.
