Title: Document clustering based on cluster validation
Authors: Niu, Z.-Y.
Ji, D.-H.
Tan, C.-L. 
Keywords: Cluster number estimation
Cluster validation
Document clustering
Feature selection
Issue Date: 2004
Citation: Niu, Z.-Y., Ji, D.-H., Tan, C.-L. (2004). Document clustering based on cluster validation. International Conference on Information and Knowledge Management, Proceedings: 501-506. ScholarBank@NUS Repository.
Abstract: This paper presents a document clustering algorithm based on cluster validation that identifies both the important feature words and the true model order (cluster number). An important feature subset is selected by optimizing a cluster validity criterion subject to a constraint. To identify the model order, this feature selection procedure is run for each candidate cluster number. The feature subset and cluster number that maximize the cluster validity criterion are chosen as the answer. We applied the algorithm to several datasets from the 20Newsgroup corpus. Experimental results show that it can find an important feature subset, estimate the model order, and yield higher micro-averaged precision than four other document clustering algorithms that require the cluster number to be provided. Copyright 2004 ACM.
Source Title: International Conference on Information and Knowledge Management, Proceedings
Appears in Collections: Staff Publications
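
Illustrative sketch: the search described in the abstract (run feature selection for each candidate cluster number, then keep the feature subset and cluster number with the best validity score) can be outlined roughly as below. This is not the authors' implementation. The record does not specify the validity criterion, the constraint, or the clustering method, so the sketch assumes the mean silhouette width as the criterion, a fixed subset size as the constraint, and a small k-means with deterministic farthest-first seeding as the clusterer. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; a stand-in for whatever clusterer is used."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width, used here as an ASSUMED validity criterion."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # singletons contribute 0 by convention
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_model(docs, candidate_ks, subset_size):
    """For every candidate k and every feature subset of the given size
    (the ASSUMED constraint), cluster and keep the best-scoring pair."""
    best = (float("-inf"), None, None)
    for k in candidate_ks:
        for feats in combinations(range(len(docs[0])), subset_size):
            projected = [tuple(d[f] for f in feats) for d in docs]
            score = silhouette(projected, kmeans(projected, k))
            if score > best[0]:
                best = (score, k, feats)
    return best


def toy_docs():
    """Hypothetical data: features 0 and 1 carry three groups,
    features 2 and 3 are unrelated filler."""
    centers = [(0, 0), (20, 0), (0, 20)]
    docs = []
    for i in range(12):
        cx, cy = centers[i % 3]
        j = i // 3
        docs.append((cx + j % 2, cy + j // 2,
                     2 * ((7 * i) % 12), 2 * ((5 * i + 3) % 12)))
    return docs


score, k, feats = select_model(toy_docs(), candidate_ks=[2, 3, 4], subset_size=2)
print(k, feats, round(score, 3))
```

On this toy data the search should recover three clusters and the two informative features. Exhaustive enumeration of subsets is only feasible for a handful of features; real document vocabularies would need a heuristic search, which the abstract does not describe.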
