Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.ipm.2006.07.022
Title: Using cluster validation criterion to identify optimal feature subset and cluster number for document clustering
Authors: Niu, Z.-Y.
Ji, D.-H.
Tan, C.L. 
Keywords: Cluster number estimation
Cluster validation
Document clustering
Feature selection
Issue Date: 2007
Source: Niu, Z.-Y., Ji, D.-H., Tan, C.L. (2007). Using cluster validation criterion to identify optimal feature subset and cluster number for document clustering. Information Processing and Management 43 (3): 730-739. ScholarBank@NUS Repository. https://doi.org/10.1016/j.ipm.2006.07.022
Abstract: This paper presents a cluster validation based document clustering algorithm, which is capable of identifying an important feature subset and the intrinsic value of model order (cluster number). The important feature subset is selected by optimizing a cluster validity criterion subject to some constraint. For achieving model order identification capability, this feature selection procedure is conducted for each possible value of cluster number. The feature subset and the cluster number which maximize the cluster validity criterion are chosen as our answer. We have evaluated our algorithm using several datasets from the 20Newsgroup corpus. Experimental results show that our algorithm can find the important feature subset, estimate the cluster number and achieve higher micro-averaged precision than previous document clustering algorithms which require the value of cluster number to be provided. © 2006 Elsevier Ltd. All rights reserved.
Source Title: Information Processing and Management
URI: http://scholarbank.nus.edu.sg/handle/10635/39468
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2006.07.022
Appears in Collections: Staff Publications
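
The search loop described in the abstract (run feature selection for each candidate cluster number, then keep the feature subset and cluster number that maximize the validity criterion) can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not name the validity criterion, the constraint, or the clustering routine, so the sketch assumes a mean silhouette width as a stand-in criterion, a fixed subset size as the constraint, and a plain deterministic k-means. The names `validity`, `select_subset_and_k`, and the toy matrix `DOCS` are hypothetical.

```python
import itertools
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20):
    """Plain k-means. The first k points seed the centroids, an assumption
    made only to keep this sketch deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def validity(points, labels):
    """Mean silhouette width, an assumed stand-in for the paper's criterion."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        mean_to = {}
        for c in clusters:
            ds = [dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_to[c] = sum(ds) / len(ds) if ds else None
        a = mean_to[labels[i]]
        if a is None:  # a singleton cluster contributes 0
            continue
        b = min(v for c, v in mean_to.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_subset_and_k(docs, subset_size, candidate_ks):
    """For every candidate cluster number, try every feature subset of the
    assumed fixed size and keep the (score, k, subset) with the best score."""
    best = (-math.inf, None, None)
    for k in candidate_ks:
        for subset in itertools.combinations(range(len(docs[0])), subset_size):
            points = [[d[f] for f in subset] for d in docs]
            score = validity(points, kmeans(points, k))
            if score > best[0]:
                best = (score, k, subset)
    return best


# Hypothetical term-count vectors: six documents, four features.
DOCS = [
    [5, 0, 2, 1], [0, 5, 1, 2], [5, 1, 1, 2],
    [1, 5, 2, 1], [4, 0, 2, 2], [0, 4, 1, 1],
]

best_score, best_k, best_subset = select_subset_and_k(DOCS, subset_size=2, candidate_ks=(2, 3))
print(best_k, best_subset, round(best_score, 3))
```

Exhaustive enumeration of subsets is only workable for a toy vocabulary; a realistic document collection would need a cheaper search over features, which the abstract does not detail.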

Files in This Item:
There are no files associated with this item.

Scopus citations: 5 (checked on Dec 5, 2017)
Web of Science citations: 3 (checked on Nov 1, 2017)
Page views: 46 (checked on Dec 9, 2017)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.