Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/39080
Title: The use of topic representative words in text categorization
Authors: Kim, S.N.
Baldwin, T.
Kan, M.-Y. 
Keywords: Natural Language Techniques and Documents
Text categorization
Issue Date: 2009
Source: Kim, S.N., Baldwin, T., Kan, M.-Y. (2009). The use of topic representative words in text categorization. ADCS 2009 - Proceedings of the Fourteenth Australasian Document Computing Symposium: 75-81. ScholarBank@NUS Repository.
Abstract: We present a novel way to identify the representative words that are able to capture the topic of documents for use in text categorization. Our intuition is that not all word n-grams equally represent the topic of a document, and thus using all of them can potentially dilute the feature space. Hence, our aim is to investigate methods for identifying good indexing words, and empirically evaluate their impact on text categorization. To this end, we experiment with five different word sub-spaces: title words, first sentence words, keyphrases, domain-specific words, and named entities. We also test TF·IDF-based unsupervised methods for extracting keyphrases and domain-specific words, and empirically verify their feasibility for text categorization. We demonstrate that using representative words outperforms a simple 1-gram model.
Source Title: ADCS 2009 - Proceedings of the Fourteenth Australasian Document Computing Symposium
URI: http://scholarbank.nus.edu.sg/handle/10635/39080
ISBN: 9781742101712
Appears in Collections: Staff Publications
