Title: A new term weighting method for text categorization
Authors: LAN MAN
Keywords: Text Categorization, Term Weighting Method, Support Vector Machine, kNN
Issue Date: 6-Jun-2007
Citation: LAN MAN (2007-06-06). A new term weighting method for text categorization. ScholarBank@NUS Repository.
Abstract: Text representation is the task of transforming the content of a textual document into a compact form so that the document can be recognized and classified by a computer or a classifier. This thesis focuses on the development of an effective and efficient term weighting method for the text categorization task. We selected the single token as the feature unit because previous research showed that this simple type of feature outperformed more complicated types of features. We investigated several widely used unsupervised and supervised term weighting methods on several popular data collections in combination with the SVM and kNN algorithms. Considering the distribution of relevant documents in the collection and analyzing each term's discriminating power, we propose a new term weighting scheme, namely $tf.rf$. The controlled experimental results show that the term weighting methods perform unevenly across data sets with different category distributions and across learning algorithms. Most of the supervised term weighting methods based on information theory did not show satisfactory performance in our experiments. However, the newly proposed $tf.rf$ method performs consistently better than the other term weighting methods. By contrast, the popular $tf.idf$ method did not perform uniformly well across data sets with different category distributions.
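Illustration (not part of the thesis record): the abstract names $tf.rf$ but does not define it. The sketch below assumes the relevance frequency form rf = log2(2 + a / max(1, c)) given in the author's later published work, where a is the number of positive-category documents containing the term and c is the number of negative-category documents containing it. The function names and the toy document-frequency tables are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def rf(a: int, c: int) -> float:
    """Relevance frequency (assumed form): log2(2 + a / max(1, c)).

    a: positive-category documents containing the term
    c: negative-category documents containing the term
    """
    return math.log2(2 + a / max(1, c))


def tf_rf(doc_tokens, pos_df, neg_df):
    """Weight each token of one document by raw term frequency times rf.

    pos_df / neg_df map a term to its document frequency in the
    positive / negative category (hypothetical inputs for this sketch).
    """
    tf = Counter(doc_tokens)
    return {
        term: count * rf(pos_df.get(term, 0), neg_df.get(term, 0))
        for term, count in tf.items()
    }


# Toy example: "a" is concentrated in the positive category, "b" is not.
weights = tf_rf(["a", "a", "b"], pos_df={"a": 2}, neg_df={"a": 1, "b": 4})
```

Under this assumed form, a term that never occurs in the positive category gets the floor weight rf = 1, so it is neither boosted nor zeroed out, while a term concentrated in the positive category is scaled up. Unlike $tf.idf$, the weight depends on how the term is split between categories rather than on its overall document frequency.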
Appears in Collections: Ph.D Theses (Open)
