Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/16205
DC Field: Value
dc.title: A new term weighting method for text categorization
dc.contributor.author: LAN MAN
dc.date.accessioned: 2010-04-08T11:02:10Z
dc.date.available: 2010-04-08T11:02:10Z
dc.date.issued: 2007-06-06
dc.identifier.citation: LAN MAN (2007-06-06). A new term weighting method for text categorization. ScholarBank@NUS Repository.
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/16205
dc.description.abstract: Text representation is the task of transforming the content of a textual document into a compact representation so that the document can be recognized and classified by a computer or a classifier. This thesis focuses on the development of an effective and efficient term weighting method for the text categorization task. We selected the single token as the feature unit because previous research showed that this simple type of feature outperformed other, more complicated types of features. We investigated several widely used unsupervised and supervised term weighting methods on several popular data collections in combination with the SVM and kNN algorithms. Considering the distribution of relevant documents in the collection and analyzing each term's discriminating power, we proposed a new term weighting scheme, namely $tf.rf$. The controlled experimental results showed that the performance of the term weighting methods varies across data sets with different category distributions and across learning algorithms. Most of the supervised term weighting methods based on information theory did not show satisfactory performance in our experiments. However, the newly proposed $tf.rf$ method shows consistently better performance than the other term weighting methods. On the other hand, the popularly used $tf.idf$ method did not show uniformly good performance across data sets with different category distributions.
dc.language.iso: en
dc.subject: Text Categorization, Term Weighting Method, Support Vector Machine, kNN
dc.type: Thesis
dc.contributor.department: COMPUTER SCIENCE
dc.contributor.supervisor: TAN CHEW LIM
dc.description.degree: Ph.D
dc.description.degreeconferred: DOCTOR OF PHILOSOPHY
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
LanMan.pdf: 1.92 MB, Adobe PDF; Access Settings: OPEN; Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.