A comparative study on term weighting schemes for text categorization
Lan, M. ; Sung, S.-Y. ; Low, H.-B. ; Tan, C.-L.
Abstract
The term weighting scheme, which converts documents into vectors in the term space, is a vital step in automatic text categorization. Previous studies showed that, for the text categorization task, the term weighting scheme rather than the kernel function of the SVM dominates performance. In this paper, we conducted experiments comparing various term weighting schemes with SVMs on two widely used benchmark data sets. We also presented a new term weighting scheme, tf.rf, for text categorization. The cross-scheme comparison was performed using McNemar's tests. The controlled experimental results showed that the newly proposed tf.rf scheme is significantly better than the other term weighting schemes. Compared with schemes based on the tf factor alone, adding the idf factor does not improve, and may even decrease, a term's discriminating power for text categorization. The binary and tf.chi representations significantly underperform the other term weighting schemes. © 2005 IEEE.
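The abstract says the schemes were compared with McNemar's tests but does not describe the procedure. As a rough illustration only, the following sketch shows the standard McNemar chi-square statistic with continuity correction for two classifiers evaluated on the same test documents. The function name mcnemar_test and the toy data are hypothetical and are not taken from the paper, and the exact variant of the test the authors used is an assumption here.

```python
import math


def mcnemar_test(correct_a, correct_b):
    """Return (chi2, p_value) for two paired lists of per-document correctness flags.

    correct_a[i] and correct_b[i] say whether classifier A and classifier B
    (for example, SVMs trained with two different term weighting schemes)
    labelled test document i correctly. Hypothetical helper, not from the paper.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same documents")

    # Only the discordant pairs carry information about a difference.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)

    if only_a + only_b == 0:
        return 0.0, 1.0  # the classifiers never disagree

    # Chi-square statistic with continuity correction, 1 degree of freedom.
    chi2 = (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data: 50 documents both get right, 10 only A gets right, 2 only B gets right.
flags_a = [True] * 50 + [True] * 10 + [False] * 2
flags_b = [True] * 50 + [False] * 10 + [True] * 2
chi2, p_value = mcnemar_test(flags_a, flags_b)
print(f"chi2={chi2:.3f}, p={p_value:.4f}")
```

A small p-value suggests that the two schemes differ in accuracy on the shared test set. With very few discordant pairs, an exact binomial test is usually preferred over this chi-square approximation.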
Source Title
Proceedings of the International Joint Conference on Neural Networks
Date
2005
DOI
10.1109/IJCNN.2005.1555890
Type
Conference Paper