Please use this identifier to cite or link to this item: https://doi.org/10.1109/IJCNN.2007.4371361
Title: Text representations for text categorization: A case study in biomedical domain
Authors: Lan, M.; Tan, C.L.; Su, J.; Low, H.B.
Issue Date: 2007
Citation: Lan, M., Tan, C.L., Su, J., Low, H.B. (2007). Text representations for text categorization: A case study in biomedical domain. IEEE International Conference on Neural Networks - Conference Proceedings: 2557-2562. ScholarBank@NUS Repository. https://doi.org/10.1109/IJCNN.2007.4371361
Abstract: In the vector space model (VSM), textual documents are represented as vectors in the term space. This representation raises two issues: (1) what a term should be and (2) how to weight a term. This paper examined ways to represent text from these two aspects in order to improve the performance of text categorization. Different representations were evaluated using SVM on three biomedical corpora. The controlled experiments showed that the straightforward use of named entities as terms in VSM does not improve performance over the bag-of-words representation. The term weighting method, on the other hand, slightly improved performance. To improve text categorization further, however, more advanced techniques and more effective uses of natural language processing for text representation appear to be needed. ©2007 IEEE.
Source Title: IEEE International Conference on Neural Networks - Conference Proceedings
URI: http://scholarbank.nus.edu.sg/handle/10635/40984
ISBN: 142441380X
ISSN: 10987576
DOI: 10.1109/IJCNN.2007.4371361
Appears in Collections: Staff Publications

