Domain concept handling in automated text categorization | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.1109/ICIEA.2010.5514692

Title:	Domain concept handling in automated text categorization
Authors:	Liu, Y.; Loh, H.T.
Keywords:	Domain Concept Representation Information Management Text Categorization Text Mining
Issue Date:	2010
Citation:	Liu, Y., Loh, H.T. (2010). Domain concept handling in automated text categorization. Proceedings of the 2010 5th IEEE Conference on Industrial Electronics and Applications, ICIEA 2010: 1543-1549. ScholarBank@NUS Repository. https://doi.org/10.1109/ICIEA.2010.5514692
Abstract:	Single-term-based document representations, e.g. bag-of-words, have been widely accepted in the machine learning, information retrieval and text mining communities. One notable limitation of such methods is that they do not consider the rich information resident in the semantic relations among terms. This paper reports our approach to concept handling in document representation and its effect on the performance of text categorization. We introduce a Frequent word Sequence algorithm that generates concept-centered phrases to render domain knowledge concepts. Our experimental study, based on a domain-centered corpus, shows that a consistent performance improvement can be achieved when concept-centered phrases are included in addition to the single-term-based features in document representations. We also observed that a universally suitable support threshold does not exist, and that removing concept-irrelevant sequences may further improve performance at a lower support level. © 2010 IEEE.
Source Title:	Proceedings of the 2010 5th IEEE Conference on Industrial Electronics and Applications, ICIEA 2010
URI:	http://scholarbank.nus.edu.sg/handle/10635/73370
ISBN:	9781424450466
DOI:	10.1109/ICIEA.2010.5514692
Appears in Collections:	Staff Publications

