Text classification by labeling words

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/43330

DC Field	Value
dc.title	Text classification by labeling words
dc.contributor.author	Liu, B.
dc.contributor.author	Li, X.
dc.contributor.author	Lee, W.S.
dc.contributor.author	Yu, P.S.
dc.date.accessioned	2013-07-23T09:31:09Z
dc.date.available	2013-07-23T09:31:09Z
dc.date.issued	2004
dc.identifier.citation	Liu, B., Li, X., Lee, W.S., Yu, P.S. (2004). Text classification by labeling words. Proceedings of the National Conference on Artificial Intelligence : 425-430. ScholarBank@NUS Repository.
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/43330
dc.description.abstract	Traditionally, text classifiers are built from labeled training examples. Labeling is usually done manually by human experts (or the users), which is a labor-intensive and time-consuming process. In the past few years, researchers have investigated various forms of semi-supervised learning to reduce the burden of manual labeling. In this paper, we propose a different approach. Instead of labeling a set of documents, the proposed method labels a set of representative words for each class. It then uses these words to extract a set of documents for each class from a set of unlabeled documents to form the initial training set. The EM algorithm is then applied to build the classifier. The key issue of the approach is how to obtain a set of representative words for each class. One way is to ask the user to provide them, which is difficult because the user can usually give only a few words (which are insufficient for accurate learning). We propose a method to solve the problem. It combines clustering and feature selection. The technique can effectively rank the words in the unlabeled set according to their importance. The user then selects/labels some words from the ranked list for each class. This process requires less effort than providing words with no help or manual labeling of documents. Our results show that the new method is highly effective and promising.
dc.source	Scopus
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.contributor.department	SINGAPORE-MIT ALLIANCE
dc.description.sourcetitle	Proceedings of the National Conference on Artificial Intelligence
dc.description.page	425-430
dc.description.coden	PNAIE
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications
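
The abstract describes using the labeled representative words to pull an initial training set out of the unlabeled documents. The following is a minimal sketch of that one step only, not the authors' implementation. The scoring rule (counting occurrences of each class's words and skipping ties) is an assumption made for illustration, as are the function name `seed_label` and the example words and documents. The word-ranking step (clustering plus feature selection) and the EM step are not shown.

```python
from collections import Counter

def seed_label(documents, class_words):
    """Give each document the class whose representative words it contains
    most often; return None for a document with no match or a tie.

    Hypothetical helper: the counting rule is an illustrative assumption,
    not the scoring used in the paper.
    """
    labels = []
    for text in documents:
        tokens = Counter(text.lower().split())
        # Score each class by the total count of its representative words.
        scores = {c: sum(tokens[w] for w in words)
                  for c, words in class_words.items()}
        best = max(scores.values())
        winners = [c for c, s in scores.items() if s == best]
        # Keep only unambiguous matches for the initial training set.
        labels.append(winners[0] if best > 0 and len(winners) == 1 else None)
    return labels

# Illustrative data (not from the paper).
class_words = {
    "sports": {"match", "team", "goal"},
    "finance": {"stock", "market", "bank"},
}
documents = [
    "the team scored a late goal in the match",
    "the bank raised rates and the stock market fell",
    "the weather was mild today",
    "the team bought stock",
]
labels = seed_label(documents, class_words)
```

Documents that receive a label would form the initial training set; the ones left as `None` would stay in the unlabeled pool that a later EM step could draw on.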
