|Title:||Text classification by labeling words|
|Authors:||Liu, B.; Li, X.; Lee, W.S.; Yu, P.S.|
|Issue Date:||2004|
|Citation:||Liu, B., Li, X., Lee, W.S., Yu, P.S. (2004). Text classification by labeling words. Proceedings of the National Conference on Artificial Intelligence: 425-430. ScholarBank@NUS Repository.|
|Abstract:||Traditionally, text classifiers are built from labeled training examples. Labeling is usually done manually by human experts (or the users), which is a labor intensive and time consuming process. In the past few years, researchers investigated various forms of semi-supervised learning to reduce the burden of manual labeling. In this paper, we propose a different approach. Instead of labeling a set of documents, the proposed method labels a set of representative words for each class. It then uses these words to extract a set of documents for each class from a set of unlabeled documents to form the initial training set. The EM algorithm is then applied to build the classifier. The key issue of the approach is how to obtain a set of representative words for each class. One way is to ask the user to provide them, which is difficult because the user usually can only give a few words (which are insufficient for accurate learning). We propose a method to solve the problem. It combines clustering and feature selection. The technique can effectively rank the words in the unlabeled set according to their importance. The user then selects/labels some words from the ranked list for each class. This process requires less effort than providing words with no help or manual labeling of documents. Our results show that the new method is highly effective and promising.|
|Source Title:||Proceedings of the National Conference on Artificial Intelligence|
|URI:||http://scholarbank.nus.edu.sg/handle/10635/43330|
|Appears in Collections:||Staff Publications|
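
The pipeline outlined in the abstract (label representative words, use them to pull an initial training set out of the unlabeled documents, then run EM) can be illustrated with a minimal sketch. Several details are assumptions made here for illustration and are not taken from the paper: the base classifier is a multinomial naive Bayes model with Laplace smoothing, a document joins the initial training set only when it matches one class's words more often than any other class's, and EM re-estimates soft labels for every document on each pass. The function names (`seed_labels`, `m_step`, `e_step`, `label_words_em`, `predict`) are hypothetical. The word-ranking step that combines clustering and feature selection is not covered.

```python
import math
from collections import Counter


def seed_labels(docs, class_words):
    """Give each document the class whose representative words it matches most."""
    labels = []
    for doc in docs:
        hits = [sum(1 for token in doc if token in words) for words in class_words]
        best = max(hits)
        # Assumption: leave the document unlabeled if nothing matches or the top score is tied.
        unique_winner = best > 0 and hits.count(best) == 1
        labels.append(hits.index(best) if unique_winner else None)
    return labels


def m_step(docs, weights, vocab, n_classes):
    """Estimate naive Bayes log-priors and log word probabilities from soft class weights."""
    class_mass = [sum(w[c] for w in weights) for c in range(n_classes)]
    total_mass = sum(class_mass)
    log_prior = [
        math.log((class_mass[c] + 1.0) / (total_mass + n_classes))
        for c in range(n_classes)
    ]
    log_word = []
    for c in range(n_classes):
        counts = Counter()
        for doc, w in zip(docs, weights):
            for token in doc:
                counts[token] += w[c]
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        log_word.append({t: math.log((counts[t] + 1.0) / denom) for t in vocab})
    return log_prior, log_word


def e_step(doc, model):
    """Return the posterior class distribution for one document."""
    log_prior, log_word = model
    # Tokens outside the training vocabulary are ignored.
    scores = [
        lp + sum(lw[t] for t in doc if t in lw)
        for lp, lw in zip(log_prior, log_word)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    z = sum(exps)
    return [e / z for e in exps]


def label_words_em(docs, class_words, iterations=5):
    """Build a classifier from labeled words: seed documents, then refine with EM."""
    n_classes = len(class_words)
    vocab = sorted({token for doc in docs for token in doc})
    seeds = seed_labels(docs, class_words)
    # Initial training set: seeded documents get one-hot weights, the rest get zero weight.
    weights = [[1.0 if s == c else 0.0 for c in range(n_classes)] for s in seeds]
    model = m_step(docs, weights, vocab, n_classes)
    for _ in range(iterations):
        weights = [e_step(doc, model) for doc in docs]
        model = m_step(docs, weights, vocab, n_classes)
    return model, seeds


def predict(model, doc):
    """Return the index of the most probable class for a tokenized document."""
    posterior = e_step(doc, model)
    return posterior.index(max(posterior))
```

Documents are passed as lists of tokens and `class_words` as one set of user-selected words per class, for example `label_words_em(docs, [{"goal", "match"}, {"election", "vote"}])`. Starting EM from the word-derived seed labels keeps each class index tied to the words the user chose for it.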
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.