Extracting key-substring-group features for text classification

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/40715

DC Field	Value
dc.title	Extracting key-substring-group features for text classification
dc.contributor.author	Zhang, D.
dc.contributor.author	Lee, W.S.
dc.date.accessioned	2013-07-04T08:10:41Z
dc.date.available	2013-07-04T08:10:41Z
dc.date.issued	2006
dc.identifier.citation	Zhang, D., Lee, W.S. (2006). Extracting key-substring-group features for text classification. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2006: 474-483. ScholarBank@NUS Repository.
dc.identifier.isbn	1595933395
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/40715
dc.description.abstract	In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostly focused on different variants of generative Markov chain models. Although discriminative machine learning methods like Support Vector Machine (SVM) have been quite successful in text classification with word features, it is neither effective nor efficient to apply them straightforwardly taking all substrings in the corpus as features. In this paper, we propose to partition all substrings into statistical equivalence groups, and then pick those groups which are important (in the statistical sense) as features (named key-substring-group features) for text classification. In particular, we propose a suffix tree based algorithm that can extract such features in linear time (with respect to the total number of characters in the corpus). Our experiments on English, Chinese and Greek datasets show that SVM with key-substring-group features can achieve outstanding performance for various text classification tasks. Copyright 2006 ACM.
dc.source	Scopus
dc.subject	Feature Extraction
dc.subject	Machine Learning
dc.subject	Suffix Tree
dc.subject	Text Classification
dc.subject	Text Mining
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.sourcetitle	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
dc.description.volume	2006
dc.description.page	474-483
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.
