Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.eswa.2007.10.042
DC Field                    Value
dc.title                    Imbalanced text classification: A term weighting approach
dc.contributor.author       Liu, Y.
dc.contributor.author       Loh, H.T.
dc.contributor.author       Sun, A.
dc.date.accessioned         2014-06-17T06:23:45Z
dc.date.available           2014-06-17T06:23:45Z
dc.date.issued              2009-01
dc.identifier.citation      Liu, Y., Loh, H.T., Sun, A. (2009-01). Imbalanced text classification: A term weighting approach. Expert Systems with Applications 36 (1) : 690-701. ScholarBank@NUS Repository. https://doi.org/10.1016/j.eswa.2007.10.042
dc.identifier.issn          09574174
dc.identifier.uri           http://scholarbank.nus.edu.sg/handle/10635/60483
dc.description.abstract     The natural distribution of textual data used in text classification is often imbalanced. Categories with fewer examples are under-represented, and their classifiers often perform far below a satisfactory level. We tackle this problem using a simple probability-based term weighting scheme to better distinguish documents in minor categories. This new scheme directly utilizes two critical information ratios, i.e. relevance indicators. Such relevance indicators are nicely supported by probability estimates which embody the category membership. Our experimental study using both Support Vector Machines and Naïve Bayes classifiers, and an extensive comparison with other classic weighting schemes over two benchmark data sets including Reuters-21578, shows significant improvement for minor categories, while the performance for major categories is not jeopardized. Our approach suggests a simple and effective solution to boost the performance of text classification over skewed data sets. © 2007 Elsevier Ltd. All rights reserved.
dc.description.uri          http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1016/j.eswa.2007.10.042
dc.source                   Scopus
dc.subject                  Imbalanced data
dc.subject                  Term weighting scheme
dc.subject                  Text classification
dc.type                     Article
dc.contributor.department   MECHANICAL ENGINEERING
dc.description.doi          10.1016/j.eswa.2007.10.042
dc.description.sourcetitle  Expert Systems with Applications
dc.description.volume       36
dc.description.issue        1
dc.description.page         690-701
dc.description.coden        ESAPE
dc.identifier.isiut         000264182800069
Appears in Collections: Staff Publications
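The abstract describes the weighting scheme only at a high level: a term weight built from two ratios ("relevance indicators") that compare how often a term occurs inside versus outside a category. The sketch below is an illustrative reading under stated assumptions, not the authors' published definition. It assumes the two ratios are a/b (positive documents containing the term over negative documents containing it) and a/c (positive documents containing the term over positive documents lacking it), combines them as tf * log(1 + (a/b)(a/c)), uses the natural log, and clamps zero denominators to 1. The names `ratio_counts`, `relevance_weight` and `weight_document`, and the toy data, are hypothetical; consult the article for the exact formula and smoothing.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy corpus: tokenized documents and a binary label
# (True = belongs to the minor category being scored).
EXAMPLE_DOCS: List[List[str]] = [
    ["oil", "price"], ["oil", "crude"],
    ["wheat", "price"], ["wheat", "corn"], ["price", "market"],
]
EXAMPLE_LABELS: List[bool] = [True, True, False, False, False]


def ratio_counts(docs: List[List[str]], labels: List[bool], term: str) -> Tuple[int, int, int]:
    """Count, for one term (assumed definitions):
    a = positive docs containing the term
    b = negative docs containing the term
    c = positive docs NOT containing the term
    """
    a = b = c = 0
    for tokens, is_pos in zip(docs, labels):
        present = term in tokens
        if is_pos and present:
            a += 1
        elif is_pos:
            c += 1
        elif present:
            b += 1
    return a, b, c


def relevance_weight(a: int, b: int, c: int) -> float:
    # Product of the two assumed ratios; max(..., 1) is an assumed
    # smoothing choice so that b == 0 or c == 0 does not divide by zero.
    return math.log(1.0 + (a / max(b, 1)) * (a / max(c, 1)))


def weight_document(tokens: List[str], docs: List[List[str]], labels: List[bool]) -> Dict[str, float]:
    # Raw term frequency times the category-aware relevance weight.
    tf = Counter(tokens)
    return {t: n * relevance_weight(*ratio_counts(docs, labels, t)) for t, n in tf.items()}


if __name__ == "__main__":
    # "oil" appears only in positive docs, so it gets 2 * ln(5);
    # "price" is spread across both sides, so it gets ln(1.5).
    print(weight_document(["oil", "oil", "price"], EXAMPLE_DOCS, EXAMPLE_LABELS))
```

The design point this illustrates is the one the abstract emphasizes: a term concentrated in a small category (high a/b and a/c) is boosted regardless of how few documents that category has, whereas a category-blind idf weight would not make this distinction.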

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.