Please use this identifier to cite or link to this item: https://doi.org/10.1115/DETC2011-47313
Title: P-SMOTE: One oversampling technique for class imbalanced text classification
Authors: Wang, J.
Lu, W.F. 
Loh, H.T. 
Issue Date: 2011
Citation: Wang, J., Lu, W.F., Loh, H.T. (2011). P-SMOTE: One oversampling technique for class imbalanced text classification. Proceedings of the ASME Design Engineering Technical Conference 2 (PARTS A AND B): 1089-1098. ScholarBank@NUS Repository. https://doi.org/10.1115/DETC2011-47313
Abstract: The importance of mining patents to support product design has been recognized, because patents are the major source of information for innovation and contain novel ideas that usually cannot be found in published academic papers. In patent text mining, a basic task is patent classification. However, automatic patent classification is difficult. One potential cause of the difficulty is dataset imbalance, i.e., the positive class of interest is the minority while the negative class is the majority. To alleviate the imbalance problem and improve the performance of a Support Vector Machine (SVM) classifier, this study proposes P-SMOTE, a new oversampling technique that focuses on the blank spaces along the positive borderline of an SVM. The proposed technique was first investigated on Reuters-21578, a standard text classification dataset. Then, P-SMOTE was applied to a design patent document dataset. It was observed that an SVM classifier with P-SMOTE achieved better results than an SVM classifier alone. © 2011 by ASME.
Source Title: Proceedings of the ASME Design Engineering Technical Conference
URI: http://scholarbank.nus.edu.sg/handle/10635/51644
ISBN: 9780791854792
DOI: 10.1115/DETC2011-47313
Appears in Collections: Staff Publications
