Please use this identifier to cite or link to this item:
https://doi.org/10.1115/DETC2011-47313
Title: P-SMOTE: One oversampling technique for class imbalanced text classification
Authors: Wang, J.; Lu, W.F.; Loh, H.T.
Issue Date: 2011
Citation: Wang, J., Lu, W.F., Loh, H.T. (2011). P-SMOTE: One oversampling technique for class imbalanced text classification. Proceedings of the ASME Design Engineering Technical Conference 2 (PARTS A AND B): 1089-1098. ScholarBank@NUS Repository. https://doi.org/10.1115/DETC2011-47313
Abstract: The importance of mining patents to support product design has been recognized, because patents are a major information source for innovation and contain novel ideas that usually cannot be found in published academic papers. In patent text mining, a basic issue is patent classification. However, automatic patent classification is difficult. One potential cause of the difficulty is dataset imbalance, i.e., the positive class of interest is the minority while the negative class is the majority. To alleviate the problem of imbalanced datasets and improve the performance of a Support Vector Machine (SVM) classifier, this study proposes P-SMOTE, a new oversampling technique that focuses on the blank spaces along the positive borderline of an SVM. The proposed technique was first investigated on Reuters-21578, a standard text classification dataset. Then, P-SMOTE was applied to a design patent document dataset. It was observed that an SVM classifier with P-SMOTE achieved better results than an SVM classifier alone. © 2011 by ASME.
Source Title: Proceedings of the ASME Design Engineering Technical Conference
URI: http://scholarbank.nus.edu.sg/handle/10635/51644
ISBN: 9780791854792
DOI: 10.1115/DETC2011-47313
Appears in Collections: Staff Publications