Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/78386
Title: The integration of multiple feature representations for protein protein interaction classification task
Authors: Lan, M. 
Tan, C.L. 
Issue Date: 2007
Citation: Lan, M., Tan, C.L. (2007). The integration of multiple feature representations for protein protein interaction classification task. CEUR Workshop Proceedings 319: 3.1-3.17. ScholarBank@NUS Repository.
Abstract: Background: To extract and retrieve protein-protein interaction (PPI) information from text, automatically detecting articles relevant to protein interactions for database curation is a crucial step. The vast majority of research in this area has used the "bag-of-words" representation, where each feature corresponds to a single word. To capture information left out of this simple bag-of-words representation, we examined alternative ways to represent text based on advanced natural language techniques, i.e. protein named entities, and on biological domain knowledge, i.e. trigger keywords. Results: These feature representations were evaluated using an SVM classifier on the BioCreAtIvE II benchmark corpus. On their own, the new representations were not found to produce a significant performance improvement according to statistical significance tests. On the other hand, the performance achieved by integrating 70 trigger keywords and 4 protein named entity features is comparable with that achieved by using bag-of-words alone. In addition, using only the 4 protein named entity features (4PNE) obtained the best recall (98.13%). Conclusions: In general, our work suggests that more sophisticated natural language processing (NLP) techniques, and more advanced usage of these techniques, need to be developed before better text representations can be produced. Feature representations based on simple NLP techniques would benefit real-life detection systems, which could be implemented with great efficiency and speed without losing classification performance, as well as exhaustive curation systems.
Source Title: CEUR Workshop Proceedings
URI: http://scholarbank.nus.edu.sg/handle/10635/78386
ISSN: 1613-0073
Appears in Collections: Staff Publications
