Please use this identifier to cite or link to this item: https://doi.org/10.1002/pmic.200401118
Title: Effect of training datasets on support vector machine prediction of protein-protein interactions
Authors: Lo, S.L. 
Cai, C.Z.
Chen, Y.Z. 
Chung, M.C.M.
Keywords: Database of interacting proteins
Protein function prediction
Protein-protein interaction prediction
Shuffled sequence
Support vector machine
SVMlight
Issue Date: Mar-2005
Citation: Lo, S.L., Cai, C.Z., Chen, Y.Z., Chung, M.C.M. (2005-03). Effect of training datasets on support vector machine prediction of protein-protein interactions. Proteomics 5 (4) : 876-884. ScholarBank@NUS Repository. https://doi.org/10.1002/pmic.200401118
Abstract: Knowledge of protein-protein interactions is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, the Support Vector Machine (SVM), has recently been explored for predicting protein-protein interactions using artificial shuffled sequences as hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results derived from real protein sequences with those derived from shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated by an exclusion analysis combined with subcellular localization information for interacting proteins found in the Database of Interacting Proteins. Prediction accuracy using real protein sequences is 76.9%, compared with 94.1% using artificial shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be harder than separating a set of real protein sequences from a set of artificial sequences. The use of real protein sequences for training an SVM classification system is expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful in developing SVM classification systems for protein-protein interaction prediction. © 2005 WILEY-VCH Verlag GmbH & Co. KGaA.
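The abstract contrasts two ways of building the noninteracting (negative) training set. The sketch below illustrates both ideas on toy data; it is not the authors' code. The function names, the toy sequences, and the rule "no recorded interaction and no shared compartment" are hypothetical simplifications standing in for the paper's exclusion analysis, whose exact criteria are not given here. Feature encoding and SVM training are omitted.

```python
import random
from itertools import combinations


def shuffled_negative(seq, rng):
    """Artificial negative: a random permutation of seq.

    Keeps amino-acid composition but destroys the real residue order.
    """
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def shuffled_negative_pairs(positive_pairs, sequences, rng):
    """Pair each interacting protein with a shuffled copy of its partner."""
    return [
        (sequences[a], shuffled_negative(sequences[b], rng))
        for a, b in positive_pairs
    ]


def localization_negative_pairs(proteins, localization, positive_pairs):
    """Real-sequence negatives (hypothetical rule, assumed for illustration).

    Keep pairs of real proteins that have no recorded interaction and
    share no subcellular compartment.
    """
    known = {frozenset(p) for p in positive_pairs}
    negatives = []
    for a, b in combinations(sorted(proteins), 2):
        if frozenset((a, b)) in known:
            continue  # known interactors are positives, not negatives
        if localization[a] & localization[b]:
            continue  # co-localized proteins might still interact; exclude
        negatives.append((a, b))
    return negatives


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sequences and compartments, invented for illustration only.
    sequences = {"P1": "MKTAYIAK", "P2": "GSHMLEDP", "P3": "MADEEKLP", "P4": "MSTNPKPQ"}
    localization = {
        "P1": {"nucleus"},
        "P2": {"nucleus"},
        "P3": {"cytoplasm"},
        "P4": {"cytoplasm", "membrane"},
    }
    positives = [("P1", "P2")]
    print(shuffled_negative_pairs(positives, sequences, rng))
    print(localization_negative_pairs(sequences, localization, positives))
```

With this toy data the localization rule keeps (P1, P3), (P1, P4), (P2, P3) and (P2, P4), and drops (P3, P4) because both are cytoplasmic. Shuffled negatives differ from real proteins only in residue order, which is one plausible reason they are easier to separate than real-sequence negatives.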
Source Title: Proteomics
URI: http://scholarbank.nus.edu.sg/handle/10635/102208
ISSN: 1615-9853
DOI: 10.1002/pmic.200401118
Appears in Collections: Staff Publications