Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.3390/ijms19010183

Title:	Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate
Authors:	Yu, C.Y; Li, X.X; Yang, H; Li, Y.H; Xue, W.W; Chen, Y.Z; Tao, L; Zhu, F
Keywords:	evaluation study; machine learning; procedures; proteomics; reproducibility; sequence analysis; software; standards; Machine Learning; Proteomics; Reproducibility of Results; Sequence Analysis, Protein; Software
Issue Date:	2018
Citation:	Yu, C.Y, Li, X.X, Yang, H, Li, Y.H, Xue, W.W, Chen, Y.Z, Tao, L, Zhu, F (2018). Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate. International journal of molecular sciences 19 (1). ScholarBank@NUS Repository. https://doi.org/10.3390/ijms19010183
Rights:	Attribution 4.0 International
Abstract:	Protein function is of great interest in research on biological mechanisms, disease development and drug/target discovery. Besides experimental approaches, a variety of computational methods have been designed to predict protein function. Among these in silico methods, BLAST bases its predictions on protein sequence similarity, whereas machine learning methods also work from the sequence but do not consider sequence similarity. This characteristic makes machine learning a good complement to BLAST and many other approaches when predicting the function of remotely related proteins and of homologous proteins with distinct functions. However, the identification accuracies and false discovery rates of these in silico methods have not yet been assessed, which greatly limits the use of these algorithms. Herein, a comprehensive comparison of the performance of four popular prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these methods was systematically assessed using four standard statistical indexes on independent test datasets of 93 functional protein families defined by UniProtKB keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model organisms (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). SVM and BLAST showed substantially higher sensitivity than PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of substantially reducing the false discovery rate (SVM < PNN < KNN). In sum, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in related biomedical research.
Source Title:	International journal of molecular sciences
URI:	https://scholarbank.nus.edu.sg/handle/10635/182103
ISSN:	14220067
DOI:	10.3390/ijms19010183
Appears in Collections:	Elements Staff Publications

Files in This Item:

File	Description	Size	Format	Access Settings	Version
10_3390_ijms19010183.pdf		1.49 MB	Adobe PDF	OPEN	None

This item is licensed under a Creative Commons License