Please use this identifier to cite or link to this item: https://doi.org/10.1186/s12859-022-04626-w
DC Field | Value
dc.title | EnsembleFam: towards more accurate protein family prediction in the twilight zone
dc.contributor.author | Kabir, MN
dc.contributor.author | Wong, L
dc.date.accessioned | 2023-06-12T04:39:44Z
dc.date.available | 2023-06-12T04:39:44Z
dc.date.issued | 2022-12-01
dc.identifier.citation | Kabir, MN, Wong, L (2022-12-01). EnsembleFam: towards more accurate protein family prediction in the twilight zone. BMC Bioinformatics 23 (1) : 90-. ScholarBank@NUS Repository. https://doi.org/10.1186/s12859-022-04626-w
dc.identifier.issn | 1471-2105
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/241853
dc.description.abstract | Background: Current protein family modeling methods, such as profile Hidden Markov Models (pHMM), k-mer based methods, and deep learning-based methods, do not provide very accurate protein function prediction for proteins in the twilight zone, because these proteins have low sequence similarity to reference proteins with known functions. Results: We present a novel method, EnsembleFam, aiming at better function prediction for proteins in the twilight zone. EnsembleFam extracts the core characteristics of a protein family using similarity and dissimilarity features calculated from sequence homology relations. EnsembleFam trains three separate Support Vector Machine (SVM) classifiers for each family using these features, and an ensemble prediction is made to classify novel proteins into these families. Extensive experiments are conducted using the Clusters of Orthologous Groups (COG) dataset and the G Protein-Coupled Receptor (GPCR) dataset. EnsembleFam not only outperforms state-of-the-art methods on the overall dataset but also provides a much more accurate prediction for twilight zone proteins. Conclusions: EnsembleFam, a machine learning method to model protein families, can be used to better identify members with very low sequence homology. Using EnsembleFam, protein functions can be predicted from sequence information alone with better accuracy than state-of-the-art methods.
dc.publisher | Springer Science and Business Media LLC
dc.source | Elements
dc.subject | Ensemble classifier
dc.subject | Protein function prediction
dc.subject | Sequence homology
dc.subject | Support vector machine
dc.subject | Twilight zone sequence
dc.subject | Humans
dc.subject | Proteins
dc.subject | Support Vector Machine
dc.type | Article
dc.date.updated | 2023-06-06T02:08:57Z
dc.contributor.department | NUS GRADUATE SCHOOL
dc.description.doi | 10.1186/s12859-022-04626-w
dc.description.sourcetitle | BMC Bioinformatics
dc.description.volume | 23
dc.description.issue | 1
dc.description.page | 90-
dc.published.state | Published
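
The abstract states that three SVM classifiers are trained per family and that an ensemble prediction is then made, but this record does not say how the three outputs are combined. The sketch below illustrates one plausible combination rule, a majority vote with the mean score as tie-breaker. This rule is an assumption, not the authors' documented procedure. The function name `ensemble_predict`, the `min_votes` parameter, the "score above zero is a positive vote" convention, and the example scores are all hypothetical.

```python
from statistics import mean


def ensemble_predict(scores_by_family, min_votes=2):
    """Pick one family from per-family classifier scores (hypothetical rule).

    scores_by_family maps a family name to the decision scores of that
    family's three classifiers. A score above 0 counts as a positive vote
    (an assumption made for this sketch). A family is accepted when at
    least min_votes classifiers vote for it. Among accepted families, the
    one with the highest mean score is returned; None if none is accepted.
    """
    accepted = {}
    for family, scores in scores_by_family.items():
        votes = sum(1 for s in scores if s > 0)
        if votes >= min_votes:
            accepted[family] = mean(scores)
    if not accepted:
        return None
    return max(accepted, key=accepted.get)


# Made-up decision scores for one query protein (illustration only).
example_scores = {
    "COG0001": (0.8, 0.3, -0.1),   # two positive votes, accepted
    "COG0002": (-0.5, 0.2, -0.4),  # one positive vote, rejected
    "COG0003": (1.2, 0.9, 0.4),    # three positive votes, accepted
}
predicted = ensemble_predict(example_scores)
```

Returning `None` when no family collects enough votes leaves room for an "unknown family" outcome, which may matter for twilight-zone proteins; whether EnsembleFam handles such cases this way is not stated in the record.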
Appears in Collections: Staff Publications; Elements

Files in This Item:
File | Description | Size | Format | Access Settings | Version
EnsembleFam towards more accurate protein family prediction in the twilight zone.pdf | | 1.99 MB | Adobe PDF | OPEN | None


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.