Please use this identifier to cite or link to this item: https://doi.org/10.1021/ci800022e
DC Field: Value
dc.title: Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds
dc.contributor.author: Ma, X.H.
dc.contributor.author: Wang, R.
dc.contributor.author: Yang, S.Y.
dc.contributor.author: Li, Z.R.
dc.contributor.author: Xue, Y.
dc.contributor.author: Wei, Y.C.
dc.contributor.author: Low, B.C.
dc.contributor.author: Chen, Y.Z.
dc.date.accessioned: 2014-10-27T08:27:49Z
dc.date.available: 2014-10-27T08:27:49Z
dc.date.issued: 2008-06
dc.identifier.citation: Ma, X.H., Wang, R., Yang, S.Y., Li, Z.R., Xue, Y., Wei, Y.C., Low, B.C., Chen, Y.Z. (2008-06). Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds. Journal of Chemical Information and Modeling 48 (6) : 1227-1237. ScholarBank@NUS Repository. https://doi.org/10.1021/ci800022e
dc.identifier.issn: 15499596
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/100617
dc.description.abstract: Virtual screening performance of support vector machines (SVM) depends on the diversity of training active and inactive compounds. While diverse inactive compounds can be routinely generated, the number and diversity of known actives are typically low. We evaluated the performance of SVM trained by sparsely distributed actives in six MDDR biological target classes composed of a high number of known actives (983-1645) of high, intermediate, and low structural diversity (muscarinic M1 receptor agonists, NMDA receptor antagonists, thrombin inhibitors, HIV protease inhibitors, cephalosporins, and renin inhibitors). SVM trained by regularly sparse data sets of 100 actives show improved yields at substantially reduced false-hit rates compared to those of published studies and those of Tanimoto-based similarity searching method based on the same data sets and molecular descriptors. SVM trained by very sparse data sets of 40 actives (2.4%-4.1% of the known actives) predicted 17.5-39.5%, 23.0-48.1%, and 70.2-92.4% of the remaining 943-1605 actives in the high, intermediate, and low diversity classes, respectively, 13.8-68.7% of which are outside the training compound families. SVM predicted 99.97% and 97.1% of the 9.997 M PUBCHEM and 167K remaining MDDR compounds as inactive and 2.6%-8.3% of the 19,495-38,483 MDDR compounds similar to the known actives as active. These suggest that SVM has substantial capability in identifying novel active compounds from sparse active data sets at low false-hit rates. © 2008 American Chemical Society.
dc.description.uri: http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1021/ci800022e
dc.source: Scopus
dc.type: Article
dc.contributor.department: PHARMACY
dc.contributor.department: BIOLOGICAL SCIENCES
dc.description.doi: 10.1021/ci800022e
dc.description.sourcetitle: Journal of Chemical Information and Modeling
dc.description.volume: 48
dc.description.issue: 6
dc.description.page: 1227-1237
dc.identifier.isiut: 000257026800011
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.