Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.1021/ci800022e

Title:	Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds
Authors:	Ma, X.H.; Wang, R.; Yang, S.Y.; Li, Z.R.; Xue, Y.; Wei, Y.C.; Low, B.C.; Chen, Y.Z.
Issue Date:	Jun-2008
Citation:	Ma, X.H., Wang, R., Yang, S.Y., Li, Z.R., Xue, Y., Wei, Y.C., Low, B.C., Chen, Y.Z. (2008-06). Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds. Journal of Chemical Information and Modeling 48 (6) : 1227-1237. ScholarBank@NUS Repository. https://doi.org/10.1021/ci800022e
Abstract:	Virtual screening performance of support vector machines (SVM) depends on the diversity of training active and inactive compounds. While diverse inactive compounds can be routinely generated, the number and diversity of known actives are typically low. We evaluated the performance of SVM trained by sparsely distributed actives in six MDDR biological target classes composed of a high number of known actives (983-1645) of high, intermediate, and low structural diversity (muscarinic M1 receptor agonists, NMDA receptor antagonists, thrombin inhibitors, HIV protease inhibitors, cephalosporins, and renin inhibitors). SVM trained by regularly sparse data sets of 100 actives show improved yields at substantially reduced false-hit rates compared to those of published studies and those of the Tanimoto-based similarity searching method based on the same data sets and molecular descriptors. SVM trained by very sparse data sets of 40 actives (2.4%-4.1% of the known actives) predicted 17.5-39.5%, 23.0-48.1%, and 70.2-92.4% of the remaining 943-1605 actives in the high, intermediate, and low diversity classes, respectively, 13.8-68.7% of which are outside the training compound families. SVM predicted 99.97% and 97.1% of the 9.997M PUBCHEM and 167K remaining MDDR compounds as inactive and 2.6%-8.3% of the 19,495-38,483 MDDR compounds similar to the known actives as active. These results suggest that SVM has substantial capability in identifying novel active compounds from sparse active data sets at low false-hit rates. © 2008 American Chemical Society.
Source Title:	Journal of Chemical Information and Modeling
URI:	http://scholarbank.nus.edu.sg/handle/10635/100617
ISSN:	15499596
DOI:	10.1021/ci800022e
Appears in Collections:	Staff Publications
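
The abstract uses Tanimoto-based similarity searching as the baseline against which SVM is compared. As a rough illustration of that baseline only, the sketch below computes the Tanimoto coefficient between binary fingerprints and flags library compounds whose best similarity to any known active meets a cutoff. The set-of-on-bits representation, the function names `tanimoto` and `screen`, and the 0.9 cutoff are assumptions made for this example; the record does not state which fingerprint form or threshold the authors used.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def screen(actives: list[set[int]], library: dict[str, set[int]],
           threshold: float = 0.9) -> list[str]:
    """Return names of library compounds whose maximum Tanimoto similarity
    to any known active is at least `threshold` (hypothetical cutoff)."""
    hits = []
    for name, fingerprint in library.items():
        best = max((tanimoto(fingerprint, act) for act in actives), default=0.0)
        if best >= threshold:
            hits.append(name)
    return hits
```

For example, `tanimoto({1, 2, 3}, {2, 3, 4})` shares two of four distinct bits and so evaluates to 0.5. With only a few sparse actives, a high cutoff of this kind tends to retrieve close analogues of the training compounds, which is the limitation the abstract contrasts with the SVM results.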
