Title: Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds
Authors: Ma, X.H. 
Wang, R.
Yang, S.Y.
Li, Z.R.
Xue, Y.
Wei, Y.C.
Low, B.C. 
Chen, Y.Z. 
Issue Date: Jun-2008
Citation: Ma, X.H., Wang, R., Yang, S.Y., Li, Z.R., Xue, Y., Wei, Y.C., Low, B.C., Chen, Y.Z. (2008-06). Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds. Journal of Chemical Information and Modeling 48 (6) : 1227-1237. ScholarBank@NUS Repository.
Abstract: Virtual screening performance of support vector machines (SVM) depends on the diversity of training active and inactive compounds. While diverse inactive compounds can be routinely generated, the number and diversity of known actives are typically low. We evaluated the performance of SVM trained by sparsely distributed actives in six MDDR biological target classes composed of a high number of known actives (983-1645) of high, intermediate, and low structural diversity (muscarinic M1 receptor agonists, NMDA receptor antagonists, thrombin inhibitors, HIV protease inhibitors, cephalosporins, and renin inhibitors). SVM trained by regularly sparse data sets of 100 actives show improved yields at substantially reduced false-hit rates compared to those of published studies and those of Tanimoto-based similarity searching method based on the same data sets and molecular descriptors. SVM trained by very sparse data sets of 40 actives (2.4%-4.1% of the known actives) predicted 17.5-39.5%, 23.0-48.1%, and 70.2-92.4% of the remaining 943-1605 actives in the high, intermediate, and low diversity classes, respectively, 13.8-68.7% of which are outside the training compound families. SVM predicted 99.97% and 97.1% of the 9.997 M PubChem and 167K remaining MDDR compounds as inactive and 2.6%-8.3% of the 19,495-38,483 MDDR compounds similar to the known actives as active. These suggest that SVM has substantial capability in identifying novel active compounds from sparse active data sets at low false-hit rates. © 2008 American Chemical Society.
Source Title: Journal of Chemical Information and Modeling
ISSN: 15499596
DOI: 10.1021/ci800022e
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.