Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/17994
Title: Database development and machine learning classification of medicinal chemicals and biomolecules
Authors: PANKAJ KUMAR
Keywords: database, machine learning, kinetic, p38, herbal, genotoxicity
Issue Date: 11-Aug-2009
Citation: PANKAJ KUMAR (2009-08-11). Database development and machine learning classification of medicinal chemicals and biomolecules. ScholarBank@NUS Repository.
Abstract: Drug discovery is a long, time-consuming process that requires large financial investment. Advances in bioinformatics, such as database development and machine learning methods, have played a major role in reducing the time and money invested, rationalizing the overall approach, and increasing the efficiency of drug discovery. The focus of my work has been to aid drug discovery by applying various computational methods. Particular attention has been given to improving the storage, management, and customized delivery of data by developing web-accessible databases of medicinal chemicals and biomolecules: (i) an update of the Kinetic Database of Biomolecular Interactions (KDBI), and (ii) the Indian Herbs and their Chemical Database (IHCD). Attention has also been given to machine learning classification for predicting (i) genotoxicity and (ii) p38 inhibition of medicinal chemicals. Database development for biological and chemical data is explored from the start of data collection through to deployment of the web application, using biological and chemical data that can be helpful in the drug discovery process. The complexities involved, such as biological data collection, filtering, cross-linking to other databases, providing web accessibility, facilitating data download, and database modeling, are explained in detail. The two databases developed, IHCD and KDBI, hold different kinds of data and cover a broad area of the biological and chemical database space. IHCD contains information on a total of 2326 herbs from 430 therapeutic classes and 3978 chemical ingredients. IHCD also provides information about each chemical ingredient through cross-links to chemical, pathway, and molecular binding databases: PubChem, NCBI BioAssay, KEGG pathways, BIND, and BindingDB.
IHCD also provides 3D structures and computed molecular descriptors for all ingredients, and computer-predicted potential protein targets and binding structures for selected ingredients. The other database, KDBI, contains 19263 experimental kinetic data entries, which include 2635 protein-protein, 1711 protein-nucleic acid, 11873 protein-small molecule, and 1995 nucleic acid-small molecule interactions. KDBI also holds kinetic parameter data sets for 63 literature-reported pathway simulation models and allows each pathway's kinetic data set to be downloaded in SBML format. Machine learning classification methods are applied to problems directly linked to the early stages of drug discovery, namely predicting genotoxic compounds and p38 MAPK inhibitors, using collected data sets of more than 4000 genotoxic compounds and about 1100 p38 MAPK inhibitors. Several machine learning methods, including SVM, kNN, PNN, and decision trees, are applied in these studies, with special focus on SVM. Machine learning based virtual screening is also performed on the PubChem and MDDR databases. A total of 522 molecular descriptors were calculated to represent each compound, and either all 522 or a selected 100 descriptors were used for machine learning classification.
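As a rough illustration of the descriptor-based classification described in the abstract, the following Python sketch selects a subset of descriptor columns and classifies a compound with kNN, one of the methods named above. The variance-based selection rule, the function names, and the toy four-descriptor data are assumptions made for illustration only; the abstract does not state how the 100 descriptors were selected, and the thesis itself places its main focus on SVM.

```python
import math
from collections import Counter


def select_by_variance(rows, n_keep):
    """Return sorted indices of the n_keep descriptor columns with highest variance.

    Assumption: variance ranking stands in for whatever selection the thesis used.
    """
    n_cols = len(rows[0])
    variances = []
    for j in range(n_cols):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        variances.append(sum((v - mean) ** 2 for v in col) / len(col))
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_keep])


def project(row, cols):
    """Keep only the selected descriptor columns of one compound."""
    return [row[j] for j in cols]


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k nearest training compounds (Euclidean distance)."""
    dists = sorted(
        (math.dist(r, query), label) for r, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 6 compounds x 4 descriptors (not taken from the thesis).
train = [
    [0.1, 1.0, 0.2, 0.50],
    [0.2, 1.0, 0.1, 0.51],
    [0.0, 1.0, 0.3, 0.50],
    [0.9, 1.0, 0.8, 0.51],
    [1.0, 1.0, 0.9, 0.50],
    [0.8, 1.0, 1.0, 0.51],
]
labels = ["inactive", "inactive", "inactive", "active", "active", "active"]

cols = select_by_variance(train, n_keep=2)
reduced = [project(r, cols) for r in train]
query = project([0.85, 1.0, 0.95, 0.50], cols)
prediction = knn_predict(reduced, labels, query, k=3)
print(cols, prediction)
```

In a setting like the one the abstract describes, the rows would be the 522 computed descriptors per compound and `n_keep` would be 100; the classifier could then be swapped for an SVM or another of the listed methods.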
URI: http://scholarbank.nus.edu.sg/handle/10635/17994
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: KumarPankaj.pdf
Size: 3.7 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.