EMBRACING NOISE IN BIOINFORMATICS | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/121045

Title:	EMBRACING NOISE IN BIOINFORMATICS
Authors:	KOH CHUAN HOCK
Keywords:	Bioinformatics, System Biology, Computational Biology, Model Checking, Parameter Estimation, Microarray cross-batch prediction
Issue Date:	21-Sep-2012
Citation:	KOH CHUAN HOCK (2012-09-21). EMBRACING NOISE IN BIOINFORMATICS. ScholarBank@NUS Repository.
Abstract:	In 1953, James Watson and Francis Crick discovered the structure of DNA. This eventually led to the Human Genome Project, which was completed in 2003. The post-genomic era opens up exciting possibilities, along with grand challenges to overcome. One of these is to build a mathematical model of the whole cell. The first part of this thesis focuses on building efficient and practical tools for model calibration and validation that scale to models of massive size. We built two powerful and easy-to-use software tools (DA and MIRACH) for estimating the distribution of the parameters of a given biological system and for testing whether a given biological system satisfies certain given properties. We then combined the technology of these two tools to design a framework that allows us to perform parameter estimation, even when time series data are not available, by using known biological properties and model checking. In building these tools, we utilized state-of-the-art hypothesis testing algorithms, which are necessary for interpreting the stochastic output of biological systems, and discovered that they came with practical limitations. This led us to the second part of the thesis, where we developed algorithms to overcome these limitations. Specifically, we developed two novel algorithms for sequential hypothesis testing that are computationally faster and more memory efficient. In addition, by integrating sequential hypothesis testing algorithms with bagging, we developed a powerful new algorithm, which we named dynamic bagging. This algorithm supersedes standard bagging: it retains all the benefits of standard bagging but is more efficient and removes the need to fix the number of bootstrap replicates arbitrarily a priori. We first used dynamic bagging in gene expression profile analysis to overcome batch effects that have plagued many gene expression analysis projects. We then went on to show that its usefulness is not limited to any particular problem domain. We also showed that predictions from dynamic bagging are consistent with those of standard bagging with a much larger number of bootstrap replicates. Finally, we offered an alternative and more direct explanation of bagging's effectiveness than the classical explanation based on bias-variance decomposition.
URI:	http://scholarbank.nus.edu.sg/handle/10635/121045
Appears in Collections:	Ph.D Theses (Open)

Files in This Item:

File	Description	Size	Format	Access Settings	Version
KohCH.pdf		5.37 MB	Adobe PDF	OPEN	None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.