Statistical Significance Assessment in Computational Systems Biology | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/33340

Title:	Statistical Significance Assessment in Computational Systems Biology
Authors:	LI JUNTAO
Keywords:	Systems Biology, Microarray, p-value, False discovery rate, Multiple testing, Empirical distribution
Issue Date:	11-Jan-2012
Citation:	LI JUNTAO (2012-01-11). Statistical Significance Assessment in Computational Systems Biology. ScholarBank@NUS Repository.
Abstract:	In systems biology, high-throughput omics data, such as microarray and sequencing data, are generated for analysis, and multiple testing methods are routinely employed to interpret them. In multiple testing problems, false discovery rates (FDR) are commonly used to assess statistical significance. Appropriate tests are usually chosen for the underlying data sets. However, the statistical significance (p-values and error rates) may not be appropriately estimated because of the complex data structure of the microarray. In this thesis, we propose two methods to improve false discovery rate estimation in computational systems biology. The first method, called constrained regression recalibration (ConReg-R), recalibrates the empirical p-values by modeling their distribution in order to improve the FDR estimates. ConReg-R is based on the observation that accurately estimated p-values from true null hypotheses follow a uniform distribution, and that the observed distribution of p-values is a mixture of the distributions of p-values from true null hypotheses and true alternative hypotheses. Hence, ConReg-R recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses and the FDR are estimated after the recalibration. ConReg-R provides an efficient way to improve the FDR estimates: it requires only the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments. The second method, called iterative piecewise linear regression (iPLR), is proposed in the context of SAM to re-estimate the expected statistics and FDR for both one-sided and two-sided statistics-based tests. We demonstrate that iPLR can accurately assess statistical significance in batch-confounded microarray analysis. It successfully reduces the effects of batch confounding on the FDR estimation and elicits the true significance of differential expression. We demonstrate the efficacy of iPLR on both simulated and several real microarray datasets. Moreover, iPLR provides a better interpretation of the linear model parameters.
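
The mixture assumption described in the abstract (uniform p-values from true nulls, plus p-values from true alternatives concentrated near zero) can be illustrated with a minimal sketch. This is not ConReg-R and does not reproduce the thesis's recalibration step; it swaps in a standard Storey-type estimator of the null proportion and a plug-in FDR estimate to show how both quantities follow from p-values alone. The function names, the tuning value `lam`, and the Beta distribution used to simulate alternatives are all assumptions made for illustration.

```python
import random


def estimate_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Assumes p-values above `lam` come almost entirely from true nulls,
    which are uniform, so #{p > lam} is roughly pi0 * m * (1 - lam).
    """
    m = len(pvalues)
    tail = sum(1 for p in pvalues if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


def estimate_fdr(pvalues, threshold, pi0):
    """Plug-in FDR estimate when rejecting every p-value <= threshold.

    Expected false rejections are pi0 * m * threshold (uniform nulls);
    dividing by the number of rejections gives the estimated FDR.
    """
    m = len(pvalues)
    rejected = sum(1 for p in pvalues if p <= threshold)
    if rejected == 0:
        return 0.0
    return min(1.0, pi0 * m * threshold / rejected)


def simulate_pvalues(m=10000, pi0=0.8, seed=0):
    """Simulate a null/alternative mixture of p-values (illustrative only)."""
    rng = random.Random(seed)
    n_null = int(m * pi0)
    nulls = [rng.random() for _ in range(n_null)]
    # Hypothetical choice: Beta(0.3, 4) puts alternative p-values near zero.
    alternatives = [rng.betavariate(0.3, 4.0) for _ in range(m - n_null)]
    return nulls + alternatives


if __name__ == "__main__":
    pvals = simulate_pvalues()
    pi0_hat = estimate_pi0(pvals)
    print("estimated pi0:", round(pi0_hat, 3))
    print("estimated FDR at p <= 0.01:", round(estimate_fdr(pvals, 0.01, pi0_hat), 3))
```

If the null p-values were miscalibrated (not uniform), both estimates above would be biased, which is the situation the abstract says the recalibration is meant to address.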
URI:	http://scholarbank.nus.edu.sg/handle/10635/33340
Appears in Collections:	Ph.D Theses (Open)

Files in This Item:

File	Description	Size	Format	Access Settings	Version
LiJuntao.pdf		3.15 MB	Adobe PDF	OPEN	None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.