Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/33340
DC Field: Value
dc.title: Statistical Significance Assessment in Computational Systems Biology
dc.contributor.author: LI JUNTAO
dc.date.accessioned: 2012-05-31T18:02:04Z
dc.date.available: 2012-05-31T18:02:04Z
dc.date.issued: 2012-01-11
dc.identifier.citation: LI JUNTAO (2012-01-11). Statistical Significance Assessment in Computational Systems Biology. ScholarBank@NUS Repository.
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/33340
dc.description.abstract: In systems biology, high-throughput omics data, such as microarray and sequencing data, are generated for analysis. Multiple testing methods are routinely employed to interpret omics data. In multiple testing problems, false discovery rates (FDR) are commonly used to assess statistical significance. Appropriate tests are usually chosen for the underlying data sets. However, the statistical significance (p-values and error rates) may not be appropriately estimated because of the complex structure of microarray data. In this thesis, we propose two methods to improve false discovery rate estimation in computational systems biology. The first method, called constrained regression recalibration (ConReg-R), recalibrates the empirical p-values by modeling their distribution in order to improve the FDR estimates. ConReg-R is based on two observations: accurately estimated p-values from true null hypotheses follow a uniform distribution, and the observed distribution of p-values is a mixture of the distributions of p-values from true null hypotheses and from true alternative hypotheses. Hence, ConReg-R recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses and the FDR are estimated after the recalibration. ConReg-R provides an efficient way to improve the FDR estimates: it requires only the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments. The second method, called iterative piecewise linear regression (iPLR), is applied in the context of SAM to re-estimate the expected statistics and the FDR for both one-sided and two-sided statistic-based tests. We demonstrate that iPLR can accurately assess statistical significance in batch-confounded microarray analysis. It can successfully reduce the effects of batch confounding on the FDR estimation and elicit the true significance of differential expression. We demonstrate the efficacy of iPLR on simulated data as well as on several real microarray datasets. Moreover, iPLR provides a better interpretation of the linear model parameters.
dc.language.iso: en
dc.subject: Systems Biology, Microarray, p-value, False discovery rate, Multiple testing, Empirical distribution
dc.type: Thesis
dc.contributor.department: STATISTICS & APPLIED PROBABILITY
dc.contributor.supervisor: CHOI KWOK PUI
dc.description.degree: Ph.D
dc.description.degreeconferred: DOCTOR OF PHILOSOPHY
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections: Ph.D Theses (Open)
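
Illustrative note on the abstract: the observation that p-values from true null hypotheses are uniform, while the observed p-values are a mixture of null and alternative components, can be sketched in a few lines of Python. This is not ConReg-R and is not taken from the thesis; it uses a standard tail-based estimate of the null proportion as a stand-in. The function names, the cutoff `lam = 0.5`, and the simulated Beta-distributed alternatives are all assumptions made for illustration.

```python
import random


def estimate_pi0(pvalues, lam=0.5):
    """Tail-based estimate of the proportion of true null hypotheses.

    Null p-values are uniform, so roughly pi0 * (1 - lam) of all p-values
    should exceed lam; alternatives are assumed to land there only rarely.
    """
    m = len(pvalues)
    tail = sum(1 for p in pvalues if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


def estimate_fdr(pvalues, threshold, pi0):
    """Estimated FDR when rejecting every hypothesis with p <= threshold."""
    m = len(pvalues)
    rejected = sum(1 for p in pvalues if p <= threshold)
    if rejected == 0:
        return 0.0
    # Expected number of null p-values below the threshold is pi0 * m * threshold.
    return min(1.0, pi0 * m * threshold / rejected)


# Simulated mixture (hypothetical data): 80% nulls, 20% alternatives.
rng = random.Random(0)
nulls = [rng.random() for _ in range(8000)]              # uniform on [0, 1)
alts = [rng.betavariate(0.2, 5.0) for _ in range(2000)]  # mass concentrated near 0
pvals = nulls + alts

pi0_hat = estimate_pi0(pvals)
fdr_hat = estimate_fdr(pvals, threshold=0.01, pi0=pi0_hat)
print(f"pi0 estimate: {pi0_hat:.3f}, estimated FDR at p <= 0.01: {fdr_hat:.3f}")
```

The sketch assumes the input p-values are already well calibrated. When they are not, which is the situation the abstract describes, the null component is no longer uniform and an estimate of this kind can be misleading.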

Files in This Item:
File: LiJuntao.pdf
Size: 3.15 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.