ScholarBank@NUS
https://scholarbank.nus.edu.sg
The DSpace digital repository system captures, stores, indexes, preserves, and distributes digital research material.
Fri, 10 Apr 2020 19:40:59 GMT
- A post-processing method for optimizing synthesis strategy for oligonucleotide microarrays
https://scholarbank.nus.edu.sg/handle/10635/43120
Authors: Ning, K.; Choi, K.P.; Leong, H.W.; Zhang, L.
Abstract: The broad applicability of gene expression profiling to genomic analyses has generated huge demand for mass production of microarrays and hence for improving the cost effectiveness of microarray fabrication. We developed a post-processing method for deriving a good synthesis strategy. In this paper, we assessed all the known efficient methods and our post-processing method for reducing the number of synthesis cycles for manufacturing a DNA-chip of a given set of oligos. Our experimental results on both simulated and 52 real datasets show that no single method consistently gives the best synthesis strategy, and post-processing an existing strategy is necessary as it often reduces the number of synthesis cycles further. © The Author 2005. Published by Oxford University Press. All rights reserved.
Sat, 01 Jan 2005 00:00:00 GMT
- Spectrum-based de novo repeat detection in genomic sequences
https://scholarbank.nus.edu.sg/handle/10635/43115
Authors: Do, H.H.; Choi, K.P.; Preparata, F.P.; Sung, W.K.; Zhang, L.
Abstract: A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg. © Mary Ann Liebert, Inc. 2008.
Tue, 01 Jan 2008 00:00:00 GMT
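The spectrum that the SAGRI abstract builds on (the set of k-mers of a sequence) can be illustrated with a small sketch. This is not the FindHit scan or any other part of SAGRI; the function names and the toy sequence are assumptions made for illustration only.

```python
from collections import defaultdict


def kmer_spectrum(sequence, k):
    """Return the set of distinct k-mers occurring in `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def repeated_kmers(sequence, k):
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Toy sequence (hypothetical): the block "ACGTTGCA" written twice.
genome = "ACGTTGCAACGTTGCA"
spectrum = kmer_spectrum(genome, 4)
repeats = repeated_kmers(genome, 4)
```

K-mers that occur at several positions are only a crude hint of repeated segments; the abstract describes a considerably more selective two-scan procedure followed by pairwise alignment.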
- Maximum likelihood inference of the evolutionary history of a PPI network from the duplication history of its proteins
https://scholarbank.nus.edu.sg/handle/10635/103535
Authors: Li, S.; Choi, K.P.; Wu, T.; Zhang, L.
Abstract: Evolutionary history of protein-protein interaction (PPI) networks provides valuable insight into molecular mechanisms of network growth. In this paper, we study how to infer the evolutionary history of a PPI network from its protein duplication relationship. We show that for a plausible evolutionary history of a PPI network, its relative quality, measured by the so-called loss number, is independent of the growth parameters of the network and can be computed efficiently. This finding leads us to propose two fast maximum likelihood algorithms to infer the evolutionary history of a PPI network given the duplication history of its proteins. Simulation studies demonstrated that our approach, which takes advantage of protein duplication information, outperforms NetArch, the first maximum likelihood algorithm for PPI network history reconstruction. Using the proposed method, we studied the topological change of the PPI networks of the yeast, fruitfly, and worm. © 2013 IEEE.
Fri, 01 Nov 2013 00:00:00 GMT
- Quick, practical selection of effective seeds for homology search
https://scholarbank.nus.edu.sg/handle/10635/104699
Authors: Preparata, F.P.; Zhang, L.; Choi, K.P.
Abstract: It has been observed that in homology search gapped seeds have better sensitivity than ungapped ones for the same cost (weight). In this paper, we propose a probability leakage model (a dissipative Markov system) to elucidate the mechanism that confers power to spaced seeds. Based on this model, we identify desirable features of gapped search seeds and formulate an extremely efficient procedure for seed design: it samples from the set of spaced seed exhibiting those features, evaluates their sensitivity, and then selects the best. The sensitivity of the constructed seeds is negligibly less than that of the corresponding known optimal seeds. While the challenging mathematical question of characterizing optimal search seeds remains open, we believe that our eminently efficient and effective approach represents a satisfactory solution from a practitioner's viewpoint. © Mary Ann Liebert, Inc.
Tue, 01 Nov 2005 00:00:00 GMT
- Degree distribution of large networks generated by the partial duplication model
https://scholarbank.nus.edu.sg/handle/10635/52862
Authors: Li, S.; Choi, K.P.; Wu, T.
Abstract: In this paper, we present a rigorous analysis on the limiting behavior of the degree distribution of the partial duplication model, a random network growth model in the duplication and divergence family that is popular in the study of biological networks. We show that for each non-negative integer k, the expected proportion of nodes of degree k approaches a limit as the network becomes large. This fills in a gap in previous studies. In addition, we prove that p=1/2, where p is the selection probability of the model, is the phase transition for the expected proportion of isolated nodes converging to 1, and hence answer a question raised in Bebek et al. [G. Bebek, P. Berenbrink, C. Cooper, T. Friedetzky, J. Nadeau, S.C. Sahinalp, The degree distribution of the generalized duplication model, Theoret. Comput. Sci. 369 (2006) 239-249]. We also obtain asymptotic bounds on the convergence rates of degree distribution. Since the observed networks typically do not contain isolated nodes, we study the subgraph consisting of all non-isolated nodes contained in the networks generated by the partial duplication model, and show that p=1/2 is again a phase transition for the limiting behavior of its degree distribution.
Mon, 11 Mar 2013 00:00:00 GMT
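A small simulation can make the degree-distribution statements above concrete. The sketch below assumes the usual reading of the partial duplication model: at each step a uniformly chosen node is copied and each of its edges is retained independently with probability p. The seed graph (a single edge), the function names and the parameters are illustrative assumptions, not taken from the paper.

```python
import random


def partial_duplication(n_nodes, p, seed=None):
    """Grow a graph to n_nodes by repeated partial duplication.

    Assumed rule: pick an existing node uniformly at random, add a copy,
    and keep each of the original's edges independently with probability p.
    Starts from the single edge 0-1 and returns a dict of adjacency sets.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        parent = rng.choice(list(adj))
        child = len(adj)
        adj[child] = set()
        for neighbour in list(adj[parent]):
            if rng.random() < p:
                adj[child].add(neighbour)
                adj[neighbour].add(child)
    return adj


def degree_proportions(adj):
    """Return {degree: proportion of nodes with that degree}."""
    counts = {}
    for neighbours in adj.values():
        counts[len(neighbours)] = counts.get(len(neighbours), 0) + 1
    return {k: c / len(adj) for k, c in sorted(counts.items())}


# Proportion of nodes by degree for one simulated network with p below 1/2.
proportions = degree_proportions(partial_duplication(2000, 0.3, seed=0))
isolated_share = proportions.get(0, 0.0)
```

Repeating the run for values of p on either side of 1/2 and for growing `n_nodes` gives an empirical feel for the behaviour of the isolated-node proportion that the abstract analyses rigorously.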
- Good spaced seeds for homology search
https://scholarbank.nus.edu.sg/handle/10635/104495
Authors: Choi, K.P.; Zeng, F.; Zhang, L.
Abstract: Motivation: Filtration is an important technique used to speed up local alignment as exemplified in the BLAST programs. Recently, Ma et al. discovered that better filtering can be achieved by spacing out the matching positions according to a certain pattern, instead of contiguous positions to trigger a local alignment in their PatternHunter program. Such a match pattern is called a spaced seed. Results: Our numerical computation shows that the ranks of spaced seeds (based on sensitivity) change with the sequences similarity. Since homologous sequences may have diverse similarity, we assess the sensitivity of spaced seeds over a range of similarity levels and present a list of good spaced seeds for facilitating homology search in DNA genomic sequences. We validate that the listed spaced seeds are indeed more sensitive using three arbitrarily chosen pairs of DNA genomic sequences. © Oxford University Press 2004; all rights reserved.
Sat, 01 May 2004 00:00:00 GMT
- Sharp bounds and normalization of Wiener-type indices
https://scholarbank.nus.edu.sg/handle/10635/105360
Authors: Tian, D.; Choi, K.P.
Abstract: Complex networks abound in physical, biological and social sciences. Quantifying a network's topological structure facilitates network exploration and analysis, and network comparison, clustering and classification. A number of Wiener type indices have recently been incorporated as distance-based descriptors of complex networks, such as the R package QuACN. Wiener type indices are known to depend both on the network's number of nodes and topology. To apply these indices to measure similarity of networks of different numbers of nodes, normalization of these indices is needed to correct the effect of the number of nodes in a network. This paper aims to fill this gap. Moreover, we introduce an f-Wiener index of network G, denoted by $W_f(G)$. This notion generalizes the Wiener index to a very wide class of Wiener type indices including all known Wiener type indices. We identify the maximum and minimum of $W_f(G)$ over a set of networks with n nodes. We then introduce our normalized version of the f-Wiener index. The normalized f-Wiener indices were demonstrated, in a number of experiments, to improve significantly the hierarchical clustering over the non-normalized counterparts. © 2013 Tian, Choi.
Fri, 08 Nov 2013 00:00:00 GMT
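To make the notion of a Wiener type index concrete, here is a minimal sketch. It assumes that the f-Wiener index is the sum of f(d(u, v)) over all unordered pairs of distinct nodes of a connected graph, with d the shortest-path distance, so that f(d) = d gives the classical Wiener index. That reading and the function names are assumptions; the paper's normalization is not reproduced here.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def f_wiener_index(adj, f=lambda d: d):
    """Sum f(d(u, v)) over unordered pairs of distinct nodes (connected graph)."""
    total = 0
    for u in adj:
        dist = shortest_path_lengths(adj, u)
        total += sum(f(d) for v, d in dist.items() if v != u)
    return total / 2  # every pair was visited from both endpoints


# Two four-node graphs with different topology.
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
wiener_path = f_wiener_index(path4)
wiener_complete = f_wiener_index(complete4)
```

The two graphs have the same number of nodes but different index values, and the values also change when the node count changes, which is the dependence that motivates normalization.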
- Sensitivity analysis and efficient method for identifying optimal spaced seeds
https://scholarbank.nus.edu.sg/handle/10635/104096
Authors: Choi, K.P.; Zhang, L.
Abstract: The novel introduction of spaced seed idea in the filtration stage of sequence comparison by Ma et al. (Bioinformatics 18 (2002) 440) has greatly increased the sensitivity of homology search without compromising the speed of search. Finding the optimal spaced seeds is of great importance both theoretically and in designing better search tool for sequence comparison. In this paper, we study the computational aspects of calculating the hitting probability of spaced seeds; and based on these results, we propose an efficient algorithm for identifying optimal spaced seeds.
Sun, 01 Feb 2004 00:00:00 GMT
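The hitting probability of a spaced seed can be illustrated by brute force on short regions. The sketch assumes the common model in which each position of a homologous region is a match independently with a fixed probability (the similarity level), and a seed such as "1101" hits when all of its "1" positions fall on matches at some offset. It enumerates every match pattern, so it is only usable for small region lengths and is not the efficient algorithm the abstract proposes; all names are illustrative.

```python
from itertools import product


def hit_probability(seed, region_length, similarity):
    """Exact probability that `seed` hits a random region, by enumeration.

    `seed` is a string of '1' (must match) and '0' (don't care);
    each region position matches independently with probability `similarity`.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)
    total = 0.0
    for region in product((0, 1), repeat=region_length):
        hit = any(
            all(region[start + i] for i in required)
            for start in range(region_length - span + 1)
        )
        if hit:
            matches = sum(region)
            total += similarity ** matches * (1 - similarity) ** (region_length - matches)
    return total


# Compare a contiguous and a spaced seed of the same weight (three '1's).
contiguous = hit_probability("111", 12, 0.7)
spaced = hit_probability("1101", 12, 0.7)
```

Evaluating the same pair of seeds at several similarity levels is a simple way to explore how seed rankings can depend on similarity, as the spaced-seed abstracts in this listing discuss.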
- Palindromes in SARS and other coronaviruses
https://scholarbank.nus.edu.sg/handle/10635/103913
Authors: Chew, D.S.H.; Choi, K.P.; Heidner, H.; Leung, M.-Y.
Abstract: With the identification of a novel coronavirus associated with the severe acute respiratory syndrome (SARS), computational analysis of its RNA genome sequence is expected to give useful clues to help elucidate the origin, evolution, and pathogenicity of the virus. In this paper, we study the collective counts of palindromes in the SARS genome along with all the completely sequenced coronaviruses. Based on a Markov-chain model for the genome sequence, the mean and standard deviation for the number of palindromes at or above a given length are derived. These theoretical results are complemented by extensive simulations to provide empirical estimates. Using a z score obtained from these mathematical and empirical means and standard deviations, we have observed that palindromes of length four are significantly underrepresented in all the coronaviruses in our data set. In contrast, length-six palindromes are significantly underrepresented only in the SARS coronavirus. Two other features are unique to the SARS sequence. First, there is a length-22 palindrome TCTTTAACAAGCTTGTTAAAGA spanning positions 25962-25983. Second, there are two repeating length-12 palindromes TTATAATTATAA spanning positions 22712-22723 and 22796-22807. Some further investigations into possible biological implications of these palindrome features are proposed.
Wed, 01 Sep 2004 00:00:00 GMT
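In genomics a palindrome is a word equal to its own reverse complement, so the two sequences quoted in the abstract can be checked directly. The sketch below does that and scans a sequence for palindromic windows of a fixed length; the function names and the toy sequence are illustrative, and no statistical assessment of the counts is attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if `seq` reads the same as its reverse complement."""
    return seq == reverse_complement(seq)


def find_palindromes(genome, length):
    """Start positions of all windows of exactly `length` that are palindromes."""
    return [
        i
        for i in range(len(genome) - length + 1)
        if is_palindrome(genome[i:i + length])
    ]


# The two palindromes quoted in the abstract.
long_palindrome = "TCTTTAACAAGCTTGTTAAAGA"
short_palindrome = "TTATAATTATAA"
checks = (is_palindrome(long_palindrome), is_palindrome(short_palindrome))

# A toy sequence (hypothetical) containing the palindrome AATT at position 2.
toy_hits = find_palindromes("GGAATTCC", 4)
```

Because a base never equals its own complement, such palindromes always have even length, which is why only even window lengths are worth scanning.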
- Least-squares support vector machine approach to viral replication origin prediction
https://scholarbank.nus.edu.sg/handle/10635/105193
Authors: Cruz-Cano, R.; Chew, D.S.H.; Choi, K.-P.; Leung, M.-Y.
Abstract: Replication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses. This paper proposes a new approach by least-squares support vector machines (LS-SVMs) and tests its performance not only on the herpes family but also on a collection of caudoviruses coming from three viral families under the order of caudovirales. The LS-SVM approach provides sensitivities and positive predictive values superior or comparable to those given by the previous methods. When suitably combined with previous methods, the LS-SVM approach further improves the prediction accuracy for the herpesvirus replication origins. Furthermore, by recursive feature elimination, the LS-SVM has also helped find the most significant features of the data sets. The results suggest that the LS-SVMs will be a highly useful addition to the set of computational tools for viral replication origin prediction and illustrate the value of optimization-based computing techniques in biomedical applications. © 2010 INFORMS.
Tue, 01 Jun 2010 00:00:00 GMT
- ConReg-R: Extrapolative recalibration of the empirical distribution of p-values to improve false discovery rate estimates
https://scholarbank.nus.edu.sg/handle/10635/105069
Authors: Li, J.; Paramita, P.; Choi, K.P.; Karuturi, R.K.M.
Abstract: Background: False discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the estimation of p-values from each test and validity of the underlying assumptions of the distribution. However, in many practical testing problems such as in genomics, the p-values could be under-estimated or over-estimated for many known or unknown reasons. Consequently, FDR estimation would then be influenced and lose its veracity. Results: We propose a new extrapolative method called Constrained Regression Recalibration (ConReg-R) to recalibrate the empirical p-values by modeling their distribution to improve the FDR estimates. Our ConReg-R method is based on the observation that accurately estimated p-values from true null hypotheses follow uniform distribution and the observed distribution of p-values is indeed a mixture of distributions of p-values from true null hypotheses and true alternative hypotheses. Hence, ConReg-R recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses ($\pi_0$) and FDR are estimated after the recalibration. Conclusions: ConReg-R provides an efficient way to improve the FDR estimates. It only requires the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments. Reviewers: The manuscript was reviewed by Prof. Vladimir Kuznetsov, Prof. Philippe Broet, and Prof. Hongfang Liu (nominated by Prof. Yuriy Gusev). © 2011 Li et al; licensee BioMed Central Ltd.
Fri, 20 May 2011 00:00:00 GMT
- Trade-offs between effectiveness and efficiency in stroke rehabilitation
https://scholarbank.nus.edu.sg/handle/10635/108820
Authors: Koh, G.C.-H.; Chen, C.; Cheong, A.; Choo, T.B.; Pui, C.K.; Phoon, F.N.; Ming, C.K.; Yeow, T.B.; Petrella, R.; Thind, A.; Koh, D.; Seng, C.K.
Abstract: Background: Most stroke research has studied rehabilitation effectiveness and rehabilitation efficiency separately and not investigated the potential trade-offs between these two indices of rehabilitation. Aims: To determine whether there is a trade-off between independent factors of rehabilitation effectiveness and rehabilitation efficiency. Methods: Using a retrospective cohort study design, we studied all stroke patients (n=2810) from two sub-acute rehabilitation hospitals from 1996 to 2005, representing 87·5% of national bed-years during the same period. Results: Independent predictors of poorer rehabilitation effectiveness and log rehabilitation efficiency were older age, race-ethnicity, caregiver availability, ischemic stroke, longer time to admission, dementia, admission Barthel Index score, and length of stay. Rehabilitation effectiveness was lower in females, and the gender differences were significantly lower in those aged ≤70 years (β -4·7 (95% confidence interval -7·4 to -2·0)). There were trade-offs between effectiveness and efficiency with respect to admission Barthel Index score and length of stay. An increase of 10 in admission Barthel Index score predicted an increase of 3·6% (95% confidence interval 3·2-4·0) in effectiveness but a decrease of 0·04 (95% confidence interval -0·05 to -0·02) in log efficiency (a reduction of efficiency by 1·0 per 30 days). An increase in log length of stay by 1 (length of stay of 2·7 days) predicted an increase of 8·0% (95% confidence interval 5·7-10·3) in effectiveness but a decrease of 0·82 (95% confidence interval -0·90 to -0·74) in log efficiency (equivalent to a reduction in efficiency by 2·3 per 30 days). For optimal rehabilitation effectiveness and rehabilitation efficiency, the admission Barthel Index score was 30-62 and length of stay was 37-41 days. Conclusions: There are trade-offs between effectiveness and efficiency during inpatient sub-acute stroke rehabilitation with respect to admission functional status and length of stay. © 2011 The Authors. International Journal of Stroke © 2011 World Stroke Organization.
Sat, 01 Dec 2012 00:00:00 GMT
- AT excursion: A new approach to predict replication origins in viral genomes by locating AT-rich regions
https://scholarbank.nus.edu.sg/handle/10635/102904
Authors: Chew, D.S.H.; Leung, M.-Y.; Choi, K.P.
Abstract: Background: Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind of genomes might not work well for another. In this paper, we propose the AT excursion method, which is a score-based approach, to quantify local AT abundance in genomic sequences and use the identified high scoring segments for predicting replication origins. This method has the advantages of requiring no preset window size and having rigorous criteria to evaluate statistical significance of high scoring segments. Results: We have evaluated the AT excursion method by checking its predictions against known replication origins in herpesviruses and comparing its performance with an existing base weighted score method (BWS1). Out of 43 known origins, 39 are predicted by either one or the other method and 26 origins are predicted by both. The excursion method identifies six origins not predicted by BWS1, showing that the AT excursion method is a valuable complement to BWS1. We have also applied the AT excursion method to two other families of double stranded DNA viruses, the poxviruses and iridoviruses, of which very few replication origins are documented in the public domain. The prediction results are made available as supplementary materials at 1. Preliminary investigation shows that the proposed method works well on some larger genomes too. Conclusion: The AT excursion method will be a useful computational tool for identifying replication origins in a variety of genomic sequences. © 2007 Chew et al; licensee BioMed Central Ltd.
Mon, 21 May 2007 00:00:00 GMT
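A score-based excursion of the kind the abstract describes can be sketched as a running sum that rewards A and T, penalizes G and C, and is reset whenever it drops to zero, so that no window size has to be fixed in advance. The score values (+1 and -2), the reset rule and the function name below are assumptions chosen for illustration; the paper's scoring scheme and its significance criteria are not reproduced.

```python
def top_excursion(sequence, at_score=1, gc_score=-2):
    """Return (peak_value, start, end) of the highest-scoring excursion.

    The running score adds `at_score` for A/T and `gc_score` for G/C and is
    reset to zero whenever it would become non-positive. `end` is exclusive,
    so sequence[start:end] is the segment leading up to the peak.
    """
    best = (0, 0, 0)
    value = 0
    start = 0
    for i, base in enumerate(sequence):
        value += at_score if base in "AT" else gc_score
        if value <= 0:
            value = 0
            start = i + 1
        elif value > best[0]:
            best = (value, start, i + 1)
    return best


# Toy sequence (hypothetical) with an AT-rich core flanked by GC-rich ends.
toy = "GCGCATATATGCGC"
peak, seg_start, seg_end = top_excursion(toy)
at_rich_segment = toy[seg_start:seg_end]
```

In a real analysis the scores would be chosen so that the expected score per base is negative, and a high-scoring segment would be reported only after a significance test of the kind the abstract mentions.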
- Scoring schemes of palindrome clusters for more sensitive prediction of replication origins in herpesviruses
https://scholarbank.nus.edu.sg/handle/10635/104079
Authors: Chew, D.S.H.; Choi, K.P.; Leung, M.-Y.
Abstract: Many empirical studies show that there are unusual clusters of palindromes, closely spaced direct and inverted repeats around the replication origins of herpesviruses. In this paper, we introduce two new scoring schemes to quantify the spatial abundance of palindromes in a genomic sequence. Based on these scoring schemes, a computational method to predict the locations of replication origins is developed. When our predictions are compared with 39 known or annotated replication origins in 19 herpesviruses, close to 80% of the replication origins are located within 2% of the genome length. A list of predicted locations of replication origins in all the known herpesviruses with complete genome sequences is reported. © The Author 2005. Published by Oxford University Press. All rights reserved.
Sat, 01 Jan 2005 00:00:00 GMT
- Limit theorems for functions of marginal quantiles
https://scholarbank.nus.edu.sg/handle/10635/105196
Authors: Babu, G.J.; Bai, Z.; Choi, K.P.; Mangalam, V.
Abstract: Multivariate distributions are explored using the joint distributions of marginal sample quantiles. Limit theory for the mean of a function of order statistics is presented. The results include a multivariate central limit theorem and a strong law of large numbers. A result similar to Bahadur's representation of quantiles is established for the mean of a function of the marginal quantiles. In particular, it is shown that $\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}\varphi\bigl(X^{(1)}_{n:i}, \dots, X^{(d)}_{n:i}\bigr) - \bar{y}\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_{n,i} + o_P(1)$ as $n \to \infty$, where $\bar{y}$ is a constant and $Z_{n,i}$ are i.i.d. random variables for each $n$. This leads to the central limit theorem. Weak convergence to a Gaussian process using equicontinuity of functions is indicated. The results are established under very general conditions. These conditions are shown to be satisfied in many commonly occurring situations. © 2011 ISI/BS.
Sun, 01 May 2011 00:00:00 GMT
- Promoter profiling and coexpression data analysis identifies 24 novel genes that are coregulated with AMPA receptor genes, GRIAs
https://scholarbank.nus.edu.sg/handle/10635/103975
Authors: Chong, A.; Zhang, Z.; Choi, K.P.; Choudhary, V.; Djamgoz, M.B.A.; Zhang, G.; Bajic, V.B.
Abstract: We identified a set of transcriptional elements that are conserved and overrepresented within the promoters of human, mouse, and rat GRIAs by comparing these promoters against a collection of 10,741 gene promoters. Cells regulate functional groups of genes by coordinating the transcriptional and/or posttranscriptional mRNA levels of interacting genes. As such, it is expected that functional groups of genes share the same transcriptional features within their promoters. We found 47 genes whose promoters contain the same combination of transcriptional elements that are overrepresented within the promoters of the GRIA gene family. Coexpressed genes may be transcriptionally coregulated, which in turn suggests that these genes may play complementary roles within a particular functional context. Using microarray expression data, we found 24 (of the 47) genes that share not only a similar promoter profile with GRIAs but also a well-correlated gene expression profile and, thus, we believe these to be coregulated with GRIAs. © 2007 Elsevier Inc. All rights reserved.
Thu, 01 Mar 2007 00:00:00 GMT
- Nonrandom clusters of palindromes in herpesvirus genomes
https://scholarbank.nus.edu.sg/handle/10635/103630
Authors: Leung, M.-Y.; Kwok, P.C.; Xia, A.; Chen, L.H.Y.
Abstract: Palindromes are symmetrical words of DNA in the sense that they read exactly the same as their reverse complementary sequences. Representing the occurrences of palindromes in a DNA molecule as points on the unit interval, the scan statistics can be used to identify regions of unusually high concentration of palindromes. These regions have been associated with the replication origins on a few herpesviruses in previous studies. However, the use of scan statistics requires the assumption that the points representing the palindromes are independently and uniformly distributed on the unit interval. In this paper, we provide a mathematical basis for this assumption by showing that in randomly generated DNA sequences, the occurrences of palindromes can be approximated by a Poisson process. An easily computable upper bound on the Wasserstein distance between the palindrome process and the Poisson process is obtained. This bound is then used as a guide to choose an optimal palindrome length in the analysis of a collection of 16 herpesvirus genomes. Regions harboring significant palindrome clusters are identified and compared to known locations of replication origins. This analysis brings out a few interesting extensions of the scan statistics that can help formulate an algorithm for more accurate prediction of replication origins. © Mary Ann Liebert, Inc.
Sat, 01 Jan 2005 00:00:00 GMT
- A Remark on the Inverse Hölder Inequality
https://scholarbank.nus.edu.sg/handle/10635/102749
Authors: Choi, K.P.
Mon, 01 Nov 1993 00:00:00 GMT
- A post-processing method for optimizing synthesis strategy for oligonucleotide microarrays.
https://scholarbank.nus.edu.sg/handle/10635/43062
Authors: Ning, K.; Choi, K.P.; Leong, H.W.; Zhang, L.
Abstract: The broad applicability of gene expression profiling to genomic analyses has generated huge demand for mass production of microarrays and hence for improving the cost effectiveness of microarray fabrication. We developed a post-processing method for deriving a good synthesis strategy. In this paper, we assessed all the known efficient methods and our post-processing method for reducing the number of synthesis cycles for manufacturing a DNA-chip of a given set of oligos. Our experimental results on both simulated and 52 real datasets show that no single method consistently gives the best synthesis strategy, and post-processing an existing strategy is necessary as it often reduces the number of synthesis cycles further.
Sat, 01 Jan 2005 00:00:00 GMT
- Degree distribution of large networks generated by the partial duplication model
https://scholarbank.nus.edu.sg/handle/10635/103118
Authors: Li, S.; Choi, K.P.; Wu, T.
Abstract: In this paper, we present a rigorous analysis on the limiting behavior of the degree distribution of the partial duplication model, a random network growth model in the duplication and divergence family that is popular in the study of biological networks. We show that for each non-negative integer k, the expected proportion of nodes of degree k approaches a limit as the network becomes large. This fills in a gap in previous studies. In addition, we prove that p=1/2, where p is the selection probability of the model, is the phase transition for the expected proportion of isolated nodes converging to 1, and hence answer a question raised in Bebek et al. [G. Bebek, P. Berenbrink, C. Cooper, T. Friedetzky, J. Nadeau, S.C. Sahinalp, The degree distribution of the generalized duplication model, Theoret. Comput. Sci. 369 (2006) 239-249]. We also obtain asymptotic bounds on the convergence rates of degree distribution. Since the observed networks typically do not contain isolated nodes, we study the subgraph consisting of all non-isolated nodes contained in the networks generated by the partial duplication model, and show that p=1/2 is again a phase transition for the limiting behavior of its degree distribution.
Mon, 11 Mar 2013 00:00:00 GMT
- Approximating the number of successes in independent trials: Binomial versus Poisson
https://scholarbank.nus.edu.sg/handle/10635/102873
Authors: Choi, K.P.; Xia, A.
Abstract: Let $I_1, I_2, \dots, I_n$ be independent Bernoulli random variables with $\mathbb{P}(I_i = 1) = 1 - \mathbb{P}(I_i = 0) = p_i$, $1 \le i \le n$, and $W = \sum_{i=1}^{n} I_i$, $\lambda = \mathbb{E}W = \sum_{i=1}^{n} p_i$. It is well known that if the $p_i$'s are the same, then $W$ follows a binomial distribution and if the $p_i$'s are small, then the distribution of $W$, denoted by $\mathcal{L}W$, can be well approximated by Poisson($\lambda$). Define $r = \lfloor\lambda\rfloor$, the greatest integer $\le \lambda$, and set $\delta = \lambda - \lfloor\lambda\rfloor$, and $\kappa$ be the least integer more than or equal to $\max\{\lambda^2/(r - 1 - (1+\delta)^2),\, n\}$. In this paper, we prove that, if $r > 1 + (1+\delta)^2$, then $d_\kappa < d_{\kappa+1} < d_{\kappa+2} < \cdots < d_{TV}(\mathcal{L}W, \mathrm{Poisson}(\lambda))$, where $d_{TV}$ denotes the total variation metric and $d_m = d_{TV}(\mathcal{L}W, \mathrm{Bi}(m, \lambda/m))$, $m \ge \kappa$. Hence, in modelling the distribution of the sum of Bernoulli trials, Binomial approximation is generally better than Poisson approximation.
Fri, 01 Nov 2002 00:00:00 GMT
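The distances compared in the abstract can be computed numerically for small examples. The sketch below builds the exact law of a sum of independent Bernoulli variables by convolution and evaluates its total variation distance to a binomial and to a Poisson law. The helper names, the truncation of the Poisson tail and the example probabilities are assumptions made for illustration.

```python
from math import comb, exp


def bernoulli_sum_pmf(probs):
    """Exact pmf of W = I_1 + ... + I_n for independent Bernoulli(p_i)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf


def binomial_pmf(m, q):
    return [comb(m, k) * q ** k * (1 - q) ** (m - k) for k in range(m + 1)]


def total_variation(p, q):
    """Half the L1 distance between two pmfs on 0, 1, 2, ..."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tv_to_poisson(pmf, lam, extra_terms=60):
    """Total variation distance to Poisson(lam), Poisson tail added explicitly."""
    pois = [exp(-lam)]
    for k in range(1, len(pmf) + extra_terms):
        pois.append(pois[-1] * lam / k)
    tail = max(0.0, 1.0 - sum(pois))  # Poisson mass beyond the truncation point
    padded = pmf + [0.0] * extra_terms
    return 0.5 * (sum(abs(a - b) for a, b in zip(padded, pois)) + tail)


# Hypothetical example with unequal success probabilities.
probs = [0.2, 0.3, 0.4, 0.5, 0.6, 0.3, 0.4, 0.3]
law_w = bernoulli_sum_pmf(probs)
lam = sum(probs)
d_binomial = total_variation(law_w, binomial_pmf(len(probs), lam / len(probs)))
d_poisson = tv_to_poisson(law_w, lam)
```

Evaluating `total_variation(law_w, binomial_pmf(m, lam / m))` for increasing `m` gives the sequence $d_m$ from the abstract, which can then be set against `d_poisson`.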
- Identifying co-regulating microRNA groups
https://scholarbank.nus.edu.sg/handle/10635/105172
Authors: An, J.; Choi, K.P.; Wells, C.A.; Chen, Y.-P.P.
Abstract: Background: Current miRNA target prediction tools have the common problem that their false positive rate is high. This renders identification of co-regulating groups of miRNAs and target genes unreliable. In this study, we describe a procedure to identify highly probable co-regulating miRNAs and the corresponding co-regulated gene groups. Our procedure involves a sequence of statistical tests: (1) identify genes that are highly probable miRNA targets; (2) determine for each such gene, the minimum number of miRNAs that co-regulate it with high probability; (3) find, for each such gene, the combination of the determined minimum size of miRNAs that co-regulate it with the lowest p-value; and (4) discover for each such combination of miRNAs, the group of genes that are co-regulated by these miRNAs with the lowest p-value computed based on GO term annotations of the genes. Results: Our method identifies 4, 3 and 2-term miRNA groups that co-regulate gene groups of size at least 3 in human. Our result suggests some interesting hypothesis on the functional role of several miRNAs through a "guilt by association" reasoning. For example, miR-130, miR-19 and miR-101 are known neurodegenerative diseases associated miRNAs. Our 3-term miRNA table shows that miR-130/19/101 form a co-regulating group of rank 22 (p-value = $1.16 \times 10^{-2}$). Since miR-144 is co-regulating with miR-130, miR-19 and miR-101 of rank 4 (p-value = $1.16 \times 10^{-2}$) in our 4-term miRNA table, this suggests hsa-miR-144 may be neurodegenerative diseases related miRNA. Conclusions: This work identifies highly probable co-regulating miRNAs, which are refined from the prediction by computational tools using (1) signal-to-noise ratio to get high accurate regulating miRNAs for every gene, and (2) Gene Ontology to obtain functional related co-regulating miRNA groups. Our result has partly been supported by biological experiments. Based on prediction by TargetScanS, we found highly probable target gene groups in the Supplementary Information. This result might help biologists to find small set of miRNAs for genes of interest rather than huge amount of miRNA set. Supplementary Information: . © 2010 Imperial College Press.
Mon, 01 Feb 2010 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1051722010-02-01T00:00:00Z
- A commentary on the logistic distributionhttps://scholarbank.nus.edu.sg/handle/10635/105480Title: A commentary on the logistic distribution
Authors: Ghosh, M.; Choi, K.P.; Li, J.
Abstract: The paper provides a series representation of the logistic probability density function in terms of differently scaled double exponential distributions with terms of the series alternating in signs. This representation is used to calculate moments, moment generating function, and characteristic function of a logistic distribution. The same representation is also used to derive the logistic distribution as the scale mixture of a normal distribution. © 2010 Springer Science+Business Media, LLC.
Fri, 01 Jan 2010 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1054802010-01-01T00:00:00Z
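The series representation mentioned in the logistic-distribution abstract above can be sketched as follows. This is a reconstruction from the standard logistic density with unit scale, not necessarily the exact form used in the paper; the symbol $g_k$ (a double exponential density with rate $k$) is introduced here for illustration, and the termwise integration in the second display is assumed to be justified.

```latex
% Logistic density (unit scale), expanded for x \neq 0 using
% u/(1+u)^2 = \sum_{k \ge 1} (-1)^{k-1} k u^k with u = e^{-|x|} < 1:
f(x) = \frac{e^{-|x|}}{\left(1 + e^{-|x|}\right)^{2}}
     = \sum_{k=1}^{\infty} (-1)^{k-1}\, k\, e^{-k|x|}
     = 2 \sum_{k=1}^{\infty} (-1)^{k-1} g_k(x),
\qquad g_k(x) = \frac{k}{2}\, e^{-k|x|}.

% Each g_k has second moment 2/k^2, so integrating term by term gives
E X^{2} = 2 \sum_{k=1}^{\infty} (-1)^{k-1} \frac{2}{k^{2}}
        = 4 \cdot \frac{\pi^{2}}{12}
        = \frac{\pi^{2}}{3},
% which agrees with the known variance of the standard logistic distribution.
```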
- Iterative piecewise linear regression to accurately assess statistical significance in batch confounded differential expression analysishttps://scholarbank.nus.edu.sg/handle/10635/105473Title: Iterative piecewise linear regression to accurately assess statistical significance in batch confounded differential expression analysis
Authors: Li, J.; Choi, K.P.; Karuturi, R.K.M.
Abstract: Batch-dependent variation in microarray experiments may be manifested through a systematic shift in expression measurements from batch to batch. Such a systematic shift could be taken care of by using an appropriate model for differential expression analysis. However, it poses a greater challenge in the estimation of statistical significance and false discovery rate (FDR) if the batches are confounded (collinear) with the biological groups of interest. The batch confounding problem occurs commonly in the analysis of time-course data or data from different laboratories. We demonstrate that batch confounding may lead to incorrect estimation of the expected statistics. In this paper, we propose an iterative piecewise linear regression (iPLR) method, a major extension of our previously published Stepped Linear Regression (SLR) method, in the context of SAM to re-estimate the expected statistics and FDR. iPLR can be applied to one-sided or two-sided statistics-based tests. We demonstrate the efficacy of iPLR on both simulated and real microarray datasets. iPLR also provides a better interpretation of the linear model parameters. © 2012 Springer-Verlag.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1054732012-01-01T00:00:00Z
- A non-uniform bound for translated Poisson approximationhttps://scholarbank.nus.edu.sg/handle/10635/102700Title: A non-uniform bound for translated Poisson approximation
Authors: Barbour, A.D.; Choi, K.P.
Abstract: Let $X_1, \ldots, X_n$ be independent, integer valued random variables, with $p$th moments, $p > 2$, and let $W$ denote their sum. We prove bounds analogous to the classical non-uniform estimates of the error in the central limit theorem, but now, for approximation of $\mathcal{L}(W)$ by a translated Poisson distribution. The advantage is that the error bounds, which are often of order no worse than in the classical case, measure the accuracy in terms of total variation distance. In order to have good approximation in this sense, it is necessary for $\mathcal{L}(W)$ to be sufficiently smooth; this requirement is incorporated into the bounds by way of a parameter $\alpha$, which measures the average overlap between $\mathcal{L}(X_i)$ and $\mathcal{L}(X_i + 1)$, $1 \le i \le n$.
Wed, 04 Feb 2004 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1027002004-02-04T00:00:00Z
- Redhyte: a self-diagnosing, self-correcting, and helpful hypothesis analysis platformhttps://scholarbank.nus.edu.sg/handle/10635/140932Title: Redhyte: a self-diagnosing, self-correcting, and helpful hypothesis analysis platform
Authors: Wei Zhong Toh; Kwok Pui Choi; Limsoon Wong
Thu, 20 Jul 2017 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1409322017-07-20T00:00:00Z
- Reconstruction of network evolutionary history from extant network topology and duplication historyhttps://scholarbank.nus.edu.sg/handle/10635/53315Title: Reconstruction of network evolutionary history from extant network topology and duplication history
Authors: Li, S.; Choi, K.P.; Wu, T.; Zhang, L.
Abstract: Genome-wide protein-protein interaction (PPI) data are readily available thanks to recent breakthroughs in biotechnology. However, PPI networks of extant organisms are only snapshots of the network evolution. How to infer the whole evolutionary history is a challenging problem in computational biology. In this paper, we present a likelihood-based approach to inferring network evolution history from the topology of PPI networks and the duplication relationship among the paralogs. Simulations show that our approach outperforms the existing ones in terms of the accuracy of reconstruction. Moreover, the growth parameters of several real PPI networks estimated by our method are more consistent with the ones predicted in the literature. © 2012 Springer-Verlag.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/533152012-01-01T00:00:00Z
- Counting motifs in the entire biological network from noisy and incomplete data (extended abstract)https://scholarbank.nus.edu.sg/handle/10635/53282Title: Counting motifs in the entire biological network from noisy and incomplete data (extended abstract)
Authors: Tran, N.H.; Choi, K.P.; Zhang, L.
Abstract: Small over-represented motifs in biological networks are believed to represent essential functional units of biological processes. A natural question is to gauge whether a motif occurs abundantly or rarely in a biological network. Given that high-throughput biotechnology is only able to interrogate a portion of the entire biological network with non-negligible errors, we develop a powerful method to correct link errors in estimating undirected or directed motif counts in the entire network from noisy subnetwork data. © 2013 Springer-Verlag.
Tue, 01 Jan 2013 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/532822013-01-01T00:00:00Z
- Profiling the transcription factor regulatory networks of human cell typeshttps://scholarbank.nus.edu.sg/handle/10635/156499Title: Profiling the transcription factor regulatory networks of human cell types
Authors: Zhang, Shihua; Tian, Dechao; Ngoc, Hieu Tran; Choi, Kwok Pui; Zhang, Louxin
Abstract: Neph et al. (2012) (Circuitry and dynamics of human transcription factor regulatory networks. Cell, 150: 1274-1286) reported the transcription factor (TF) regulatory networks of 41 human cell types using the DNaseI footprinting technique. This provides a valuable resource for uncovering regulation principles in different human cells. In this paper, the architectures of the 41 regulatory networks and the distributions of housekeeping and specific regulatory interactions are investigated. The TF regulatory networks of different human cell types demonstrate similar global three-layer (top, core and bottom) hierarchical architectures, which are greatly different from the yeast TF regulatory network. However, they have distinguishable local organizations, as suggested by the fact that wiring patterns of only a few TFs are enough to distinguish cell identities. The TF regulatory network of human embryonic stem cells (hESCs) is dense and enriched with interactions that are unseen in the networks of other cell types. The examination of specific regulatory interactions suggests that specific interactions play important roles in hESCs. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mon, 10 Nov 2014 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1564992014-11-10T00:00:00Z
- Counting motifs in the human interactomehttps://scholarbank.nus.edu.sg/handle/10635/156500Title: Counting motifs in the human interactome
Authors: Ngoc, Hieu Tran; Choi, Kwok Pui; Zhang, Louxin
Abstract: Small over-represented motifs in biological networks often form essential functional units of biological processes. A natural question is to gauge whether a motif occurs abundantly or rarely in a biological network. Here we develop an accurate method to estimate the occurrences of a motif in the entire network from noisy and incomplete data, and apply it to eukaryotic interactomes and cell-specific transcription factor regulatory networks. The number of triangles in the human interactome is about 194 times that in the Saccharomyces cerevisiae interactome. A strong positive linear correlation exists between the numbers of occurrences of triad and quadriad motifs in human cell-specific transcription factor regulatory networks. Our findings show that the proposed method is general and powerful for counting motifs and can be applied to any network regardless of its topological structure. © 2013 Macmillan Publishers Limited. All rights reserved.
Thu, 01 Aug 2013 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1565002013-08-01T00:00:00Z
- Spot urine estimations are equivalent to 24-hour urine assessments of urine protein excretion for predicting clinical outcomeshttps://scholarbank.nus.edu.sg/handle/10635/145965Title: Spot urine estimations are equivalent to 24-hour urine assessments of urine protein excretion for predicting clinical outcomes
Authors: Teo B.W.; Loh P.T.; Wong W.K.; Ho P.J.; Choi K.P.; Toh Q.C.; Xu H.; Saw S.; Lau T.; Sethi S.; Lee E.J.C.
Thu, 01 Jan 2015 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1459652015-01-01T00:00:00Z
- Counting motifs in the entire biological network from noisy and incomplete data (extended abstract)https://scholarbank.nus.edu.sg/handle/10635/104554Title: Counting motifs in the entire biological network from noisy and incomplete data (extended abstract)
Authors: Tran, N.H.; Choi, K.P.; Zhang, L.
Abstract: Small over-represented motifs in biological networks are believed to represent essential functional units of biological processes. A natural question is to gauge whether a motif occurs abundantly or rarely in a biological network. Given that high-throughput biotechnology is only able to interrogate a portion of the entire biological network with non-negligible errors, we develop a powerful method to correct link errors in estimating undirected or directed motif counts in the entire network from noisy subnetwork data. © 2013 Springer-Verlag.
Tue, 01 Jan 2013 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1045542013-01-01T00:00:00Z
- Scoring schemes of palindrome clusters for more sensitive prediction of replication origins in herpesviruses.https://scholarbank.nus.edu.sg/handle/10635/104080Title: Scoring schemes of palindrome clusters for more sensitive prediction of replication origins in herpesviruses.
Authors: Chew, D.S.; Choi, K.P.; Leung, M.Y.
Abstract: Many empirical studies show that there are unusual clusters of palindromes, closely spaced direct and inverted repeats around the replication origins of herpesviruses. In this paper, we introduce two new scoring schemes to quantify the spatial abundance of palindromes in a genomic sequence. Based on these scoring schemes, a computational method to predict the locations of replication origins is developed. When our predictions are compared with 39 known or annotated replication origins in 19 herpesviruses, close to 80% of the replication origins are located within 2% of the genome length. A list of predicted locations of replication origins in all the known herpesviruses with complete genome sequences is reported.
Sat, 01 Jan 2005 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1040802005-01-01T00:00:00Z
- Some best possible prophet inequalities for convex functions of sums of independent variates and unordered martingale difference sequenceshttps://scholarbank.nus.edu.sg/handle/10635/104154Title: Some best possible prophet inequalities for convex functions of sums of independent variates and unordered martingale difference sequences
Authors: Choi, K.P.; Klass, M.J.
Abstract: Let $\Phi(\cdot)$ be a nondecreasing convex function on $[0, \infty)$. We show that for any integer $n \ge 1$ and real $a$, $E\Phi((M_n - a)^+) \le 2E\Phi((S_n - a)^+) - \Phi(0)$ and $E(M_n \vee \operatorname{med} S_n) \le E|S_n - \operatorname{med} S_n|$, where $X_1, X_2, \ldots$ are any independent mean zero random variables with partial sums $S_0 = 0$, $S_k = X_1 + \cdots + X_k$ and partial sum maxima $M_n = \max_{0 \le k \le n} S_k$. There are various instances in which these inequalities are best possible for fixed $n$ and/or as $n \to \infty$. These inequalities remain valid if $\{X_k\}$ is a martingale difference sequence such that $E(X_k \mid \{X_i : i \ne k\}) = 0$ a.s. for each $k \ge 1$. Modified versions of these inequalities hold if the variates have arbitrary means but are independent.
Tue, 01 Apr 1997 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1041541997-04-01T00:00:00Z
- Good spaced seeds for homology searchhttps://scholarbank.nus.edu.sg/handle/10635/104660Title: Good spaced seeds for homology search
Authors: Choi, K.P.; Zeng, F.; Zhang, L.
Abstract: Filtration is an important technique used to speed up local alignment, as exemplified in the BLAST programs. Recently, Ma, Tromp and Li (2002) discovered, in their PatternHunter program, that better filtering can be achieved by spacing out the matching positions that trigger a local alignment according to a certain pattern, instead of requiring contiguous matching positions. Such a match pattern is called a spaced seed. Our numerical computation shows that the ranks of spaced seeds (based on sensitivity) change with the sequence similarity. Since homologous sequences may have diverse similarity, we assess the sensitivity of spaced seeds over a range of similarity levels and present a list of good spaced seeds for facilitating homology search in DNA genomic sequences. We validate that the listed spaced seeds are indeed more sensitive using three arbitrarily chosen pairs of DNA genomic sequences.
Thu, 01 Jan 2004 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1046602004-01-01T00:00:00Z
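To make the notion of a spaced seed in the abstract above concrete, here is a minimal Python sketch of what it means for a seed to "hit" an alignment. The function name `seed_hits`, the 0/1 string encoding, and the toy seeds are illustrative assumptions, not taken from the paper or from PatternHunter.

```python
def seed_hits(seed: str, matches: str) -> bool:
    """Return True if the seed hits the alignment at some offset.

    seed:    '1' marks a position that must match, '0' a don't-care position.
    matches: '1' where the two aligned bases agree, '0' where they differ.
    """
    # Offsets within the seed that are required to be matches.
    required = [i for i, c in enumerate(seed) if c == "1"]
    # Slide the seed along the alignment and test every placement.
    for start in range(len(matches) - len(seed) + 1):
        if all(matches[start + i] == "1" for i in required):
            return True
    return False


# Toy alignment with a mismatch at every third position.
alignment = "1011011"
spaced_hit = seed_hits("1101", alignment)      # weight 3, spaced
contiguous_hit = seed_hits("111", alignment)   # weight 3, contiguous
print(spaced_hit, contiguous_hit)
```

Both toy seeds have three required positions, but only the spaced one finds a placement in this alignment, which illustrates why the arrangement of required positions affects sensitivity.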
- Reconstruction of network evolutionary history from extant network topology and duplication historyhttps://scholarbank.nus.edu.sg/handle/10635/104618Title: Reconstruction of network evolutionary history from extant network topology and duplication history
Authors: Li, S.; Choi, K.P.; Wu, T.; Zhang, L.
Abstract: Genome-wide protein-protein interaction (PPI) data are readily available thanks to recent breakthroughs in biotechnology. However, PPI networks of extant organisms are only snapshots of the network evolution. How to infer the whole evolutionary history is a challenging problem in computational biology. In this paper, we present a likelihood-based approach to inferring network evolution history from the topology of PPI networks and the duplication relationship among the paralogs. Simulations show that our approach outperforms the existing ones in terms of the accuracy of reconstruction. Moreover, the growth parameters of several real PPI networks estimated by our method are more consistent with the ones predicted in the literature. © 2012 Springer-Verlag.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1046182012-01-01T00:00:00Z