ScholarBank@NUS

Degree distribution of large networks generated by the partial duplication model

Mon, 11 Mar 2013 00:00:00 GMT

Title: Degree distribution of large networks generated by the partial duplication model Authors: Li, S.; Choi, K.P.; Wu, T. Abstract: In this paper, we present a rigorous analysis on the limiting behavior of the degree distribution of the partial duplication model, a random network growth model in the duplication and divergence family that is popular in the study of biological networks. We show that for each non-negative integer k, the expected proportion of nodes of degree k approaches a limit as the network becomes large. This fills in a gap in previous studies. In addition, we prove that p=1/2, where p is the selection probability of the model, is the phase transition for the expected proportion of isolated nodes converging to 1, and hence answer a question raised in Bebek et al. [G. Bebek, P. Berenbrink, C. Cooper, T. Friedetzky, J. Nadeau, S.C. Sahinalp, The degree distribution of the generalized duplication model, Theoret. Comput. Sci. 369 (2006) 239-249]. We also obtain asymptotic bounds on the convergence rates of degree distribution. Since the observed networks typically do not contain isolated nodes, we study the subgraph consisting of all non-isolated nodes contained in the networks generated by the partial duplication model, and show that p=1/2 is again a phase transition for the limiting behavior of its degree distribution.
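
The growth rule behind this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the usual formulation in which a uniformly chosen node is copied and each of its edges is kept independently with probability p, with no edge between the copy and the original. The seed graph (a single edge) and the function names `partial_duplication` and `degree_proportions` are hypothetical choices made for the example.

```python
import random

def partial_duplication(n, p, seed=None):
    """Grow a graph to n nodes; return adjacency as dict node -> set of neighbours."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: a single edge
    for new in range(2, n):
        parent = rng.randrange(new)  # uniform choice among existing nodes
        # keep each of the parent's edges independently with probability p
        kept = {v for v in adj[parent] if rng.random() < p}
        adj[new] = kept
        for v in kept:
            adj[v].add(new)
    return adj

def degree_proportions(adj):
    """Map each degree k to the proportion of nodes having degree k."""
    n = len(adj)
    counts = {}
    for nbrs in adj.values():
        counts[len(nbrs)] = counts.get(len(nbrs), 0) + 1
    return {k: c / n for k, c in sorted(counts.items())}

if __name__ == "__main__":
    for p in (0.3, 0.7):
        props = degree_proportions(partial_duplication(5000, p, seed=0))
        print(p, "proportion of isolated nodes:", props.get(0, 0.0))
```

Running the sketch for values of p on either side of 1/2 gives a rough empirical feel for the proportion of isolated nodes, although a single finite simulation says nothing rigorous about the limits analysed in the paper.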

Redhyte: a self-diagnosing, self-correcting, and helpful hypothesis analysis platform

Thu, 20 Jul 2017 00:00:00 GMT

Title: Redhyte: a self-diagnosing, self-correcting, and helpful hypothesis analysis platform Authors: Wei Zhong Toh; Kwok Pui Choi; Limsoon Wong

A commentary on the logistic distribution

Fri, 01 Jan 2010 00:00:00 GMT

Title: A commentary on the logistic distribution Authors: Ghosh, M.; Choi, K.P.; Li, J. Abstract: The paper provides a series representation of the logistic probability density function in terms of differently scaled double exponential distributions with terms of the series alternating in signs. This representation is used to calculate moments, moment generating function, and characteristic function of a logistic distribution. The same representation is also used to derive the logistic distribution as the scale mixture of a normal distribution. © 2010 Springer Science+Business Media, LLC.
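
As a worked illustration of what such a series can look like, the standard logistic density can be expanded by writing u = e^{-|x|} and using the power series of u/(1+u)^2. This is a sketch derived here from that elementary expansion; whether it coincides term by term with the representation in the paper is an assumption.

```latex
f(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}}
     = \sum_{k=1}^{\infty} (-1)^{k-1}\, k\, e^{-k|x|}
     = \sum_{k=1}^{\infty} (-1)^{k-1}\, 2\, g_k(x),
\qquad
g_k(x) = \frac{k}{2}\, e^{-k|x|}, \quad x \neq 0 .
```

Here each g_k is a double exponential (Laplace) density with scale 1/k, and the signs alternate, which matches the description in the abstract. The series does not converge at x = 0.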

Iterative piecewise linear regression to accurately assess statistical significance in batch confounded differential expression analysis

Sun, 01 Jan 2012 00:00:00 GMT

Title: Iterative piecewise linear regression to accurately assess statistical significance in batch confounded differential expression analysis Authors: Li, J.; Choi, K.P.; Karuturi, R.K.M. Abstract: Batch dependent variation in microarray experiments may be manifested through systematic shift in expression measurements from batch to batch. Such a systematic shift could be taken care of by using an appropriate model for differential expression analysis. However, it poses greater challenge in the estimation of statistical significance and false discovery rate (FDR), if the batches are confounded (collinear) with the biological groups of interest. Batch confounding problem occurs commonly in the analysis of time-course data or data from different laboratories. We demonstrate that batch confounding may lead to incorrect estimation of the expected statistics. In this paper, we propose an iterative piecewise linear regression (iPLR) method, a major extension of our previously published Stepped Linear Regression (SLR) method, in the context of SAM to re-estimate the expected statistics and FDR. iPLR can be applied to one-sided or two-sided statistics based tests. We demonstrate the efficacy of iPLR on both simulated and real microarray datasets. iPLR also provides a better interpretation of the linear model parameters. © 2012 Springer-Verlag.

Approximating the number of successes in independent trials: Binomial versus Poisson

Fri, 01 Nov 2002 00:00:00 GMT

Title: Approximating the number of successes in independent trials: Binomial versus Poisson Authors: Choi, K.P.; Xia, A. Abstract: Let $I_1, I_2, \ldots, I_n$ be independent Bernoulli random variables with $\mathbb{P}(I_i = 1) = 1 - \mathbb{P}(I_i = 0) = p_i$, $1 \le i \le n$, and let $W = \sum_{i=1}^{n} I_i$ and $\lambda = \mathbb{E}W = \sum_{i=1}^{n} p_i$. It is well known that if the $p_i$'s are the same, then $W$ follows a binomial distribution, and if the $p_i$'s are small, then the distribution of $W$, denoted by $\mathcal{L}W$, can be well approximated by Poisson($\lambda$). Define $r = \lfloor \lambda \rfloor$, the greatest integer $\le \lambda$, set $\delta = \lambda - \lfloor \lambda \rfloor$, and let $\kappa$ be the least integer greater than or equal to $\max\{\lambda^2/(r - 1 - (1 + \delta)^2),\, n\}$. In this paper, we prove that, if $r > 1 + (1 + \delta)^2$, then $d_\kappa < d_{\kappa+1} < d_{\kappa+2} < \cdots < d_{TV}(\mathcal{L}W, \mathrm{Poisson}(\lambda))$, where $d_{TV}$ denotes the total variation metric and $d_m = d_{TV}(\mathcal{L}W, \mathrm{Bi}(m, \lambda/m))$, $m \ge \kappa$. Hence, in modelling the distribution of the sum of Bernoulli trials, Binomial approximation is generally better than Poisson approximation.
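
The comparison in this abstract can be checked numerically for small cases. The sketch below is only an illustration under stated assumptions: it computes the exact distribution of a sum of independent Bernoulli variables by dynamic programming, truncates the Poisson distribution at a finite `kmax` (so a negligible tail is ignored), and all function names are hypothetical.

```python
import math

def poisson_binomial_pmf(ps):
    """PMF of W = sum of independent Bernoulli(p_i), by dynamic programming."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # trial fails
            nxt[k + 1] += mass * p       # trial succeeds
        pmf = nxt
    return pmf

def binomial_pmf(m, q):
    """PMF of Binomial(m, q) on 0..m."""
    return [math.comb(m, k) * q**k * (1.0 - q)**(m - k) for k in range(m + 1)]

def poisson_pmf(lam, kmax):
    """PMF of Poisson(lam) on 0..kmax (mass beyond kmax is dropped)."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def total_variation(f, g):
    """Half the L1 distance between two PMFs given as lists indexed from 0."""
    size = max(len(f), len(g))
    f = f + [0.0] * (size - len(f))
    g = g + [0.0] * (size - len(g))
    return 0.5 * sum(abs(a - b) for a, b in zip(f, g))

if __name__ == "__main__":
    ps = [0.3] * 10          # equal success probabilities, so lambda = 3
    lam = sum(ps)
    w = poisson_binomial_pmf(ps)
    for m in (10, 15, 20, 40):
        print("Bi(%d, lambda/%d):" % (m, m), total_variation(w, binomial_pmf(m, lam / m)))
    print("Poisson(lambda):", total_variation(w, poisson_pmf(lam, 100)))
```

With ten trials of probability 0.3, the values of r, delta, and kappa from the abstract are 3, 0, and 10, so the binomial distances for m of 10 and above should increase towards the Poisson distance.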

Maximum likelihood inference of the evolutionary history of a PPI network from the duplication history of its proteins

Fri, 01 Nov 2013 00:00:00 GMT

Title: Maximum likelihood inference of the evolutionary history of a PPI network from the duplication history of its proteins Authors: Li, S.; Choi, K.P.; Wu, T.; Zhang, L. Abstract: Evolutionary history of protein-protein interaction (PPI) networks provides valuable insight into molecular mechanisms of network growth. In this paper, we study how to infer the evolutionary history of a PPI network from its protein duplication relationship. We show that for a plausible evolutionary history of a PPI network, its relative quality, measured by the so-called loss number, is independent of the growth parameters of the network and can be computed efficiently. This finding leads us to propose two fast maximum likelihood algorithms to infer the evolutionary history of a PPI network given the duplication history of its proteins. Simulation studies demonstrated that our approach, which takes advantage of protein duplication information, outperforms NetArch, the first maximum likelihood algorithm for PPI network history reconstruction. Using the proposed method, we studied the topological change of the PPI networks of the yeast, fruitfly, and worm. © 2013 IEEE.

A Remark on the Inverse Hölder Inequality

Mon, 01 Nov 1993 00:00:00 GMT

Title: A Remark on the Inverse Hölder Inequality Authors: Choi, K.P.

Counting motifs in the entire biological network from noisy and incomplete data (extended abstract)

Tue, 01 Jan 2013 00:00:00 GMT

Title: Counting motifs in the entire biological network from noisy and incomplete data (extended abstract) Authors: Tran, N.H.; Choi, K.P.; Zhang, L. Abstract: Small over-represented motifs in biological networks are believed to represent essential functional units of biological processes. A natural question is to gauge whether a motif occurs abundantly or rarely in a biological network. Given that high-throughput biotechnology is only able to interrogate a portion of the entire biological network with non-negligible errors, we develop a powerful method to correct link errors in estimating undirected or directed motif counts in the entire network from noisy subnetwork data. © 2013 Springer-Verlag.

A non-uniform bound for translated Poisson approximation

Wed, 04 Feb 2004 00:00:00 GMT

Title: A non-uniform bound for translated Poisson approximation Authors: Barbour, A.D.; Choi, K.P. Abstract: Let $X_1, \ldots, X_n$ be independent, integer valued random variables, with $p$th moments, $p > 2$, and let $W$ denote their sum. We prove bounds analogous to the classical non-uniform estimates of the error in the central limit theorem, but now, for approximation of $\mathcal{L}(W)$ by a translated Poisson distribution. The advantage is that the error bounds, which are often of order no worse than in the classical case, measure the accuracy in terms of total variation distance. In order to have good approximation in this sense, it is necessary for $\mathcal{L}(W)$ to be sufficiently smooth; this requirement is incorporated into the bounds by way of a parameter $\alpha$, which measures the average overlap between $\mathcal{L}(X_i)$ and $\mathcal{L}(X_i + 1)$, $1 \le i \le n$.

Nonrandom clusters of palindromes in herpesvirus genomes

Sat, 01 Jan 2005 00:00:00 GMT

Title: Nonrandom clusters of palindromes in herpesvirus genomes Authors: Leung, M.-Y.; Kwok, P.C.; Xia, A.; Chen, L.H.Y. Abstract: Palindromes are symmetrical words of DNA in the sense that they read exactly the same as their reverse complementary sequences. Representing the occurrences of palindromes in a DNA molecule as points on the unit interval, the scan statistics can be used to identify regions of unusually high concentration of palindromes. These regions have been associated with the replication origins on a few herpesviruses in previous studies. However, the use of scan statistics requires the assumption that the points representing the palindromes are independently and uniformly distributed on the unit interval. In this paper, we provide a mathematical basis for this assumption by showing that in randomly generated DNA sequences, the occurrences of palindromes can be approximated by a Poisson process. An easily computable upper bound on the Wasserstein distance between the palindrome process and the Poisson process is obtained. This bound is then used as a guide to choose an optimal palindrome length in the analysis of a collection of 16 herpesvirus genomes. Regions harboring significant palindrome clusters are identified and compared to known locations of replication origins. This analysis brings out a few interesting extensions of the scan statistics that can help formulate an algorithm for more accurate prediction of replication origins. © Mary Ann Liebert, Inc.

Reconstruction of network evolutionary history from extant network topology and duplication history

Sun, 01 Jan 2012 00:00:00 GMT

Title: Reconstruction of network evolutionary history from extant network topology and duplication history Authors: Li, S.; Choi, K.P.; Wu, T.; Zhang, L. Abstract: Genome-wide protein-protein interaction (PPI) data are readily available thanks to recent breakthroughs in biotechnology. However, PPI networks of extant organisms are only snapshots of the network evolution. How to infer the whole evolution history becomes a challenging problem in computational biology. In this paper, we present a likelihood-based approach to inferring network evolution history from the topology of PPI networks and the duplication relationship among the paralogs. Simulations show that our approach outperforms the existing ones in terms of the accuracy of reconstruction. Moreover, the growth parameters of several real PPI networks estimated by our method are more consistent with the ones predicted in literature. © 2012 Springer-Verlag.

Some best possible prophet inequalities for convex functions of sums of independent variates and unordered martingale difference sequences

Tue, 01 Apr 1997 00:00:00 GMT

Title: Some best possible prophet inequalities for convex functions of sums of independent variates and unordered martingale difference sequences Authors: Choi, K.P.; Klass, M.J. Abstract: Let $\Phi(\cdot)$ be a nondecreasing convex function on $[0, \infty)$. We show that for any integer $n \ge 1$ and real $a$, $E\Phi((M_n - a)^+) \le 2E\Phi((S_n - a)^+) - \Phi(0)$ and $E(M_n \vee \operatorname{med} S_n) \le E|S_n - \operatorname{med} S_n|$, where $X_1, X_2, \ldots$ are any independent mean zero random variables with partial sums $S_0 = 0$, $S_k = X_1 + \cdots + X_k$ and partial sum maxima $M_n = \max_{0 \le k \le n} S_k$. There are various instances in which these inequalities are best possible for fixed $n$ and/or as $n \to \infty$. These inequalities remain valid if $\{X_k\}$ is a martingale difference sequence such that $E(X_k \mid \{X_i : i \ne k\}) = 0$ a.s. for each $k \ge 1$. Modified versions of these inequalities hold if the variates have arbitrary means but are independent.

Palindromes in SARS and other coronaviruses

Wed, 01 Sep 2004 00:00:00 GMT

Title: Palindromes in SARS and other coronaviruses Authors: Chew, D.S.H.; Choi, K.P.; Heidner, H.; Leung, M.-Y. Abstract: With the identification of a novel coronavirus associated with the severe acute respiratory syndrome (SARS), computational analysis of its RNA genome sequence is expected to give useful clues to help elucidate the origin, evolution, and pathogenicity of the virus. In this paper, we study the collective counts of palindromes in the SARS genome along with all the completely sequenced coronaviruses. Based on a Markov-chain model for the genome sequence, the mean and standard deviation for the number of palindromes at or above a given length are derived. These theoretical results are complemented by extensive simulations to provide empirical estimates. Using a z score obtained from these mathematical and empirical means and standard deviations, we have observed that palindromes of length four are significantly underrepresented in all the coronaviruses in our data set. In contrast, length-six palindromes are significantly underrepresented only in the SARS coronavirus. Two other features are unique to the SARS sequence. First, there is a length-22 palindrome TCTTTAACAAGCTTGTTAAAGA spanning positions 25962-25983. Second, there are two repeating length-12 palindromes TTATAATTATAA spanning positions 22712-22723 and 22796-22807. Some further investigations into possible biological implications of these palindrome features are proposed.
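
The notion of palindrome used in these abstracts (a word equal to its own reverse complement) is easy to check directly. The sketch below is an illustration only; the function names are hypothetical and it handles just the four letters A, C, G, T. The two sequences tested at the end are the ones quoted in the abstract above.

```python
# Translation table for complementing a DNA string
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindrome(seq):
    """True if seq reads the same as its reverse complement."""
    return seq == reverse_complement(seq)

def palindrome_starts(seq, length):
    """0-based start positions of windows of the given length that are palindromes."""
    return [i for i in range(len(seq) - length + 1)
            if is_palindrome(seq[i:i + length])]

if __name__ == "__main__":
    print(is_palindrome("TCTTTAACAAGCTTGTTAAAGA"))  # length-22 word from the abstract
    print(is_palindrome("TTATAATTATAA"))            # length-12 word from the abstract
    print(palindrome_starts("GGAATTCC", 4))
```

Under this definition a palindrome always has even length, because the middle base of an odd-length word would have to be its own complement.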

Promoter profiling and coexpression data analysis identifies 24 novel genes that are coregulated with AMPA receptor genes, GRIAs

Thu, 01 Mar 2007 00:00:00 GMT

Title: Promoter profiling and coexpression data analysis identifies 24 novel genes that are coregulated with AMPA receptor genes, GRIAs Authors: Chong, A.; Zhang, Z.; Choi, K.P.; Choudhary, V.; Djamgoz, M.B.A.; Zhang, G.; Bajic, V.B. Abstract: We identified a set of transcriptional elements that are conserved and overrepresented within the promoters of human, mouse, and rat GRIAs by comparing these promoters against a collection of 10,741 gene promoters. Cells regulate functional groups of genes by coordinating the transcriptional and/or posttranscriptional mRNA levels of interacting genes. As such, it is expected that functional groups of genes share the same transcriptional features within their promoters. We found 47 genes whose promoters contain the same combination of transcriptional elements that are overrepresented within the promoters of the GRIA gene family. Coexpressed genes may be transcriptionally coregulated, which in turn suggests that these genes may play complementary roles within a particular functional context. Using microarray expression data, we found 24 (of the 47) genes that share not only a similar promoter profile with GRIAs but also a well-correlated gene expression profile and, thus, we believe these to be coregulated with GRIAs. © 2007 Elsevier Inc. All rights reserved.

Scoring schemes of palindrome clusters for more sensitive prediction of replication origins in herpesviruses.

Sat, 01 Jan 2005 00:00:00 GMT

Title: Scoring schemes of palindrome clusters for more sensitive prediction of replication origins in herpesviruses. Authors: Chew, D.S.; Choi, K.P.; Leung, M.Y. Abstract: Many empirical studies show that there are unusual clusters of palindromes, closely spaced direct and inverted repeats around the replication origins of herpesviruses. In this paper, we introduce two new scoring schemes to quantify the spatial abundance of palindromes in a genomic sequence. Based on these scoring schemes, a computational method to predict the locations of replication origins is developed. When our predictions are compared with 39 known or annotated replication origins in 19 herpesviruses, close to 80% of the replication origins are located within 2% of the genome length. A list of predicted locations of replication origins in all the known herpesviruses with complete genome sequences is reported.

Scoring schemes of palindrome clusters for more sensitive prediction of replication origins in herpesviruses

Sat, 01 Jan 2005 00:00:00 GMT

Title: Scoring schemes of palindrome clusters for more sensitive prediction of replication origins in herpesviruses Authors: Chew, D.S.H.; Choi, K.P.; Leung, M.-Y. Abstract: Many empirical studies show that there are unusual clusters of palindromes, closely spaced direct and inverted repeats around the replication origins of herpesviruses. In this paper, we introduce two new scoring schemes to quantify the spatial abundance of palindromes in a genomic sequence. Based on these scoring schemes, a computational method to predict the locations of replication origins is developed. When our predictions are compared with 39 known or annotated replication origins in 19 herpesviruses, close to 80% of the replication origins are located within 2% of the genome length. A list of predicted locations of replication origins in all the known herpesviruses with complete genome sequences is reported. © The Author 2005. Published by Oxford University Press. All rights reserved.

Sensitivity analysis and efficient method for identifying optimal spaced seeds

Sun, 01 Feb 2004 00:00:00 GMT

Title: Sensitivity analysis and efficient method for identifying optimal spaced seeds Authors: Choi, K.P.; Zhang, L. Abstract: The novel introduction of spaced seed idea in the filtration stage of sequence comparison by Ma et al. (Bioinformatics 18 (2002) 440) has greatly increased the sensitivity of homology search without compromising the speed of search. Finding the optimal spaced seeds is of great importance both theoretically and in designing better search tool for sequence comparison. In this paper, we study the computational aspects of calculating the hitting probability of spaced seeds; and based on these results, we propose an efficient algorithm for identifying optimal spaced seeds.
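
To make the idea of a hitting probability concrete, the sketch below computes it by brute force for short regions. It is not the efficient algorithm proposed in the paper: it simply enumerates every match/mismatch string of a given length, assuming positions match independently with probability p. The seed is written as a string of `1` (must match) and `0` (don't care), and the function names are hypothetical.

```python
from itertools import product

def seed_hits(seed, matches):
    """True if, at some offset, every '1' position of the seed falls on a match."""
    span = len(seed)
    required = [j for j, c in enumerate(seed) if c == "1"]
    return any(all(matches[i + j] == "1" for j in required)
               for i in range(len(matches) - span + 1))

def hit_probability(seed, length, p):
    """Exact hit probability over all 2**length match strings (small lengths only)."""
    total = 0.0
    for bits in product("01", repeat=length):
        if seed_hits(seed, bits):
            ones = bits.count("1")
            total += p**ones * (1.0 - p)**(length - ones)
    return total

if __name__ == "__main__":
    # Compare a contiguous and a spaced seed of the same weight on a short region
    for seed in ("111", "1101"):
        print(seed, hit_probability(seed, 12, 0.7))
```

The enumeration is exponential in the region length, so it is useful only for checking small cases by hand.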

A post-processing method for optimizing synthesis strategy for oligonucleotide microarrays

Sat, 01 Jan 2005 00:00:00 GMT

Title: A post-processing method for optimizing synthesis strategy for oligonucleotide microarrays Authors: Ning, K.; Choi, K.P.; Leong, H.W.; Zhang, L. Abstract: The broad applicability of gene expression profiling to genomic analyses has generated huge demand for mass production of microarrays and hence for improving the cost effectiveness of microarray fabrication. We developed a post-processing method for deriving a good synthesis strategy. In this paper, we assessed all the known efficient methods and our post-processing method for reducing the number of synthesis cycles for manufacturing a DNA-chip of a given set of oligos. Our experimental results on both simulated and 52 real datasets show that no single method consistently gives the best synthesis strategy, and post-processing an existing strategy is necessary as it often reduces the number of synthesis cycles further. © The Author 2005. Published by Oxford University Press. All rights reserved.

Good spaced seeds for homology search

Sat, 01 May 2004 00:00:00 GMT

Title: Good spaced seeds for homology search Authors: Choi, K.P.; Zeng, F.; Zhang, L. Abstract: Motivation: Filtration is an important technique used to speed up local alignment as exemplified in the BLAST programs. Recently, Ma et al. discovered that better filtering can be achieved by spacing out the matching positions according to a certain pattern, instead of contiguous positions to trigger a local alignment in their PatternHunter program. Such a match pattern is called a spaced seed. Results: Our numerical computation shows that the ranks of spaced seeds (based on sensitivity) change with the sequences similarity. Since homologous sequences may have diverse similarity, we assess the sensitivity of spaced seeds over a range of similarity levels and present a list of good spaced seeds for facilitating homology search in DNA genomic sequences. We validate that the listed spaced seeds are indeed more sensitive using three arbitrarily chosen pairs of DNA genomic sequences. © Oxford University Press 2004; all rights reserved.

Spectrum-based de novo repeat detection in genomic sequences

Tue, 01 Jan 2008 00:00:00 GMT

Title: Spectrum-based de novo repeat detection in genomic sequences Authors: Do, H.H.; Choi, K.P.; Preparata, F.P.; Sung, W.K.; Zhang, L. Abstract: A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg. © Mary Ann Liebert, Inc. 2008.
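
The spectrum mentioned in this abstract is simply the set of k-mers occurring in a sequence. The sketch below shows that definition and a naive way to list k-mers that occur more than once; it is an illustration with hypothetical function names and is not the SAGRI procedure, which works quite differently at genome scale.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum(seq, k):
    """The set of distinct k-mers in seq."""
    return set(kmer_counts(seq, k))

def repeated_kmers(seq, k):
    """k-mers that occur at least twice, a crude signal of a repeat."""
    return {kmer for kmer, c in kmer_counts(seq, k).items() if c > 1}

if __name__ == "__main__":
    print(sorted(spectrum("ACGTACG", 3)))
    print(repeated_kmers("ACGTACG", 3))
```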

Good spaced seeds for homology search

Thu, 01 Jan 2004 00:00:00 GMT

Title: Good spaced seeds for homology search Authors: Choi, K.P.; Zeng, F.; Zhang, L. Abstract: Filtration is an important technique used to speed up local alignment as exemplified in the BLAST programs. Recently, Ma, Tromp and Li (2002) discovered that better filtering can be achieved by spacing out the matching positions according to a certain pattern, instead of contiguous positions to trigger a local alignment in their PatternHunter program. Such a match pattern is called a spaced seed. Our numerical computation shows that the ranks of spaced seeds (based on sensitivity) change with the sequences similarity. Since homologous sequences may have diverse similarity, we assess the sensitivity of spaced seeds over a range of similarity levels and present a list of good spaced seeds for facilitating homology search in DNA genomic sequences. We validate that the listed spaced seeds are indeed more sensitive using three arbitrarily chosen pairs of DNA genomic sequences.

Quick, practical selection of effective seeds for homology search

Tue, 01 Nov 2005 00:00:00 GMT

Title: Quick, practical selection of effective seeds for homology search Authors: Preparata, F.P.; Zhang, L.; Choi, K.P. Abstract: It has been observed that in homology search gapped seeds have better sensitivity than ungapped ones for the same cost (weight). In this paper, we propose a probability leakage model (a dissipative Markov system) to elucidate the mechanism that confers power to spaced seeds. Based on this model, we identify desirable features of gapped search seeds and formulate an extremely efficient procedure for seed design: it samples from the set of spaced seed exhibiting those features, evaluates their sensitivity, and then selects the best. The sensitivity of the constructed seeds is negligibly less than that of the corresponding known optimal seeds. While the challenging mathematical question of characterizing optimal search seeds remains open, we believe that our eminently efficient and effective approach represents a satisfactory solution from a practitioner's viewpoint. © Mary Ann Liebert, Inc.

Exact and approximate computation of critical values of largest root test in high dimension

Tue, 23 Mar 2021 00:00:00 GMT

Title: Exact and approximate computation of critical values of largest root test in high dimension Authors: Gregory Ang; Zhidong Bai; Kwok Pui Choi; Yasunori Fujikoshi; Jiang Hu

Plasma metabolome and lipidome associations with type 2 diabetes and diabetic nephropathy

Thu, 08 Apr 2021 00:00:00 GMT

Title: Plasma metabolome and lipidome associations with type 2 diabetes and diabetic nephropathy Authors: Tan, Yan Ming; Gao, Yan; Teo, Guoshou; Koh, Hiromi W. L.; Tai, E. Shyong; Khoo, Chin Meng; Choi, Kwok Pui; Zhou, Lei; Choi, Hyungwon Abstract: We conducted untargeted metabolomics analysis of plasma samples from a cross-sectional case–control study with 30 healthy controls, 30 patients with diabetes mellitus and normal renal function (DM-N), and 30 early diabetic nephropathy (DKD) patients using liquid chromatography-mass spectrometry (LC-MS). We employed two different modes of MS acquisition on a high-resolution MS instrument for identification and semi-quantification, and analyzed data using an advanced multivariate method for prioritizing differentially abundant metabolites. We obtained semi-quantification data for 1088 unique compounds (~55% lipids), excluding compounds that may be either exogenous compounds or treated as medication. Supervised classification analysis over a confounding-free partial correlation network shows that prostaglandins, phospholipids, nucleotides, sugars, and glycans are elevated in the DM-N and DKD patients, whereas glutamine, phenylacetylglutamine, 3-indoxyl sulfate, acetylphenylalanine, xanthine, dimethyluric acid, and asymmetric dimethylarginine are increased in DKD compared to DM-N. The data recapitulate the well-established plasma metabolome changes associated with DM-N and suggest uremic solutes and oxidative stress markers as the compounds indicating early renal function decline in DM patients. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

Erratum: Sharp Bounds and Normalization of Wiener-Type Indices (PLoS ONE 8:11 (e78448) DOI:10.1371/journal.pone.0078448)

Wed, 01 Jan 2014 00:00:00 GMT

Title: Erratum: Sharp Bounds and Normalization of Wiener-Type Indices (PLoS ONE 8:11 (e78448) DOI:10.1371/journal.pone.0078448) Authors: Tian, Dechao; Choi, Kwok Pui

A STATISTICAL APPROACH TO ADAPTIVE PARAMETER TUNING IN NATURE-INSPIRED OPTIMIZATION AND OPTIMAL SEQUENTIAL DESIGN OF DOSE-FINDING TRIALS

Fri, 01 Oct 2021 00:00:00 GMT

Title: A STATISTICAL APPROACH TO ADAPTIVE PARAMETER TUNING IN NATURE-INSPIRED OPTIMIZATION AND OPTIMAL SEQUENTIAL DESIGN OF DOSE-FINDING TRIALS Authors: Choi, Kwok Pui; Lai, Tze Leung; TONG XIN; Wong, Weng Kee Abstract: Nature-inspired metaheuristic algorithms have become increasingly popular in the last couple of decades, and now constitute a major toolbox for tackling complex high-dimensional optimization problems. Using group sequential experimentation, adaptive design, multi-armed bandits, and bootstrap resampling methods, this study develops a novel statistical methodology for efficient and systematic group sequential selection of the tuning parameters, which are widely recognized as pivotal to the success of metaheuristic optimization algorithms in practice, as new information accumulates during the course of an experiment. The methodology is applied to compute optimal experimental designs in nonlinear regression models, and is illustrated with solutions of long-standing optimal design problems in early-phase dose-finding oncology trials.

Appropriate noise addition to metaheuristic algorithms can enhance their performance

Fri, 01 Dec 2023 00:00:00 GMT

Title: Appropriate noise addition to metaheuristic algorithms can enhance their performance Authors: Choi, KP; Kam, EHH; Tong, XT; Wong, WK Abstract: Nature-inspired swarm-based algorithms are increasingly applied to tackle high-dimensional and complex optimization problems across disciplines. They are general purpose optimization algorithms, easy to implement and assumption-free. Two common drawbacks of these algorithms are premature convergence and the possibility that the solution found is not a global optimum. We propose a general, simple and effective strategy, called heterogeneous Perturbation–Projection (HPP), to enhance an algorithm’s exploration capability so that our sufficient convergence conditions are guaranteed to hold and the algorithm converges almost surely to a global optimum. In summary, HPP applies stochastic perturbation to half of the swarm agents and then projects all agents onto the set of feasible solutions. We illustrate this approach using three widely used nature-inspired swarm-based optimization algorithms: particle swarm optimization (PSO), bat algorithm (BAT) and Ant Colony Optimization for continuous domains (ACO). Extensive numerical experiments show that the three algorithms with the HPP strategy outperform the original versions 60–80% of the time, with significant margins.
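
The perturb-then-project step summarised in this abstract might look roughly as follows. This is a loose sketch under several assumptions that the abstract does not confirm: the feasible set is taken to be a box, the perturbed half is chosen at random, the noise is Gaussian with a fixed `sigma`, and the name `hpp_step` is hypothetical. The paper's actual perturbation scheme and convergence conditions are not reproduced here.

```python
import random

def hpp_step(positions, lower, upper, sigma, rng):
    """Perturb half of the agents, then clip every agent to the box [lower, upper]."""
    n = len(positions)
    chosen = set(rng.sample(range(n), n // 2))  # assumed: a random half of the swarm
    updated = []
    for i, x in enumerate(positions):
        if i in chosen:
            # assumed: independent Gaussian noise on each coordinate
            x = [xi + rng.gauss(0.0, sigma) for xi in x]
        # projection onto a box is coordinate-wise clipping
        updated.append([min(max(xi, lo), hi)
                        for xi, lo, hi in zip(x, lower, upper)])
    return updated

if __name__ == "__main__":
    rng = random.Random(0)
    swarm = [[rng.uniform(-1.0, 2.0), rng.uniform(-1.0, 2.0)] for _ in range(6)]
    print(hpp_step(swarm, [0.0, 0.0], [1.0, 1.0], 0.5, rng))
```

In a full optimizer this step would be applied after each position update of the underlying algorithm (for example PSO), leaving the rest of that algorithm unchanged.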

Multistressed families in Singapore: A focus on transnational families

Sat, 01 Jun 2019 00:00:00 GMT

Title: Multistressed families in Singapore: A focus on transnational families Authors: Chiu, Marcus YL; Ghoh, Corinne; Chung, Gerard; Choi, Kwok P Abstract: Families under multiple stresses present a challenge that requires coordinated multiple helping hands. Drawing on the baseline data, this paper profiles >200 multistressed families (MF) who entered into a specific enhancement programme in Singapore and compares the sociodemographics, family functioning and resilience of the children between transnational and non-transnational families. Findings show that these transnational families have significantly older fathers, a greater age difference between spouses, more unemployed fathers, and significantly more needs related to system barriers. Although their youths do not have lower resilience than the non-transnational group, the overall resilience level of the youths from MF is significantly lower than that of the normative youths. Family income and number of system needs are found to be significantly correlated with both family cohesion and family flexibility. Multilevel regression with variables controlled shows that being male and having high family flexibility predict better youth resilience. Discussion and recommendations are made on the unique context of Singapore and possible ways to improve family flexibility in the Asian context.

iOmicsPASS: network-based integration of multiomics data for predictive subnetwork discovery

Tue, 09 Jul 2019 00:00:00 GMT

Title: iOmicsPASS: network-based integration of multiomics data for predictive subnetwork discovery Authors: Koh, Hiromi WL; Fermin, Damian; Vogel, Christine; Choi, Kwok Pui; Ewing, Rob M; Choi, Hyungwon Abstract: Computational tools for multiomics data integration have usually been designed for unsupervised detection of multiomics features explaining large phenotypic variations. To achieve this, some approaches extract latent signals in heterogeneous data sets from a joint statistical error model, while others use biological networks to propagate differential expression signals and find consensus signatures. However, few approaches directly consider molecular interaction as a data feature, the essential linker between different omics data sets. The increasing availability of genome-scale interactome data connecting different molecular levels motivates a new class of methods to extract interactive signals from multiomics data. Here we developed iOmicsPASS, a tool to search for predictive subnetworks consisting of molecular interactions within and between related omics data types in a supervised analysis setting. Based on user-provided network data and relevant omics data sets, iOmicsPASS computes a score for each molecular interaction, and applies a modified nearest shrunken centroid algorithm to the scores to select densely connected subnetworks that can accurately predict each phenotypic group. iOmicsPASS detects a sparse set of predictive molecular interactions without loss of prediction accuracy compared to alternative methods, and the selected network signature immediately provides mechanistic interpretation of the multiomics profile representing each sample group. Extensive simulation studies demonstrate clear benefit of interaction-level modeling. iOmicsPASS analysis of TCGA/CPTAC breast cancer data also highlights a new transcriptional regulatory network underlying the basal-like subtype as positive protein markers, a result not seen through analysis of individual omics data.
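As a rough illustration of the shrinkage idea mentioned in the abstract, the sketch below applies the soft-thresholding step of the standard nearest shrunken centroid method to a small table of per-sample scores. It is not the iOmicsPASS implementation: the abstract does not specify the interaction score or the modification made to the algorithm, the per-feature standardization used in the standard method is omitted for brevity, and the names `soft_threshold`, `shrunken_centroids`, `selected_features` and the toy data are all hypothetical.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within delta of zero become 0."""
    mag = abs(d) - delta
    if mag <= 0:
        return 0.0
    return mag if d > 0 else -mag


def shrunken_centroids(scores, labels, delta):
    """Return {class: shrunken per-feature offsets from the overall mean}.

    scores: one list of feature values per sample (assumed already computed).
    labels: class label of each sample.
    delta:  shrinkage threshold (a tuning parameter, assumed given).
    """
    n_feat = len(scores[0])
    overall = [sum(s[j] for s in scores) / len(scores) for j in range(n_feat)]
    shrunk = {}
    for k in sorted(set(labels)):
        rows = [s for s, lab in zip(scores, labels) if lab == k]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
        shrunk[k] = [soft_threshold(c - o, delta)
                     for c, o in zip(centroid, overall)]
    return shrunk


def selected_features(shrunk):
    """Indices of features whose shrunken offset is nonzero in some class."""
    n_feat = len(next(iter(shrunk.values())))
    return [j for j in range(n_feat)
            if any(v[j] != 0.0 for v in shrunk.values())]


# Toy data: feature 0 separates the two groups, feature 1 is noise.
scores = [[2.0, 0.1], [2.2, -0.1], [-2.0, 0.0], [-2.2, 0.0]]
labels = ["A", "A", "B", "B"]
shrunk = shrunken_centroids(scores, labels, delta=0.5)
print(selected_features(shrunk))
```

Features whose offsets are shrunk to zero in every class drop out of the classifier, which is how this family of methods arrives at a sparse signature.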

Identifying co-regulating microrna groups

Mon, 01 Feb 2010 00:00:00 GMT

Title: Identifying co-regulating microrna groups Authors: An, J.; Choi, K.P.; Wells, C.A.; Chen, Y.-P.P. Abstract: Background: Current miRNA target prediction tools have the common problem that their false positive rate is high. This renders identification of co-regulating groups of miRNAs and target genes unreliable. In this study, we describe a procedure to identify highly probable co-regulating miRNAs and the corresponding co-regulated gene groups. Our procedure involves a sequence of statistical tests: (1) identify genes that are highly probable miRNA targets; (2) determine, for each such gene, the minimum number of miRNAs that co-regulate it with high probability; (3) find, for each such gene, the combination of miRNAs of the determined minimum size that co-regulates it with the lowest p-value; and (4) discover, for each such combination of miRNAs, the group of genes that are co-regulated by these miRNAs with the lowest p-value computed based on GO term annotations of the genes. Results: Our method identifies 4-, 3- and 2-term miRNA groups that co-regulate gene groups of size at least 3 in human. Our result suggests some interesting hypotheses on the functional role of several miRNAs through "guilt by association" reasoning. For example, miR-130, miR-19 and miR-101 are known neurodegenerative-disease-associated miRNAs. Our 3-term miRNA table shows that miR-130/19/101 form a co-regulating group of rank 22 (p-value = 1.16 × 10⁻²). Since miR-144 is co-regulating with miR-130, miR-19 and miR-101 at rank 4 (p-value = 1.16 × 10⁻²) in our 4-term miRNA table, this suggests hsa-miR-144 may be a neurodegenerative-disease-related miRNA. Conclusions: This work identifies highly probable co-regulating miRNAs, which are refined from the prediction by computational tools using (1) signal-to-noise ratio to get highly accurate regulating miRNAs for every gene, and (2) Gene Ontology to obtain functionally related co-regulating miRNA groups. Our result has partly been supported by biological experiments. Based on prediction by TargetScanS, we found highly probable target gene groups in the Supplementary Information. This result might help biologists to find a small set of miRNAs for genes of interest rather than a huge set of miRNAs. © 2010 Imperial College Press.

Least-squares support vector machine approach to viral replication origin prediction

Tue, 01 Jun 2010 00:00:00 GMT

Title: Least-squares support vector machine approach to viral replication origin prediction Authors: Cruz-Cano, R.; Chew, D.S.H.; Choi, K.-P.; Leung, M.-Y. Abstract: Replication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses. This paper proposes a new approach by least-squares support vector machines (LS-SVMs) and tests its performance not only on the herpes family but also on a collection of caudoviruses coming from three viral families under the order of caudovirales. The LS-SVM approach provides sensitivities and positive predictive values superior or comparable to those given by the previous methods. When suitably combined with previous methods, the LS-SVM approach further improves the prediction accuracy for the herpesvirus replication origins. Furthermore, by recursive feature elimination, the LS-SVM has also helped find the most significant features of the data sets. The results suggest that the LS-SVMs will be a highly useful addition to the set of computational tools for viral replication origin prediction and illustrate the value of optimization-based computing techniques in biomedical applications. © 2010 INFORMS.
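For readers unfamiliar with the technique, the following is a minimal sketch of a textbook LS-SVM classifier, in which training reduces to solving one linear system rather than a quadratic program. It is not the authors' code: the abstract does not state the kernel, the sequence features, or the hyperparameters, so the RBF kernel, `gamma`, `sigma`, the helper names and the toy 2-D data below are all assumptions made for illustration.

```python
import math


def rbf(x, z, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1].

    Omega[i][j] = y_i * y_j * K(x_i, x_j); labels y_i must be +1 or -1.
    """
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            omega = y[i] * y[j] * rbf(X[i], X[j], sigma)
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha


def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    """Classify x by the sign of sum_i alpha_i y_i K(x, x_i) + b."""
    s = b + sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alpha, y, X))
    return 1 if s >= 0 else -1


# Hypothetical toy data: two well-separated clusters in the plane.
X = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
y = [-1, -1, -1, 1, 1, 1]
b, alpha = lssvm_fit(X, y)
print(lssvm_predict((0.5, 0.5), X, y, b, alpha),
      lssvm_predict((3.5, 3.5), X, y, b, alpha))
```

Because every training point receives a nonzero multiplier, LS-SVM solutions are not sparse, which is one reason a feature-selection step such as the recursive feature elimination mentioned in the abstract is a natural companion.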

Limit theorems for functions of marginal quantiles

Sun, 01 May 2011 00:00:00 GMT

Title: Limit theorems for functions of marginal quantiles Authors: Babu, G.J.; Bai, Z.; Choi, K.P.; Mangalam, V. Abstract: Multivariate distributions are explored using the joint distributions of marginal sample quantiles. Limit theory for the mean of a function of order statistics is presented. The results include a multivariate central limit theorem and a strong law of large numbers. A result similar to Bahadur's representation of quantiles is established for the mean of a function of the marginal quantiles. In particular, it is shown that $\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}\varphi\left(X^{(1)}_{n:i},\ldots,X^{(d)}_{n:i}\right)-\bar{y}\right)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}Z_{n,i}+o_P(1)$ as $n\to\infty$, where $\bar{y}$ is a constant and $Z_{n,i}$ are i.i.d. random variables for each $n$. This leads to the central limit theorem. Weak convergence to a Gaussian process using equicontinuity of functions is indicated. The results are established under very general conditions. These conditions are shown to be satisfied in many commonly occurring situations. © 2011 ISI/BS.

On asymptotic joint distributions of cherries and pitchforks for random phylogenetic trees

Thu, 23 Sep 2021 00:00:00 GMT

Title: On asymptotic joint distributions of cherries and pitchforks for random phylogenetic trees Authors: Choi, Kwok Pui; Kaur, Gursharn; Wu, Taoyang Abstract: Tree shape statistics provide valuable quantitative insights into evolutionary mechanisms underpinning phylogenetic trees, a commonly used graph representation of evolutionary relationships among taxonomic units ranging from viruses to species. We study two subtree counting statistics, the number of cherries and the number of pitchforks, for random phylogenetic trees generated by two widely used null tree models: the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. By developing limit theorems for a version of extended Pólya urn models in which negative entries are permitted for their replacement matrices, we deduce the strong laws of large numbers and the central limit theorems for the joint distributions of these two counting statistics for the PDA and the YHK models. Our results indicate that the limiting behaviour of these two statistics, when appropriately scaled using the number of leaves in the underlying trees, is independent of the initial tree used in the tree generating process. © 2021, The Author(s).
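The two statistics in this abstract can be explored empirically with a short simulation. The sketch below grows a tree under the YHK model by repeatedly splitting a uniformly chosen leaf, then counts cherries and pitchforks. It assumes the usual conventions (a cherry is an internal node with two leaf children, a pitchfork is a subtree with exactly three leaves), which the abstract itself does not spell out; the function names and tree encoding are hypothetical and the code does not reproduce the paper's urn-model analysis.

```python
import random


def yhk_tree(n_leaves, rng):
    """Grow a rooted binary tree with n_leaves leaves under the YHK model.

    children[v] is None for a leaf, or a pair (left, right) for an
    internal node. The process starts from a single cherry.
    """
    children = [(1, 2), None, None]  # node 0 is the root of a cherry
    leaves = [1, 2]
    while len(leaves) < n_leaves:
        idx = rng.randrange(len(leaves))  # pick a leaf uniformly at random
        v = leaves[idx]
        a, b = len(children), len(children) + 1
        children.extend([None, None])
        children[v] = (a, b)  # v becomes an internal node with two new leaves
        leaves[idx] = a
        leaves.append(b)
    return children


def count_cherries_pitchforks(children):
    """Return (number of cherries, number of pitchforks) in the tree."""
    def is_leaf(v):
        return children[v] is None

    def is_cherry(v):
        return not is_leaf(v) and all(is_leaf(c) for c in children[v])

    cherries = pitchforks = 0
    for kids in children:
        if kids is None:
            continue
        a, b = kids
        if is_leaf(a) and is_leaf(b):
            cherries += 1
        elif (is_leaf(a) and is_cherry(b)) or (is_leaf(b) and is_cherry(a)):
            pitchforks += 1
    return cherries, pitchforks


rng = random.Random(0)
n = 2000
c, p = count_cherries_pitchforks(yhk_tree(n, rng))
print(c / n, p / n)  # scaled counts, as in the limit theorems
```

For large `n` the scaled cherry count should settle near 1/3, the classical expected proportion under the YHK model; rerunning with different seeds gives a feel for the fluctuations that the central limit theorems describe.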