ScholarBank@NUS (https://scholarbank.nus.edu.sg)
The DSpace digital repository system captures, stores, indexes, preserves, and distributes digital research material.
Tue, 18 Jun 2019 21:52:27 GMT

- Matrix extension and biorthogonal multiwavelet construction
https://scholarbank.nus.edu.sg/handle/10635/103529
Authors: Goh, S.S.; Yap, V.B.
Abstract: Suppose that P(z) and P̃(z) are two r × n matrices over the Laurent polynomial ring ℛ[z], where r < n, which satisfy the identity P(z)P̃(z)* = I_r on the unit circle 𝕋. We develop an algorithm that produces two n × n matrices Q(z) and Q̃(z) over ℛ[z], satisfying the identity Q(z)Q̃(z)* = I_n on 𝕋, such that the submatrices formed by the first r rows of Q(z) and Q̃(z) are P(z) and P̃(z), respectively. Our algorithm is used to construct compactly supported biorthogonal multiwavelets from multiresolutions generated by univariate compactly supported biorthogonal scaling functions with an arbitrary dilation parameter m ∈ ℤ, where m > 1. © 1998 Elsevier Science Inc.
Date: 15 Jan 1998
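To make the identity Q(z)Q̃(z)* = I_n on 𝕋 concrete, here is a small Python sketch that checks it numerically at sample points of the unit circle. It assumes a hypothetical representation of a Laurent polynomial matrix as a dictionary mapping exponents to coefficient matrices, and uses the toy case r = 1, n = 2 with P(z) = P̃(z) = [1/√2, z/√2]. It only verifies a given extension; it does not implement the authors' extension algorithm.

```python
import cmath
import math


def evaluate(m, z):
    """Evaluate a Laurent polynomial matrix, stored as {exponent: coefficient matrix}, at z."""
    first = next(iter(m.values()))
    rows, cols = len(first), len(first[0])
    out = [[0j] * cols for _ in range(rows)]
    for k, coeff in m.items():
        zk = z ** k  # negative exponents are allowed for Laurent polynomials
        for i in range(rows):
            for j in range(cols):
                out[i][j] += coeff[i][j] * zk
    return out


def times_conj_transpose(a, b):
    """Return A B^*, where B^* is the conjugate transpose of B."""
    return [[sum(a[i][k] * b[j][k].conjugate() for k in range(len(a[0])))
             for j in range(len(b))] for i in range(len(a))]


def is_identity_on_circle(q, q_dual, samples=64, tol=1e-12):
    """Check Q(z) Q~(z)^* = I at equally spaced sample points of the unit circle."""
    for s in range(samples):
        z = cmath.exp(2j * math.pi * s / samples)
        prod = times_conj_transpose(evaluate(q, z), evaluate(q_dual, z))
        for i, row in enumerate(prod):
            for j, value in enumerate(row):
                if abs(value - (1 if i == j else 0)) > tol:
                    return False
    return True


c = 1 / math.sqrt(2)
P = {0: [[c, 0]], 1: [[0, c]]}                   # P(z) = [c, c z], r = 1, n = 2
Q = {0: [[c, 0], [c, 0]], 1: [[0, c], [0, -c]]}  # first row P(z), extra row [c, -c z]

print(is_identity_on_circle(P, P))  # True
print(is_identity_on_circle(Q, Q))  # True
```

Replacing the second row by [1/√2, z/√2] breaks the off-diagonal condition, and the check then returns False.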
- Reconstructing the invasion history of a spreading, non-native, tropical tree through a snapshot of current distribution, sizes, and growth rates
https://scholarbank.nus.edu.sg/handle/10635/148323
Authors: CHONG KWEK YAN; RAPHAEL, MARK BRIAN; CARRASCO TORRECILLA,LUIS R; YEE THIAM KOON, ALEX; GIAM XINGLI; YAP VON BING; TAN TIANG WAH,HUGH
Abstract: Elucidating the invasion history of non-native species has depended on coarse-grained, expensive methods or on long-term monitoring, during which the spread may have proceeded beyond feasible control. We used the relatively recent introduction and spread of the neotropical Cecropia pachystachya in Singapore to develop a method for reconstructing spatio-temporal patterns of spread through a low-cost, cross-sectional study. Size and growth rates were measured for C. pachystachya trees as well as for the native Macaranga gigantea. For both species, a power-expansion exponential-decline function described the relationship between growth rate and size better than the probability density function of the log-normal distribution. C. pachystachya trees generally grew faster (up to 5.4 ± 0.1 cm per year at 12.2 ± 0.2 cm DBH) than M. gigantea trees (up to 3.8 ± 0.2 cm per year at 11.5 ± 0.3 cm DBH). We demonstrated that the integral of the reciprocal of these growth equations provides an estimate of the age of individuals from their size. Using the size and geographic coordinates of C. pachystachya trees from an island-wide search, we estimate that the invasion front of reproductive trees (>5 cm DBH) showed a lag phase of at least 20 years from the time of initial establishment to the year 2005, before advancing exponentially at median rates of 5 to 466 m year⁻¹, with maximum rates of several km year⁻¹. The extent of occurrence expanded nearly 10-fold from 2004 to 2012. Thus, the spatial dynamics of trees can be reproduced using ontogenetic growth functions.
Date: 1 Jun 2017
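The age estimate described above amounts to integrating 1/g(D) over diameter, where g is the growth rate at DBH D. A sketch in Python, assuming a hypothetical growth curve g(D) = a·D^b·exp(−c·D) with made-up parameters a, b, c and a hypothetical starting diameter d0 = 1 cm (the abstract does not give the fitted equations), using composite Simpson's rule:

```python
import math


def growth_rate(d, a=1.2, b=0.8, c=0.065):
    """Hypothetical power-expansion exponential-decline growth curve (cm/yr at DBH d cm).

    The functional form and parameter values are illustrative assumptions,
    not the fitted equations from the study.
    """
    return a * d ** b * math.exp(-c * d)


def age_from_size(d, g=growth_rate, d0=1.0, n=200):
    """Estimate the years needed to grow from DBH d0 to DBH d.

    Integrates 1/g(x) from d0 to d with composite Simpson's rule (n even).
    """
    if d <= d0:
        return 0.0
    if n % 2:
        n += 1
    h = (d - d0) / n
    total = 1 / g(d0) + 1 / g(d)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight / g(d0 + i * h)
    return total * h / 3


for dbh in (5, 10, 20, 30):
    print(dbh, round(age_from_size(dbh), 1))
```

Starting the integral at a small positive d0 avoids the singularity of 1/g at zero diameter; the choice of d0 shifts every age by the same constant.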
- Rooting a phylogenetic tree with nonreversible substitution models
https://scholarbank.nus.edu.sg/handle/10635/144221
Authors: Yap, V.B.; Speed, T.
Date: 1 Jan 2005
- Application of statistics and machine learning for risk stratification of heritable cardiac arrhythmias
https://scholarbank.nus.edu.sg/handle/10635/52793
Authors: Wasan, P.S.; Uttamchandani, M.; Moochhala, S.; Yap, V.B.; Yap, P.H.
Abstract: In the clinical management of heritable cardiac arrhythmias (HCAs), risk stratification is of prime importance. The ability to predict the likelihood that individuals within a sub-population will develop a pathology potentially resulting in sudden death gives subjects the opportunity to put preventive measures in place and make the lifestyle adjustments needed to increase their chances of survival. In this paper, we review classical methods that have commonly been used in clinical studies for risk stratification in HCA, such as odds ratios, hazard ratios, Chi-squared tests, and logistic regression, discussing their benefits and shortcomings. We then explore less common and more recent statistical and machine learning methods adopted by other biological studies and assess their applicability to the study of HCA. These methods, such as decision trees, neural networks, support vector machines and Bayesian classifiers, typically support multivariate analysis of risk factors. They have been adopted for feature selection of predictor variables in risk stratification studies and, in some cases, have proved better than classical methods. © 2012 Elsevier B.V. All rights reserved.
Date: 1 Jun 2013
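Two of the classical tools listed above, the odds ratio and the Chi-squared test, can be computed for a single 2 × 2 table in a few lines. This Python sketch uses hypothetical counts (carriers vs. non-carriers, event vs. no event) and the 1-degree-of-freedom Pearson statistic without continuity correction; it is illustrative, not drawn from the review.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: exposed / unexposed (e.g. carrier / non-carrier);
    columns: event / no event. Assumes b and c are non-zero.
    """
    return (a * d) / (b * c)


def pearson_chi2(a, b, c, d):
    """Pearson chi-squared statistic (1 df, no continuity correction) and its p-value."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


print(odds_ratio(20, 80, 10, 90))  # 2.25
print(pearson_chi2(20, 80, 10, 90))
```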
- Distribution and abundance of horseshoe crabs Tachypleus gigas and Carcinoscorpius rotundicauda around the main island of Singapore
https://scholarbank.nus.edu.sg/handle/10635/105096
Authors: Cartwright-Taylor, L.; Yap, V.B.; Chi, H.C.; Tee, L.S.
Abstract: A survey, together with interviews with fishermen, to determine the current spatial distribution of the coastal horseshoe crab Tachypleus gigas and the mangrove horseshoe crab Carcinoscorpius rotundicauda on the main island of Singapore indicated that there are probably no sites that support a breeding population of T. gigas. The only adult T. gigas seen were trapped in nets at one site, no juveniles or sub-adults were found at any site, and fishermen see this species infrequently. C. rotundicauda were more abundant, and breeding populations were found on mudflats fringed with mangroves. These small areas may be the last sites that support a breeding population of C. rotundicauda. Population density studies of mainly surface crabs on the mudflats at one site gave a conservative figure of 0.5 crabs m⁻² using non-randomised, longitudinal belt-transects of 5 × 50 m, set from high- to low-tide zones. Smaller randomised quadrats, searched for both buried and surface crabs, gave densities of 0.57 to 0.98 individuals m⁻², equivalent to a possible abundance of 29,925 to 51,450 individuals in the accessible search area of 52,500 m². Comparisons over different months suggest that density changed little over time. Randomised quadrats and searches to depletion gave higher density figures, but they are labour-intensive and difficult to set up in the terrain. Randomised, longitudinal belt-transects of 5 × 50 m are recommended for long-term monitoring of crab density. These findings provide baseline data for monitoring the population at the site and for formulating conservation strategies for the two crab species. © Inter-Research 2011.
Date: 1 Jan 2011
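The abundance range quoted above follows from multiplying each quadrat density by the accessible search area. A quick Python check of that arithmetic, using only the figures from the abstract:

```python
# Abundance = density (individuals per m^2) x accessible search area (m^2).
AREA_M2 = 52_500
densities = {"low": 0.57, "high": 0.98}

abundance = {label: round(d * AREA_M2) for label, d in densities.items()}
print(abundance)  # {'low': 29925, 'high': 51450}
```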
- Estimates of the Effect of Natural Selection on Protein-Coding Content
https://scholarbank.nus.edu.sg/handle/10635/105121
Authors: Yap, V.B.; Lindsay, H.; Easteal, S.; Huttley, G.
Abstract: Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (ω) distinguishes neutrally evolving sequences (ω = 1) from those subjected to purifying (ω < 1) or positive Darwinian (ω > 1) selection. We show that current models used to estimate ω are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and that avoids these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease revealed significant discrepancies in estimates, with ∼10–30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.
Date: 1 Mar 2010
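For orientation, this is how ω is obtained with a simple counting approach (Nei-Gojobori style, with a Jukes-Cantor correction). It is the kind of composition-naive estimate the abstract argues can be biased, not the authors' conditional-frequency model. The site and difference counts, and the tolerance used to call ω ≈ 1, are hypothetical; a real analysis would use a statistical test rather than a fixed threshold.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p (< 0.75) of differing sites."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def omega(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Counting estimate of omega = dN / dS with Jukes-Cantor correction."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    return d_n / d_s


def classify(w, tol=0.05):
    """Label omega as purifying (< 1), neutral (about 1) or positive (> 1) selection."""
    if w < 1 - tol:
        return "purifying"
    if w > 1 + tol:
        return "positive"
    return "neutral"


w = omega(12, 300, 20, 100)
print(round(w, 3), classify(w))
```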
- Statistical analysis of binary data generated from multilocus dominant DNA markers
https://scholarbank.nus.edu.sg/handle/10635/105389
Authors: Khang, T.F.; Yap, V.B.
Abstract: The use of methodologies such as RAPD and AFLP for studying genetic variation in natural populations is widespread in the ecology community. Because data generated using these methods exhibit dominance, their statistical treatment is less straightforward. Several estimators have been proposed for estimating population genetic parameters, assuming simple random sampling and the Hardy-Weinberg (HW) law. The merits of these estimators remain unclear because no comparative studies of their theoretical properties have been carried out. Furthermore, ascertainment bias has not been explicitly modelled. Here, we present a comparison of a set of candidate estimators of null allele frequency (q), locus-specific heterozygosity (h) and average heterozygosity in terms of their bias, standard error, and root mean square error (RMSE). For estimating q and h, we show that none of the estimators considered has the least RMSE over the parameter space. Our proposed zero-correction procedure, however, generally leads to estimators with improved RMSE. Assuming a beta model for the distribution of null homozygote proportions, we show how correction for ascertainment bias can be carried out using a linear transform of the sample average of h and the truncated beta-binomial likelihood. Simulation results indicate that the maximum likelihood and empirical Bayes estimators of average heterozygosity have negligible bias and similar RMSE. Ascertainment bias in estimators of average heterozygosity is most pronounced when the beta distribution is J-shaped and negligible when it is inverse J-shaped. The validity of the current findings depends importantly on the HW assumption, a point that we illustrate using data from two published studies. © 2010 Blackwell Publishing Ltd.
Date: 1 Nov 2010
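With dominant markers only band presence or absence is observed, so under the HW law the band-absent (null homozygote) proportion estimates q², which gives the classical square-root estimator of q and h = 2q(1 − q). A Python sketch with hypothetical counts; it does not include the paper's zero-correction or ascertainment-bias correction:

```python
import math


def null_allele_freq(n_absent, n_total):
    """Square-root estimate of the null allele frequency q under Hardy-Weinberg.

    A band-absent individual is a null homozygote, which has probability q**2.
    """
    return math.sqrt(n_absent / n_total)


def heterozygosity(q):
    """Expected heterozygosity 2q(1 - q) at a biallelic locus under HW."""
    return 2 * q * (1 - q)


def average_heterozygosity(absent_counts, n_total):
    """Mean of the locus-specific estimates of h (no ascertainment correction)."""
    hs = [heterozygosity(null_allele_freq(k, n_total)) for k in absent_counts]
    return sum(hs) / len(hs)


q = null_allele_freq(25, 100)
print(q, heterozygosity(q))  # 0.5 0.5
```

A locus with no band-absent individuals gives q̂ = 0 and h = 0 here, which is the kind of boundary behaviour a zero-correction is designed to address.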
- Genetic distance for a general non-stationary Markov substitution process
https://scholarbank.nus.edu.sg/handle/10635/123777
Authors: Kaehler, Benjamin D; Yap, Von Bing; Zhang, Rongli; Huttley, Gavin A.
Date: 1 Jan 2015
- Wilson confidence intervals for the two-sample log-odds-ratio in stratified 2 × 2 contingency tables
https://scholarbank.nus.edu.sg/handle/10635/105466
Authors: Brown, B.M.; Suesse, T.; Yap, V.B.
Abstract: Large-sample Wilson-type confidence intervals (CIs) are derived for a parameter of interest in many clinical trial situations: the log-odds-ratio in a two-sample experiment comparing binomial success proportions, say between cases and controls. The methods cover several scenarios: (i) results embedded in a single 2 × 2 contingency table; (ii) a series of K 2 × 2 tables with a common parameter; or (iii) K tables, where the parameter may change across tables under the influence of a covariate. The calculation of the Wilson CI requires only simple numerical assistance and can, for example, easily be carried out in Excel. The main competitor, the exact CI, has two disadvantages: it requires burdensome search algorithms in the multi-table case, and it results in strong over-coverage associated with long confidence intervals. All the application cases are illustrated through a well-known example. A simulation study then investigates how the Wilson CI performs among several competing methods. The Wilson interval is the shortest, except for very large odds ratios, while maintaining coverage similar to Wald-type intervals. An alternative to the Wald CI is the Agresti-Coull CI, calculated from the Wilson and Wald CIs, which has the same length as the Wald CI but improved coverage. Copyright © Taylor & Francis Group, LLC.
Date: 1 Jan 2012
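The Wilson-type interval itself needs the paper's derivation, but its Wald-type competitor for a single 2 × 2 table is short enough to show. This Python sketch computes log(ad/bc) ± z·√(1/a + 1/b + 1/c + 1/d); the function name and the example counts are hypothetical, and all four cells must be non-zero.

```python
import math


def log_or_wald_ci(a, b, c, d, z=1.959964):
    """Wald confidence interval for the log-odds-ratio of one 2x2 table.

    Table [[a, b], [c, d]]: rows are the two samples (e.g. cases / controls),
    columns are success / failure. Requires all four cells to be non-zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or - z * se, log_or + z * se


print(log_or_wald_ci(20, 80, 10, 90))
```

The default z is the 97.5% standard normal quantile, giving a nominal 95% interval; exponentiating both ends gives the corresponding interval for the odds ratio itself.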
- Data-driven optimization of metabolomics methods using rat liver samples
https://scholarbank.nus.edu.sg/handle/10635/50686
Authors: Parab, G.S.; Rao, R.; Lakshminarayanan, S.; Yap, V.B.; Moochhala, S.M.; Swarup, S.
Abstract: The aim of metabolomics is to identify, measure, and interpret the complex, time-related concentrations, activities, and fluxes of metabolites in cells, tissues, and biofluids. We have used a metabolomics approach to study the biochemical phenotype of mammalian cells, which will help in developing a panel of early-stage biomarkers of heat stress tolerance and adaptation. As a first step, a simple and sensitive mass spectrometry experimental workflow has been optimized for the profiling of metabolites in rat tissues. Sample (liver tissue) preparation consisted of a homogenization step in three different buffers, acidification with acids of different strengths, and solid-phase extraction using nine types of cartridges of varying specificities. These led to 18 combinations of acids, cartridges, and buffers, which were tested for positive and negative ions using mass spectrometry. Results were analyzed and visualized using algorithms written in MATLAB v7.4.0.287. By testing linearity and repeatability and applying univariate and multivariate data analysis, a robust metabolomics platform has been developed. These results will form a basis for future applications in discovering metabolite markers for early diagnosis of heat stress and tissue damage. © 2009 American Chemical Society.
Date: 15 Feb 2009
- The defensive role of scutes in juvenile fluted giant clams (Tridacna squamosa)
https://scholarbank.nus.edu.sg/handle/10635/53216
Authors: Han, L.; Todd, P.A.; Chou, L.M.; Yap, V.B.; Sivaloganathan, B.
Abstract: This study tests the hypothesis that the scaly projections (scutes) on the shells of juvenile fluted giant clams, Tridacna squamosa, are an adaptation against crushing predators such as crabs. The forces required to crush scutes and clams were measured with a universal testing machine, whereas crab chela strength was measured with a digital force gauge connected to a set of lever arms. Results for shell properties and chela strength are used to create two non-mutually exclusive predator-defense models. In Model 1, scutes increase the overall shell size, consequently reducing the number of crab predators with chelae large enough to seize and crush the prey. In Model 2, the chela has to open wider to grasp prey bearing these projecting structures, which leads to a loss of claw-closing force such that crabs fail to crush the scutes and, consequently, the clam. Clam scutes may also deter crab predators by increasing the risk of claw damage and/or handling time. © 2008 Elsevier B.V. All rights reserved.
Date: 28 Apr 2008
- Erratum: Pitfalls of the most commonly used models of context dependent substitution (Biology Direct (2009) vol. 4(10))
https://scholarbank.nus.edu.sg/handle/10635/105485
Authors: Lindsay, H.; Yap, V.B.; Ying, H.; Huttley, G.A.
Date: 18 Mar 2009
- Comparison of methods for estimating the nucleotide substitution matrix
https://scholarbank.nus.edu.sg/handle/10635/105063
Authors: Oscamou, M.; McDonald, D.; Yap, V.B.; Huttley, G.A.; Lladser, M.E.; Knight, R.
Abstract: Background: The nucleotide substitution rate matrix is a key parameter of molecular evolution. Several methods for inferring this parameter have been proposed, with different mathematical bases. These methods include counting sequence differences and taking the log of the resulting probability matrices, methods based on Markov triples, and maximum likelihood methods that infer the substitution probabilities that lead to the most likely model of evolution. However, the speed and accuracy of these methods have not been compared. Results: The methods differ in speed by orders of magnitude (from 1 ms to 10 s per matrix), but differences in the accuracy of rate matrix reconstruction appear to be relatively small. Encouragingly, relatively simple and fast methods can provide results at least as accurate as far more complex and computationally intensive methods, especially when the sequences to be compared are relatively short. Conclusion: Based on the conditions tested, we recommend the method of Gojobori et al. (1982) for long sequences (> 600 nucleotides) and the method of Goldman et al. (1996) for shorter sequences (< 600 nucleotides). The method of Barry and Hartigan (1987) can provide somewhat more accuracy, measured as the Euclidean distance between the true and inferred matrices, on long sequences (> 2000 nucleotides), at the expense of substantially longer computation time. The availability of methods that are both fast and accurate will allow us to gain a global picture of change in the nucleotide substitution rate matrix on a genome-wide scale across the tree of life. © 2008 Oscamou et al; licensee BioMed Central Ltd.
Date: 1 Dec 2008
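The first family of methods mentioned above, counting differences and taking the matrix logarithm, can be sketched in Python as follows. The helper names are hypothetical, the first sequence is assumed to be the ancestral one, and the logarithm uses a plain power series that converges only when P is close to the identity (short divergences); this is not a reimplementation of any of the named published methods.

```python
import math

BASES = "ACGT"


def count_matrix(seq1, seq2):
    """Entry [i][j] counts sites with base i in seq1 opposite base j in seq2."""
    idx = {b: k for k, b in enumerate(BASES)}
    counts = [[0] * 4 for _ in range(4)]
    for x, y in zip(seq1, seq2):
        if x in idx and y in idx:  # skip gaps and ambiguity codes
            counts[idx[x]][idx[y]] += 1
    return counts


def row_normalise(counts):
    """Empirical substitution probability matrix P (every row needs a non-zero count)."""
    return [[c / sum(row) for c in row] for row in counts]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matrix_log(p, terms=60):
    """Principal log of P via log(I + X) = X - X^2/2 + X^3/3 - ..., with X = P - I."""
    n = len(p)
    x = [[p[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    result = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in x]
    for k in range(1, terms + 1):
        sign = 1 if k % 2 else -1
        for i in range(n):
            for j in range(n):
                result[i][j] += sign * power[i][j] / k
        power = matmul(power, x)
    return result


# Usage: a Jukes-Cantor P(t) with alpha * t = 0.05 should give back Q t,
# whose off-diagonal entries are 0.05 and diagonal entries -0.15.
e = math.exp(-0.2)
P = [[0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e for j in range(4)] for i in range(4)]
Qt = matrix_log(P)
print([round(x, 6) for x in Qt[0]])
```

With real counts the result is only an estimate of Q·t: negative off-diagonal entries in it are a warning that the count-and-log estimate is not a valid rate matrix.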
- Effects of normalization on quantitative traits in association test
https://scholarbank.nus.edu.sg/handle/10635/110052
Authors: Goh, L.; Yap, V.B.
Abstract: Background: Quantitative trait loci analysis assumes that the trait is normally distributed. In reality, this is often not the case, and one strategy is to transform the trait. However, it is not clear how much normality is required and which transformation works best in association studies. Results: We performed simulations on four types of common quantitative traits to evaluate the effects of normalization using the logarithm, Box-Cox, and rank-based transformations. The impact of sample size and genetic effects on normalization is also investigated. Our results show that the rank-based transformation generally gives the best and most consistent performance in identifying the causal polymorphism and ranking it highly in association tests, with a slight increase in false positive rate. Conclusion: For small sample sizes or small genetic effects, the improvement in sensitivity from rank transformation outweighs the slight increase in false positive rate. However, for large sample sizes and genetic effects, normalization may not be necessary, since the increase in sensitivity is relatively modest. © 2009 Goh and Yap; licensee BioMed Central Ltd.
Date: 14 Dec 2009
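A rank-based inverse normal transformation replaces each trait value by a standard normal quantile of its rank. The abstract does not say which offset was used; this Python sketch assumes Blom's offset, (r − 3/8)/(n + 1/4), and gives tied values the average of their ranks:

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal transform with offset c (Blom: c = 3/8).

    Rank r (1-based) maps to the normal quantile of (r - c) / (n - 2c + 1);
    tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


print(rank_inverse_normal([3.0, 1.0, 2.0, 2.0]))
```

Because only ranks are used, the transformed trait is the same for any monotone rescaling of the raw values, which is what makes this family robust to skewed trait distributions.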
- The apportionment of total genetic variation by categorical analysis of variance
https://scholarbank.nus.edu.sg/handle/10635/105406
Authors: Khang, T.F.; Yap, V.B.
Abstract: We suggest the categorical analysis of variance (CATANOVA) as a means of quantifying the proportion of total genetic variation attributable to different sources of variation. This method potentially challenges researchers to rethink conclusions derived from the well-known analysis of molecular variance (AMOVA). The CATANOVA framework allows explicit definition, and estimation, of two measures of genetic differentiation. These parameters are the subject of interest in many research programmes but are often confused with the correlation measures defined in AMOVA, which cannot be interpreted as relative contributions of particular sources of variation. Through a simulation approach, we show that under certain conditions, researchers who use AMOVA to estimate these measures of genetic differentiation may attribute more of the total variation to population labels than is justified. Moreover, the two measures can lead to incongruent conclusions regarding the genetic structure of the populations of interest. Fortunately, one of the two measures seems robust to variation in the relative sample sizes used. Its merits are illustrated in this paper using mitochondrial haplotype and amplified fragment length polymorphism (AFLP) data. © 2010 The Berkeley Electronic Press. All rights reserved.
Date: 1 Jan 2010
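The abstract does not spell out the two differentiation measures, so this Python sketch shows only the standard Light and Margolin CATANOVA decomposition of categorical variation into between- and within-group parts, with R² = BSS/TSS as the share attributable to population labels. The function names and haplotype labels are hypothetical.

```python
from collections import Counter


def categorical_ss(labels):
    """Categorical sum of squares: n/2 - (sum of squared category counts) / (2n)."""
    n = len(labels)
    return n / 2 - sum(c * c for c in Counter(labels).values()) / (2 * n)


def catanova(groups):
    """Light-Margolin CATANOVA decomposition.

    `groups` maps a population label to the list of categorical observations
    (e.g. haplotype names) sampled from it. Returns TSS, WSS, BSS and
    R2 = BSS / TSS, the share of categorical variation between groups.
    """
    pooled = [x for obs in groups.values() for x in obs]
    tss = categorical_ss(pooled)
    wss = sum(categorical_ss(obs) for obs in groups.values())
    bss = tss - wss
    return {"TSS": tss, "WSS": wss, "BSS": bss, "R2": bss / tss if tss else 0.0}


print(catanova({"pop1": ["h1", "h1", "h2"], "pop2": ["h2", "h3", "h3"]}))
```

Fully separated populations give R² = 1 and populations with identical haplotype frequencies give R² = 0.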
- Pathological rate matrices: From primates to pathogens
https://scholarbank.nus.edu.sg/handle/10635/105295
Authors: Schranz, H.W.; Yap, V.B.; Easteal, S.; Knight, R.; Huttley, G.A.
Abstract: Background: Continuous-time Markov models allow flexible, parametrically succinct descriptions of sequence divergence. Non-reversible forms of these models are more biologically realistic but are challenging to develop. The instantaneous rate matrices defined for these models are typically transformed into substitution probability matrices using a matrix exponentiation algorithm that employs eigendecomposition, but this algorithm has characteristic vulnerabilities that lead to significant errors when a rate matrix possesses certain 'pathological' properties. Here we tested whether pathological rate matrices exist in nature, and considered the suitability of different algorithms for their computation. Results: We used concatenated protein coding gene alignments from microbial genomes, primate genomes and independent intron alignments from primate genomes. The Taylor series expansion and eigendecomposition matrix exponentiation algorithms were compared to the less widely employed, but more robust, Padé with scaling and squaring algorithm for nucleotide, dinucleotide, codon and trinucleotide rate matrices. Pathological dinucleotide and trinucleotide matrices were evident in the microbial data set, affecting the eigendecomposition and Taylor algorithms, respectively. Even using a conservative estimate of matrix error (occurrence of an invalid probability), both Taylor and eigendecomposition algorithms exhibited substantial error rates: ∼100% of all exonic trinucleotide matrices were pathological to the Taylor algorithm, while ∼10% of codon position 1 and 2 dinucleotide matrices and intronic trinucleotide matrices, and ∼30% of codon matrices, were pathological to eigendecomposition. The majority of Taylor algorithm errors derived from the occurrence of multiple unobserved states. A small number of negative probabilities were detected from the Padé algorithm on trinucleotide matrices; these were attributable to machine precision.
Although the Padé algorithm does not facilitate caching of intermediate results, it was up to 3× faster than eigendecomposition on the same matrices. Conclusion: Development of robust software for computing non-reversible dinucleotide, codon and higher evolutionary models requires implementation of the Padé with scaling and squaring algorithm. © 2008 Schranz et al; licensee BioMed Central Ltd.
Date: 19 Dec 2008
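The Padé with scaling and squaring approach recommended above can be sketched in pure Python: scale Q·t by 2^s so that its norm is at most 1/2, apply a [6/6] Padé approximant, then square the result s times. This fixed-order version is a simplification (production routines such as SciPy's `scipy.linalg.expm` choose the order and scaling adaptively), and the `invalid_probabilities` helper is a hypothetical implementation of the conservative error criterion mentioned in the abstract.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n, m = len(a), len(b[0])
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [[aug[i][n + j] / aug[i][i] for j in range(m)] for i in range(n)]


def expm_pade(a, q=6):
    """exp(A) via a [q/q] Pade approximant with scaling and squaring."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)  # infinity norm
    s = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in a]
    num = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    den = [row[:] for row in num]
    power = [row[:] for row in num]
    for k in range(1, q + 1):
        # Pade coefficient c_k = (2q-k)! q! / ((2q)! k! (q-k)!)
        c = (math.factorial(2 * q - k) * math.factorial(q)
             / (math.factorial(2 * q) * math.factorial(k) * math.factorial(q - k)))
        power = matmul(power, scaled)
        for i in range(n):
            for j in range(n):
                num[i][j] += c * power[i][j]
                den[i][j] += (-1) ** k * c * power[i][j]
    result = solve(den, num)  # D^{-1} N; D and N commute
    for _ in range(s):
        result = matmul(result, result)
    return result


def invalid_probabilities(p, tol=1e-12):
    """Entries of P lying outside [0, 1] by more than tol."""
    return [(i, j, x) for i, row in enumerate(p) for j, x in enumerate(row)
            if x < -tol or x > 1 + tol]


# Usage: Jukes-Cantor rate matrix times t, with alpha * t = 0.3.
Qt = [[-0.9 if i == j else 0.3 for j in range(4)] for i in range(4)]
P = expm_pade(Qt)
print(invalid_probabilities(P))  # []
```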
- Pitfalls of the most commonly used models of context dependent substitution
https://scholarbank.nus.edu.sg/handle/10635/105301
Authors: Lindsay, H.; Yap, V.B.; Ying, H.; Huttley, G.A.
Abstract: Background: Neighboring nucleotides exert a striking influence on mutation, with the hypermutability of CpG dinucleotides in many genomes being an exemplar. Among the approaches employed to measure the relative importance of sequence neighbors on molecular evolution have been continuous-time Markov process models for substitutions that treat sequences as a series of independent tuples. The most widely used examples are the codon substitution models. We evaluated the suitability of derivatives of the nucleotide frequency weighted (hereafter NF) and tuple frequency weighted (hereafter TF) models for measuring sequence context dependent substitution. Critical properties we address are their relationships to an independent nucleotide process and the robustness of parameter estimation to changes in sequence composition. We then consider the impact on inference concerning dinucleotide substitution processes from application of these two forms to intron sequence alignments from primates. Results: We prove that the NF form always nests the independent nucleotide process and that this is not true for the TF form. As a consequence, using TF to study context effects can be misleading, which is shown by both theoretical calculations and simulations. We describe a simple example where a context parameter estimated under TF is confounded with composition terms unless all sequence states are equi-frequent. We illustrate this for the dinucleotide case by simulation under a nucleotide model, showing that the TF form identifies a CpG effect when none exists. Our analysis of primate introns revealed that the effect of nucleotide neighbors is over-estimated under TF compared with NF. Parameter estimates for a number of contexts are also strikingly discordant between the two model forms. Conclusion: Our results establish that the NF form should be used for analysis of independent-tuple context dependent processes. 
Although neighboring effects in general are still important, prominent influences such as the elevated CpG transversion rate previously identified using the TF form are an artifact. Our results further suggest as few as 5 parameters may account for ∼85% of neighboring nucleotide influence. © 2008 Lindsay et al; licensee BioMed Central Ltd.
Date: 16 Dec 2008
- A unified approach to the transition matrices of DNA substitution models
https://scholarbank.nus.edu.sg/handle/10635/104983
Authors: Yap, V.B.
Abstract: For a reversible finite-state continuous-time Markov chain containing similar states, the computation of the transition matrix can be expressed quite elegantly in terms of the transition matrix of an associated lumped Markov chain. This result is immensely useful for obtaining explicit transition matrices for many DNA substitution models, without diagonalizing a matrix or solving a differential equation. Furthermore, the technique works for the analogous problem in the discrete-time DNA substitution models. © 2013 Elsevier Inc.
Date: 1 Apr 2013
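As an example of an explicit transition matrix obtained without diagonalization, the Kimura (K80) model has the purines A, G as similar states, and likewise the pyrimidines C, T. The Python sketch below writes out the standard K80 closed form and checks that each transversion probability is half the class-change probability of the lumped two-state purine/pyrimidine chain. Function names are hypothetical, and this illustrates the idea rather than reproducing the paper's general result.

```python
import math

BASES = "AGCT"  # purines first, then pyrimidines
PURINES = set("AG")


def k80_transition_matrix(alpha, beta, t):
    """Closed-form Kimura (K80) transition matrix P(t).

    alpha: rate of each transition (A<->G, C<->T); beta: rate of each transversion.
    """
    e1 = math.exp(-4 * beta * t)
    e2 = math.exp(-2 * (alpha + beta) * t)
    same = 0.25 + 0.25 * e1 + 0.5 * e2
    transition = 0.25 + 0.25 * e1 - 0.5 * e2
    transversion = 0.25 - 0.25 * e1
    p = []
    for x in BASES:
        row = []
        for y in BASES:
            if x == y:
                row.append(same)
            elif (x in PURINES) == (y in PURINES):
                row.append(transition)
            else:
                row.append(transversion)
        p.append(row)
    return p


def lumped_change_prob(beta, t):
    """P(class changes) in the lumped purine/pyrimidine chain.

    Every base leaves its class at total rate 2 * beta, so the lumped chain is a
    symmetric two-state chain with switching rate 2 * beta.
    """
    return 0.5 - 0.5 * math.exp(-4 * beta * t)


p = k80_transition_matrix(2.0, 0.5, 0.3)
print(p[0])  # A -> A, G, C, T
print(p[0][2], lumped_change_prob(0.5, 0.3) / 2)  # A -> C equals half the lumped change
```

Setting alpha equal to beta recovers the Jukes-Cantor matrix, in which all four bases are mutually similar.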
- The Embedding Problem for Markov Models of Nucleotide Substitution
https://scholarbank.nus.edu.sg/handle/10635/105415
Authors: Verbyla, K.L.; Yap, V.B.; Pahwa, A.; Shao, Y.; Huttley, G.A.
Abstract: Continuous-time Markov processes are often used to model the complex natural phenomenon of sequence evolution. To make the process of sequence evolution tractable, simplifying assumptions are often made about the sequence properties and the underlying process. The validity of one such assumption, time-homogeneity, has never been explored. Violations of this assumption can be found by identifying non-embeddability. A process is non-embeddable if it cannot be embedded in a continuous-time homogeneous Markov process. In this study, non-embeddability was demonstrated to exist when modelling sequence evolution with Markov models. Evidence of non-embeddability was found primarily at the third codon position, possibly resulting from changes in mutation rate over time. Outgroup edges and those with a deeper time depth were found to have an increased probability of the underlying process being non-embeddable. Overall, low levels of non-embeddability were detected when examining individual edges of triads across a diverse set of alignments. Subsequent phylogenetic reconstruction analyses demonstrated that non-embeddability could affect the correct prediction of phylogenies, but only at extremely low levels. Despite the existence of non-embeddability, there is minimal evidence of violations of the local time-homogeneity assumption, and consequently the impact is likely to be minor. © 2013 Verbyla et al.
Date: 30 Jul 2013
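One easily checked necessary condition follows from det(exp(Qt)) = exp(t·tr Q) > 0: an embeddable P must have a positive determinant. The Python sketch below applies only that test, with hypothetical helper names; passing it does not establish embeddability.

```python
import math


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # each row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


def fails_det_condition(p):
    """True if det(P) <= 0, which rules out P = exp(Q t) for any rate matrix Q."""
    return determinant(p) <= 0


# A Jukes-Cantor P(t) passes; swapping two states outright (a permutation) fails.
e = math.exp(-0.4)
P_jc = [[0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e for j in range(4)] for i in range(4)]
print(fails_det_condition(P_jc))              # False
print(fails_det_condition([[0, 1], [1, 0]]))  # True
```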
- Context-dependent substitution models for circular DNA
https://scholarbank.nus.edu.sg/handle/10635/105071
Authors: Zhang, R.; Yap, V.B.
Abstract: The most general context-dependent Markov substitution process, where each substitution event involves only one site and substitution rates depend on the whole sequence, is presented for the first time. The focus is on circular DNA sequences, where the problem of specifying the behaviour of the first and last sites in a linear sequence does not arise. Important special cases include (1) the established models where each site behaves independently, (2) models which are increasingly applied to non-coding DNA, where each site depends only on its immediate neighbouring sites, and (3) models where each site depends on the two closest neighbours on each side, such as the codon models. These special cases are classified and illustrated by published models. It is shown that the existing codon substitution models mix up the mutation and selection processes, rendering the substitution rates challenging to interpret. The classification suggests the study of a more interpretable codon model, where the mutation and selection processes are clearly delineated. Furthermore, this model allows a natural accommodation of possibly different selection pressures in overlapping reading frames, which may contribute to furthering the understanding of viral diseases. Also included are brief discussions on the stationary distribution of a context-dependent substitution process and a simple recipe for simulating it on a computer. © 2013 Elsevier B.V.
Date: 1 Aug 2013
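A standard way to simulate such a process is the Gillespie algorithm: sum all single-site rates, draw an exponential waiting time, pick one event in proportion to its rate, and recompute the rates. This Python sketch assumes a hypothetical CpG-like rule in which a C followed by G (wrapping around the circle) changes to T ten times faster than other changes; it is not necessarily the recipe given in the paper.

```python
import random

BASES = "ACGT"


def site_rates(seq, i, base_rate=1.0, cpg_boost=10.0):
    """Rates at which site i of a circular sequence changes to each other base.

    Hypothetical model: every change has rate `base_rate`, except that a C
    whose right-hand neighbour is G (wrapping around the circle) changes to T
    at `cpg_boost` times that rate.
    """
    right = seq[(i + 1) % len(seq)]  # circular: the last site's neighbour is the first
    rates = {}
    for b in BASES:
        if b != seq[i]:
            boosted = seq[i] == "C" and right == "G" and b == "T"
            rates[b] = base_rate * (cpg_boost if boosted else 1.0)
    return rates


def simulate(seq, t_end, rng):
    """Gillespie simulation; one site changes per event, and all rates are recomputed."""
    seq = list(seq)
    t = 0.0
    while True:
        events = [(i, b, r) for i in range(len(seq)) for b, r in site_rates(seq, i).items()]
        total = sum(r for _, _, r in events)
        t += rng.expovariate(total)
        if t > t_end:
            return "".join(seq)
        pick = rng.uniform(0, total)
        for i, b, r in events:
            pick -= r
            if pick <= 0:
                break
        seq[i] = b  # falls back to the last event if rounding leaves pick > 0


print(simulate("ACGT" * 5, 0.5, random.Random(1)))
```

Recomputing every rate after each event keeps the sketch simple; only the changed site and its left neighbour actually need updating under this particular rule.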
- Similar states in continuous-time Markov chains
https://scholarbank.nus.edu.sg/handle/10635/105364
Authors: Yap, V.B.
Abstract: In a homogeneous continuous-time Markov chain on a finite state space, two states that jump to every other state with the same rate are called similar. By partitioning states into similarity classes, the algebraic derivation of the transition matrix can be simplified, using hidden holding times and lumped Markov chains. When the rate matrix is reversible, the transition matrix is explicitly related in an intuitive way to that of the lumped chain. The theory provides a unified derivation for a whole range of useful DNA base substitution models, and a number of amino acid substitution models. © Applied Probability Trust 2009.
Date: 1 Jun 2009
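The definition of similar states is easy to test directly: i and j are similar when q_ik = q_jk for every other state k. This Python sketch checks all pairs for an HKY85 rate matrix (bases ordered A, C, G, T, hypothetical base frequencies and κ), where the purines are similar to each other, as are the pyrimidines; with κ = 1 all four bases become mutually similar. The function names are hypothetical.

```python
def similar(q, i, j, tol=1e-12):
    """States i and j are similar if they jump to every other state k at the same rate."""
    return all(abs(q[i][k] - q[j][k]) <= tol
               for k in range(len(q)) if k not in (i, j))


def similar_pairs(q):
    """All pairs (i, j), i < j, of similar states."""
    n = len(q)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if similar(q, i, j)]


def hky85(kappa, pi):
    """HKY85 rate matrix for bases A, C, G, T; the diagonal makes each row sum to 0."""
    bases = "ACGT"
    transitions = {frozenset("AG"), frozenset("CT")}
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(bases):
        for j, y in enumerate(bases):
            if i != j:
                q[i][j] = pi[j] * (kappa if frozenset((x, y)) in transitions else 1.0)
        q[i][i] = -sum(q[i])
    return q


print(similar_pairs(hky85(2.0, [0.1, 0.2, 0.3, 0.4])))  # [(0, 2), (1, 3)]
```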
- The extent of undiscovered species in Southeast Asia
https://scholarbank.nus.edu.sg/handle/10635/101907
Authors: Giam, X.; Ng, T.H.; Yap, V.B.; Tan, H.T.W.
Abstract: Southeast Asia has the highest rate of deforestation among all tropical regions in the world. Depending on the number of species not yet known to science, a sizeable proportion of species may have gone extinct or will go extinct in the future without record. We compiled species datasets for eight taxa, each consisting of a list of native species and their description dates. Birds, legumes, mosquitoes, and mosses showed recent declines in species discovery rate. For these taxa, we estimated the total species richness by applying generalized linear models derived from theory. The number of undiscovered species in each taxon was calculated and the extent of undiscovered species among the taxa compared. Among these taxa that displayed a species discovery decline, the legumes had the highest extent of undiscovered species while the birds had the most complete species inventory. Although quantitative estimates of the number of undiscovered species for amphibians, freshwater fish, hawkmoths, and mammals could not be derived, the extent of undiscovered species is likely to be high as their recent discovery rates showed a continued increase. If these taxa are more or less representative of other Southeast Asian taxa, many species are likely to go extinct before ever being discovered by science under the current rates of habitat loss. We therefore urge the intensification of taxonomic and species discovery research in the taxa in which the extent of undiscovered species is relatively high, i.e., amphibians, freshwater fish, hawkmoths, mammals, and legumes. © Springer Science+Business Media B.V. 2010.
Date: 1 Jan 2010