Lim Soon Wong
Email Address
dcswls@nus.edu.sg
Organizational Units
NUS GRADUATE SCHOOL
faculty
COMPUTING
faculty
Publication Search Results
Now showing 1 - 10 of 163
Publication
Next generation sequencing unravels the biosynthetic ability of Spearmint (Mentha spicata) peltate glandular trichomes through comparative transcriptomics (BioMed Central Ltd., 2014)
Jin J.; Panicker D.; Wang Q.; Kim M.J.; Liu J.; Yin J.-L.; Wong L.; Jang I.-C.; Chua N.-H.; Sarojam R.
DEPARTMENT OF COMPUTER SCIENCE; BIOLOGICAL SCIENCES; BIOCHEMISTRY; NUS NANOSCIENCE & NANOTECH INITIATIVE
Publication
GWAMAR: Genome-wide assessment of mutations associated with drug resistance in bacteria (2014)
Wozniak, M; Tiuryn, J; Wong, L
INSTITUTE OF SYSTEMS SCIENCE
Background: Development of drug resistance in bacteria causes antibiotic therapies to be less effective and more costly. Moreover, our understanding of the process remains incomplete. One promising approach to improve our understanding of how resistance is acquired is to use whole-genome comparative approaches for detection of drug resistance-associated mutations.
Results: We present GWAMAR, a tool we have developed for detecting drug resistance-associated mutations in bacteria through comparative analysis of whole-genome sequences. The pipeline of GWAMAR comprises several steps. First, for a set of closely related bacterial genomes, it employs eCAMBer to identify homologous gene families. Second, based on multiple alignments of the gene families, it identifies mutations among the strains of interest. Third, it calculates several statistics to identify which mutations are most strongly associated with drug resistance.
Conclusions: Based on our analysis of two large datasets retrieved from publicly available data for M. tuberculosis, we identified a set of novel putative drug resistance-associated mutations. As a part of this work, we also present an application of our tool to detect putative compensatory mutations.
© 2014 Wozniak et al.
Publication
Guest Editorial Text Mining and Management in Biomedicine (2006)
Park, J.C.; Lee, G.G.; Wong, L.
COMPUTER SCIENCE
Publication
From the static interactome to dynamic protein complexes: Three challenges (World Scientific Publishing Co. Pte Ltd, 2015)
Yong C.H; Wong L
COMPUTER SCIENCE
Publication
Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions (2006)
Chua, H.N.; Sung, W.-K.; Wong, L.
COMPUTER SCIENCE
Motivation: Most approaches to predicting protein function from protein-protein interaction data rely on the observation that a protein often shares functions with proteins that interact with it (its level-1 neighbours). However, proteins that interact with the same proteins (i.e. level-2 neighbours) may also have a greater likelihood of sharing similar physical or biochemical characteristics. We speculate that functional similarity between a protein and its neighbours from the two different levels arises from two distinct forms of functional association, and that a protein is likely to share functions with its level-1 and/or level-2 neighbours. We are interested in finding out how significant the functional association between level-2 neighbours is and how it can be exploited for protein function prediction.
Results: We made a statistical study of recent interaction data and observed that functional association between level-2 neighbours is clearly observable. A substantial number of proteins are observed to share functions with level-2 neighbours but not with level-1 neighbours. We develop an algorithm that predicts the functions of a protein in two steps: (1) assign a weight to each of its level-1 and level-2 neighbours by estimating its functional similarity with the protein using the local topology of the interaction network as well as the reliability of experimental sources and (2) score each function based on its weighted frequency in these neighbours.
Using leave-one-out cross validation, we compare the performance of our method against that of several other existing approaches and show that our method performs relatively well. © 2006 Oxford University Press.
Publication
HPCgen - A fast generator of contact networks of large urban cities for epidemiological studies (2009)
Zhang, T.; Soh, S.H.; Fu, X.; Lee, K.K.; Wong, L.; Ma, S.; Xiao, G.; Kwoh, C.K.
COMPUTER SCIENCE
A contact network is a good representation of heterogeneous contact behaviors within a population. Incorporating contact networks as well as community structures is important in realistic modeling and simulation of the spread of infectious diseases. We developed HPCgen, a fast and generic generator of contact networks of large urban cities, with the capacity to automate network re-generation for intervention studies. The contact networks it produces are applicable in both analytical modeling and agent-based simulations. In this paper, we present the design and realization of HPCgen, followed by the empirical results of building Singapore contact networks with six types of community structures in common urban settings. The results show that our 8-node parallelized HPCgen could generate a contact network for a population of 3.4 million within 62.17 seconds, a 90% reduction in runtime. © 2009 IEEE.
Publication
Stringent DDI-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions (BioMed Central Ltd., 2013)
Zhou H.; Rezaei J.; Hugo W.; Gao S.; Jin J.; Fan M.; Yong C.-H.; Wozniak M.; Wong L.
DEPARTMENT OF COMPUTER SCIENCE
Publication
FastTagger: An efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium (2010)
Liu, G.; Wang, Y.; Wong, L.
COMPUTER SCIENCE
Background: The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs play an important role in understanding the association between genetic variations and human diseases.
Many SNPs show correlated genotypes, or linkage disequilibrium (LD), so it is not necessary to genotype all SNPs for an association study. Many algorithms have been developed to find a small subset of SNPs, called tag SNPs, that are sufficient to infer all the other SNPs. Algorithms based on the r² LD statistic have gained popularity because r² is directly related to the statistical power to detect disease associations. Most existing r²-based algorithms use pairwise LD. Recent studies show that multi-marker LD can help further reduce the number of tag SNPs. However, existing tag SNP selection algorithms based on multi-marker LD are both time-consuming and memory-consuming. They cannot work on chromosomes containing more than 100 k SNPs using length-3 tagging rules.
Results: We propose an efficient algorithm called FastTagger to calculate multi-marker tagging rules and select tag SNPs based on multi-marker LD. FastTagger uses several techniques to reduce running time and memory consumption. Our experimental results show that FastTagger is several times faster than existing multi-marker based tag SNP selection algorithms, and it consumes much less memory at the same time. As a result, FastTagger can work on chromosomes containing more than 100 k SNPs using length-3 tagging rules. FastTagger also produces smaller sets of tag SNPs than existing multi-marker based algorithms, and the reduction ratio ranges from 3% to 9% when length-3 tagging rules are used. The generated tagging rules can also be used for genotype imputation. We studied the prediction accuracy of individual rules, and the average accuracy is above 96% when r² ≥ 0.9.
Conclusions: Generating multi-marker tagging rules is a computation-intensive task, and it is the bottleneck of existing multi-marker based tag SNP selection methods. FastTagger is a practical and scalable algorithm to solve this problem.
© 2010 Liu et al; licensee BioMed Central Ltd.
Publication
Evaluating temporal factors in combined interventions of workforce shift and school closure for mitigating the spread of influenza (2012)
Zhang, T.; Fu, X.; Ma, S.; Xiao, G.; Wong, L.; Kwoh, C.K.; Lees, M.; Lee, G.K.K.; Hung, T.
COMPUTER SCIENCE
Background: It is believed that combined interventions may be more effective than individual interventions in mitigating epidemics. However, there is a lack of quantitative studies on the performance of combinations of individual interventions under different temporal settings.
Methodology/Principal Findings: To better understand the problem, we develop an individual-based simulation model running on top of contact networks based on real-life contact data in Singapore. We model and evaluate the spread of an influenza epidemic with intervention strategies of workforce shift and its combination with school closure, and examine the impacts of temporal factors, namely the trigger threshold and the duration of an intervention. By comparing simulation results for intervention scenarios with different temporal factors, we find that combined interventions do not always outperform individual interventions and are more effective only when the duration is longer than 6 weeks or school closure is triggered at the 5% threshold; combined interventions may be more effective if school closure starts first when the duration is less than 4 weeks or workforce shift starts first when the duration is longer than 4 weeks.
Conclusions/Significance: We therefore conclude that identifying the appropriate timing configuration is crucial for achieving optimal or near-optimal performance in mitigating the spread of an influenza epidemic. The results of this study are useful to policy makers in deliberating and planning individual and combined interventions.
© 2012 Zhang et al.
Publication
The dichotomous intensional expressive power of the nested relational calculus with powerset (2013)
Wong, L.
COMPUTER SCIENCE
Most existing studies on the expressive power of query languages have focused on what queries can be expressed and what queries cannot be expressed in a query language. They do not tell us much about whether a query can be implemented efficiently in a query language. Yet, paradoxically, efficiency is a key concern in computer science. In this paper, the efficiency of queries in a nested relational calculus with a powerset operation is discussed. A dichotomy in the efficiency of these queries on a large general class of structures - which includes long chains, deep trees, etc. - is proved. In particular, it is shown that these queries are either already expressible in the usual nested relational calculus or require at least exponential space. This Dichotomy Theorem, when coupled with the bounded degree and locality properties of the usual nested relational calculus, becomes a powerful general tool in studying the intensional expressive power of query languages. The bounded degree and locality properties make it easy to prove that a query is inexpressible in the usual nested relational calculus. Then, if the query is expressible in the calculus with powerset, subject to the conditions of the Dichotomy Theorem, the query must take at least exponential space. © 2013 Springer-Verlag Berlin Heidelberg.