ScholarBank@NUShttps://scholarbank.nus.edu.sgThe DSpace digital repository system captures, stores, indexes, preserves, and distributes digital research material.Sun, 17 Nov 2019 15:30:19 GMT2019-11-17T15:30:19Z501851- AP-2γ 3 regulates oestrogen receptor-mediated long-range chromatin interaction and gene transcriptionhttps://scholarbank.nus.edu.sg/handle/10635/39849Title: AP-2γ 3 regulates oestrogen receptor-mediated long-range chromatin interaction and gene transcription
Authors: Tan, S.K.; Lin, Z.H.; Chang, C.W.; Varang, V.; Chng, K.R.; Pan, Y.F.; Yong, E.L.; Sung, W.K.; Cheung, E.
Abstract: Oestrogen receptor α (ERα ) is key player in the progression of breast cancer. Recently, the cistrome and interactome of ERα were mapped in breast cancer cells, revealing the importance of spatial organization in oestrogen-mediated transcription. However, the underlying mechanism of this process is unclear. Here, we show that ERα binding sites (ERBS) identified from the Chromatin Interaction Analysis-Paired End DiTag of ERα are enriched for AP-2 motifs. We demonstrate the transcription factor, AP-2α 3, which has been implicated in breast cancer oncogenesis, binds to ERBS in a ligand-independent manner. Furthermore, perturbation of AP-2α 3 expression impaired ERα DNA binding, long-range chromatin interactions, and gene transcription. In genome-wide analyses, we show that a large number of AP-2α 3 and ERα binding events converge together across the genome. The majority of these shared regions are also occupied by the pioneer factor, FoxA1. Molecular studies indicate there is functional interplay between AP-2α 3 and FoxA1. Finally, we show that most ERBS associated with long-range chromatin interactions are colocalized with AP-2α 3 and FoxA1. Together, our results suggest AP-2α 3 is a novel collaborative factor in ERα -mediated transcription. © 2011 European Molecular Biology Organization | All Rights Reserved.
Sat, 01 Jan 2011 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/398492011-01-01T00:00:00Z
- Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs)https://scholarbank.nus.edu.sg/handle/10635/38985Title: Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs)
Authors: Ruan, Y.; Hong, S.O.; Siew, W.C.; Kuo, P.C.; Xiao, D.Z.; Srinivasan, K.G.; Yao, F.; Chiou, Y.C.; Liu, J.; Ariyaratne, P.; Bin, W.G.W.; Kuznetsov, V.A.; Shahab, A.; Sung, W.-K.; Bourque, G.; Palanisamy, N.; Wei, C.-L.
Abstract: Identification of unconventional functional features such as fusion transcripts is a challenging task in the effort to annotate all functional DNA elements in the human genome. Paired-End diTag (PET) analysis possesses a unique capability to accurately and efficiently characterize the two ends of DNA fragments, which may have either normal or unusual compositions. This unique nature of PET analysis makes it an ideal tool for uncovering unconventional features residing in the human genome. Using the PET approach for comprehensive transcriptome analysis, we were able to identify fusion transcripts derived from genome rearrangements and actively expressed retrotransposed pseudogenes, which would be difficult to capture by other means. Here, we demonstrate this unique capability through the analysis of 865,000 individual transcripts in two types of cancer cells. In addition to the characterization of a large number of differentially expressed alternative 5′ and 3′ transcript variants and novel transcriptional units, we identified 70 fusion transcript candidates in this study. One was validated as the product of a fusion gene between BCAS4 and BCAS3 resulting from an amplification followed by a translocation event between the two loci, chr20q13 and chr17q23. Through an examination of PETs that mapped to multiple genomic locations, we identified 4055 retrotransposed loci in the human genome, of which at least three were found to be transcriptionally active. The PET mapping strategy presented here promises to be a useful tool in annotating the human genome, especially aberrations in human cancer genomes. ©2007 by Cold Spring Harbor Laboratory Press.
Mon, 01 Jan 2007 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/389852007-01-01T00:00:00Z
- SLiM on diet: Finding short linear motifs on domain interaction interfaces in protein data bankhttps://scholarbank.nus.edu.sg/handle/10635/39689Title: SLiM on diet: Finding short linear motifs on domain interaction interfaces in protein data bank
Authors: Hugo, W.; Song, F.; Aung, Z.; Ng, S.-K.; Sung, W.-K.
Abstract: Motivation: An important class of protein interactions involves the binding of a protein's domain to a short linear motif (SLiM) on its interacting partner. Extracting such motifs, either experimentally or computationally, is challenging because of their weak binding and high degree of degeneracy. Recent rapid increase of available protein structures provides an excellent opportunity to study SLiMs directly from their 3D structures. Results: Using domain interface extraction (Diet), we characterized 452 distinct SLiMs from the Protein Data Bank (PDB), of which 155 are validated in varying degrees-40 have literature validation, 54 are supported by at least one domain-peptide structural instance, and another 61 have overrepresentation in high-throughput PPI data. We further observed that the lacklustre coverage of existing computational SLiM detection methods could be due to the common assumption that most SLiMs occur outside globular domain regions. 198 of 452 SLiM that we reported are actually found on domain-domain interface; some of them are implicated in autoimmune and neurodegenerative diseases. We suggest that these SLiMs would be useful for designing inhibitors against the pathogenic protein complexes underlying these diseases. Our findings show that 3D structure-based SLiM detection algorithms can provide a more complete coverage of SLiM-mediated protein interactions than current sequence-based approaches. Contact: ksung@comp.nus.edu.sg. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2010. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org.
Fri, 01 Jan 2010 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/396892010-01-01T00:00:00Z
- Evaluation of methods for modeling transcription factor sequence specificityhttps://scholarbank.nus.edu.sg/handle/10635/77853Title: Evaluation of methods for modeling transcription factor sequence specificity
Authors: Weirauch, M.T.; Cote, A.; Norel, R.; Annala, M.; Zhao, Y.; Riley, T.R.; Saez-Rodriguez, J.; Cokelaer, T.; Vedenko, A.; Talukder, S.; Bussemaker, H.J.; Quaid, M.D.; Bulyk, M.L.; Stolovitzky, G.; Hughes, T.R.; Agius, P.; Arvey, A.; Bucher, P.; Callan Jr., C.G.; Chang, C.W.; Chen, C.-Y.; Chen, Y.-S.; Chu, Y.-W.; Grau, J.; Grosse, I.; Jagannathan, V.; Keilwagen, J.; Kiebasa, S.M.; Kinney, J.B.; Klein, H.; Kursa, M.B.; Lähdesmäki, H.; Laurila, K.; Lei, C.; Leslie, C.; Linhart, C.; Murugan, A.; Myšičková, A.; Noble, W.S.; Nykter, M.; Orenstein, Y.; Posch, S.; Ruan, J.; Rudnicki, W.R.; Schmid, C.D.; Shamir, R.; Sung, W.-K.; Vingron, M.; Zhang, Z.
Abstract: Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro-derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences. © 2013 Nature America, Inc. All rights reserved.
Fri, 01 Feb 2013 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/778532013-02-01T00:00:00Z
- Do triplets have enough information to construct the multi-labeled phylogenetic tree?https://scholarbank.nus.edu.sg/handle/10635/142466Title: Do triplets have enough information to construct the multi-labeled phylogenetic tree?
Authors: Hassanzadeh R.; Eslahchi C.; Sung W.-K.
Wed, 01 Jan 2014 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1424662014-01-01T00:00:00Z
- Telomerase directly regulates NF-B-dependent transcriptionhttps://scholarbank.nus.edu.sg/handle/10635/39662Title: Telomerase directly regulates NF-B-dependent transcription
Authors: Ghosh, A.; Saginc, G.; Leow, S.C.; Khattar, E.; Shin, E.M.; Yan, T.D.; Wong, M.; Zhang, Z.; Li, G.; Sung, W.-K.; Zhou, J.; Chng, W.J.; Li, S.; Liu, E.; Tergaonkar, V.
Abstract: Although elongation of telomeres is thought to be the prime function of reactivated telomerase in cancers, this activity alone does not account for all of the properties that telomerase reactivation attributes to human cancer cells. Here, we uncover a link between telomerase and NF-B, a master regulator of inflammation. We observe that while blocking NF-B signalling can inhibit effects of telomerase overexpression on processes relevant to transformation, increasing NF-B activity can functionally substitute for reduced telomerase activity. Telomerase directly regulates NF-B-dependent gene expression by binding to the NF-B p65 subunit and recruitment to a subset of NF-B promoters such as those of IL-6 and TNF-α, cytokines that are critical for inflammation and cancer progression. As NF-B can transcriptionally upregulate telomerase levels, our findings suggest that a feed-forward regulation between them could be the key mechanistic basis for the coexistence of chronic inflammation and sustained telomerase activity in human cancers. © 2012 Macmillan Publishers Limited. All rights reserved.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/396622012-01-01T00:00:00Z
- Whole-genome sequencing identifies recurrent mutations in hepatocellular carcinomahttps://scholarbank.nus.edu.sg/handle/10635/77946Title: Whole-genome sequencing identifies recurrent mutations in hepatocellular carcinoma
Authors: Kan, Z.; Zheng, H.; Liu, X.; Li, S.; Barber, T.D.; Gong, Z.; Gao, H.; Hao, K.; Willard, M.D.; Xu, J.; Hauptschein, R.; Rejto, P.A.; Fernandez, J.; Wang, G.; Zhang, Q.; Wang, B.; Chen, R.; Wang, J.; Lee, N.P.; Zhou, W.; Lin, Z.; Peng, Z.; Yi, K.; Chen, S.; Li, L.; Fan, X.; Yang, J.; Ye, R.; Ju, J.; Wang, K.; Estrella, H.; Deng, S.; Wei, P.; Qiu, M.; Wulur, I.H.; Liu, J.; Ehsani, M.E.; Zhang, C.; Loboda, A.; Sung, W.K.; Aggarwal, A.; Poon, R.T.; Fan, S.T.; Wang, J.; Hardwick, J.; Reinhard, C.; Dai, H.; Li, Y.; Luk, J.M.; Mao, M.
Abstract: Hepatocellular carcinoma (HCC) is one of the most deadly cancers worldwide and has no effective treatment, yet the molecular basis of hepatocarcinogenesis remains largely unknown. Here we report findings from a whole-genome sequencing (WGS) study of 88 matched HCC tumor/normal pairs, 81 of which are Hepatitis B virus (HBV) positive, seeking to identify genetically altered genes and pathways implicated in HBV-associated HCC. We find beta-catenin to be the most frequently mutated oncogene (15.9%) and TP53 the most frequently mutated tumor suppressor (35.2%). TheWnt/beta-catenin and JAK/STAT pathways, altered in 62.5% and 45.5% of cases, respectively, are likely to act as two major oncogenic drivers in HCC. This study also identifies several prevalent and potentially actionable mutations, including activating mutations of Janus kinase 1 ( JAK1), in 9.1% of patients and provides a path toward therapeutic intervention of the disease. © 2013, Published by Cold Spring Harbor Laboratory Press.
Sun, 01 Sep 2013 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/779462013-09-01T00:00:00Z
- A mutation-sensitive approach for locating conserved gene pairs between related specieshttps://scholarbank.nus.edu.sg/handle/10635/40035Title: A mutation-sensitive approach for locating conserved gene pairs between related species
Authors: Chan, H.L.; Lam, T.W.; Sung, W.K.; Wong, P.W.H.; Yiu, S.M.
Abstract: This paper proposes a new approach for solving the whole genome alignment problem. Our approach is based on a new structural optimization problem (called the MUM Selection Problem) related to mutations via reversals and transpositions. We have devised a practical algorithm for this optimization problem and have evaluated the algorithm using 15 pairs of human and mouse chromosomes. The results show that our algorithm is both effective and efficient. More specifically, our algorithm can reveal 91% of the conserved gene pairs that have been reported in the literature. When compared to existing software MUMmer [7, 16] and MaxMinCluster[12], our algorithm uncovers 15% and 7% more genes on average, respectively. The sensitivity of our algorithm is also slightly higher. The paper concludes with a remark on the computational hardness of the MUM Selection Problem.
Thu, 01 Jan 2004 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/400352004-01-01T00:00:00Z
- Improved approximate string matching using compressed suffix data structureshttps://scholarbank.nus.edu.sg/handle/10635/39928Title: Improved approximate string matching using compressed suffix data structures
Authors: Lam, T.-W.; Sung, W.-K.; Wong, S.-S.
Abstract: Approximate string matching is about finding a given string pattern in a text by allowing some degree of errors. In this paper we present a space efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over an alphabet A, we can preprocess T and give an O(n √log n log |A|)-bit space data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in O(|A|m log log n+occ) time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is o(n log 2 n) bits. The space of our data structure can be further reduced to O(n log |A|) with the query time increasing by a factor of log ε n, for 0 < ε ≤ 1. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in O(|A| k m k (k + log log n) + occ) and O(log ε n(|A| k m k (k + log log n) + occ)) time using an O(n √log n log |A|)-bit and an O(n log |A|)-bit indexing data structures, respectively. We assume that the alphabet size |A| is bounded by O(2 √log n) for the O(n√log n log |A|)-bit space data structure. © 2007 Springer Science+Business Media, LLC.
Tue, 01 Jan 2008 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/399282008-01-01T00:00:00Z
- Deletion of the WD40 domain of LRRK2 in zebrafish causes parkinsonism-like loss of neurons and locomotive defecthttps://scholarbank.nus.edu.sg/handle/10635/25140Title: Deletion of the WD40 domain of LRRK2 in zebrafish causes parkinsonism-like loss of neurons and locomotive defect
Authors: Sheng, D.; Qu, D.; Kwok, K.H.H.; Ng, S.S.; Aw, S.S.; Liu, J.; Lim, A.Y.M.; Lufkin, T.; Sinnakaruppan, M.; Lee, C.W.H.; Sung, W.K.; Tan, E.K.; Jesuthasan, S.
Fri, 01 Jan 2010 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/251402010-01-01T00:00:00Z
- Local gapped subforest alignment and its application in finding RNA structural motifshttps://scholarbank.nus.edu.sg/handle/10635/40391Title: Local gapped subforest alignment and its application in finding RNA structural motifs
Authors: Jansson, J.; Hieu, N.T.; Sung, W.-K.
Abstract: RNA molecules whose secondary structures contain similar substructures often have similar functions. Therefore, an important task in the study of RNA is to develop methods for discovering substructures in RNA secondary structures that occur frequently (also referred to as motifs). In this paper, we consider the problem of computing an optimal local alignment of two given labeled ordered forests F1 and F2. This problem asks for a substructure of F1 and a substructure of F2 that exhibit a high similarity. Since an RNA molecule's secondary structure can be represented as a labeled ordered forest, the problem we study has a direct application to finding potential motifs. We generalize the previously studied concept of a closed subforest to a gapped subforest and present the first algorithm for computing the optimal local gapped subforest alignment of F1 and F2. We also show that our technique can improve the time and space complexity of the previously most efficient algorithm for optimal local closed subforest alignment. Furthermore, we prove that a special case of our local gapped subforest alignment problem is equivalent to a problem known in the literature as the local sequence-structure alignment problem (lssa) and modify our main algorithm to obtain a much faster algorithm for lssa than the one previously proposed. An implementation of our algorithm is available at www.comp.nus.edu.sg/~bioinfo/LGSFAligner/. Its running time is significantly faster than the original lssa program. © Mary Ann Liebert, Inc.
Sun, 01 Jan 2006 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/403912006-01-01T00:00:00Z
- Association mapping via regularized regression analysis of single-nucleotide-polymorphism haplotypes in variable-sized sliding windowshttps://scholarbank.nus.edu.sg/handle/10635/39131Title: Association mapping via regularized regression analysis of single-nucleotide-polymorphism haplotypes in variable-sized sliding windows
Authors: Li, Y.; Sung, W.-K.; Liu, J.J.
Abstract: Large-scale haplotype association analysis, especially at the whole-genome level, is still a very challenging task without an optimal solution. In this study, we propose a new approach for haplotype association analysis that is based on a variable-sized sliding-window framework and employs regularized regression analysis to tackle the problem of multiple degrees of freedom in the haplotype test. Our method can handle a large number of haplotypes in association analyses more efficiently and effectively than do currently available approaches. We implement a procedure in which the maximum size of a sliding window is determined by local haplotype diversity and sample size, an attractive feature for large-scale haplotype analyses, such as a whole-genome scan, in which linkage disequilibrium patterns are expected to vary widely. We compare the performance of our method with that of three other methods-a test based on a single-nucleotide polymorphism, a cladistic analysis of haplotypes, and variable-length Markov chains-with use of both simulated and experimental data. By analyzing data sets simulated under different disease models, we demonstrate that our method consistently outperforms the other three methods, especially when the region under study has high haplotype diversity. Built on the regression analysis framework, our method can incorporate other risk-factor information into haplotype-based association analysis, which is becoming an increasingly necessary step for studying common disorders to which both genetic and environmental risk factors contribute. © 2007 by The American Society of Human Genetics. All rights reserved.
Mon, 01 Jan 2007 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/391312007-01-01T00:00:00Z
- Polynomial-time algorithms for building a consensus MUL-treehttps://scholarbank.nus.edu.sg/handle/10635/43108Title: Polynomial-time algorithms for building a consensus MUL-tree
Authors: Cui, Y.; Jansson, J.; Sung, W.-K.
Abstract: A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host-parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists. © 2012, Mary Ann Liebert, Inc.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/431082012-01-01T00:00:00Z
- Inherent signals in sequencing-based Chromatin-ImmunoPrecipitation control librarieshttps://scholarbank.nus.edu.sg/handle/10635/141722Title: Inherent signals in sequencing-based Chromatin-ImmunoPrecipitation control libraries
Authors: Vega V.B.; Cheung E.; Palanisamy N.; Sung W.-K.
Thu, 01 Jan 2009 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1417222009-01-01T00:00:00Z
- D-SLIMMER: Domain-SLiM interaction motifs miner for sequence based protein-protein interaction datahttps://scholarbank.nus.edu.sg/handle/10635/39853Title: D-SLIMMER: Domain-SLiM interaction motifs miner for sequence based protein-protein interaction data
Authors: Hugo, W.; Ng, S.-K.; Sung, W.-K.
Abstract: Many biologically important protein-protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). These interactions are mediated by the binding of a protein domain, often with a nonlinear interaction interface, to a SLiM. We propose a method called D-SLIMMER to mine for SLiMs in PPI data on the basis of the interaction density between a nonlinear motif (i.e., a protein domain) in one protein and a SLiM in the other protein. Our results on a benchmark of 113 experimentally verified reference SLiMs showed that D-SLIMMER outperformed existing methods notably for discovering domain-SLiMs interaction motifs. To illustrate the significance of the SLiMs detected, we highlighted two SLiMs discovered from the PPI data by D-SLIMMER that are variants of the known ELM SLiM, as well as a literature-backed SLiM that is yet to be listed in the reference databases. We also presented a novel SLiM predicted by D-SLIMMER that was strongly supported by existing biological literatures. These examples showed that D-SLIMMER is able to find SLiMs that are biologically relevant. © 2011 American Chemical Society.
Sat, 01 Jan 2011 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/398532011-01-01T00:00:00Z
- Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulationhttps://scholarbank.nus.edu.sg/handle/10635/39208Title: Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation
Authors: Li, G.; Ruan, X.; Auerbach, R.K.; Sandhu, K.S.; Zheng, M.; Wang, P.; Poh, H.M.; Goh, Y.; Lim, J.; Zhang, J.; Sim, H.S.; Peh, S.Q.; Mulawadi, F.H.; Ong, C.T.; Orlov, Y.L.; Hong, S.; Zhang, Z.; Landt, S.; Raha, D.; Euskirchen, G.; Wei, C.-L.; Ge, W.; Wang, H.; Davis, C.; Fisher-Aylor, K.I.; Mortazavi, A.; Gerstein, M.; Gingeras, T.; Wold, B.; Sun, Y.; Fullwood, M.J.; Cheung, E.; Liu, E.; Sung, W.-K.; Snyder, M.; Ruan, Y.
Abstract: Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intragenic, extragenic, and intergenic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically-identified disease-associated noncoding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells. © 2012 Elsevier Inc.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/392082012-01-01T00:00:00Z
- Computational modeling of oligonucleotide positional densities for human promoter predictionhttps://scholarbank.nus.edu.sg/handle/10635/39262Title: Computational modeling of oligonucleotide positional densities for human promoter prediction
Authors: Narang, V.; Sung, W.-K.; Mittal, A.
Abstract: Objective: The gene promoter region controls transcriptional initiation of a gene, which is the most important step in gene regulation. In-silico detection of promoter region in genomic sequences has a number of applications in gene discovery and understanding gene expression regulation. However, computational prediction of eukaryotic poly-II promoters has remained a difficult task. This paper introduces a novel statistical technique for detecting promoter regions in long genomic sequences. Method: A number of existing techniques analyze the occurrence frequencies of oligonucleotides in promoter sequences as compared to other genomic regions. In contrast, the present work studies the positional densities of oligonucleotides in promoter sequences. The analysis does not require any non-promoter sequence dataset or any model of the background oligonucleotide content of the genome. The statistical model learnt from a dataset of promoter sequences automatically recognizes a number of transcription factor binding sites simultaneously with their occurrence positions relative to the transcription start site. Based on this model, a continuous naïve Bayes classifier is developed for the detection of human promoters and transcription start sites in genomic sequences. Results: The present study extends the scope of statistical models in general promoter modeling and prediction. Promoter sequence features learnt by the model correlate well with known biological facts. Results of human transcription start site prediction compare favorably with existing 2nd generation promoter prediction tools. © 2005 Elsevier B.V. All rights reserved.
Sat, 01 Jan 2005 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/392622005-01-01T00:00:00Z
- More efficient periodic traversal in anonymous undirected graphshttps://scholarbank.nus.edu.sg/handle/10635/41175Title: More efficient periodic traversal in anonymous undirected graphs
Authors: Czyzowicz, J.; Dobrev, S.; Gasieniec, L.; Ilcinkas, D.; Jansson, J.; Klasing, R.; Lignos, I.; Martin, R.; Sadakane, K.; Sung, W.-K.
Abstract: We consider the problem of periodic graph exploration in which a mobile entity with constant memory, an agent, has to visit all n nodes of an input simple, connected, undirected graph in a periodic manner. Graphs are assumed to be anonymous, that is, nodes are unlabeled. While visiting a node, the agent may distinguish between the edges incident to it; for each node v, the endpoints of the edges incident to v are uniquely identified by different integer labels called port numbers. We are interested in algorithms for assigning the port numbers together with traversal algorithms for agents using these port numbers to obtain short traversal periods. Periodic graph exploration is unsolvable if the port numbers are set arbitrarily; see Budach (1978) [1]. However, surprisingly small periods can be achieved by carefully assigning the port numbers. Dobrev et al. (2005) [4] described an algorithm for assigning port numbers and an oblivious agent (i.e., an agent with no memory) using it, such that the agent explores any graph with n nodes within the period 10n. When the agent has access to a constant number of memory bits, the optimal length of the period was proved in Gasieniec et al. (2008) [7] to be no more than 3.75n-2 (using a different assignment of the port numbers and a different traversal algorithm). In this paper, we improve both these bounds. More precisely, we show how to achieve a period length of at most (4+13)n-4 for oblivious agents and a period length of at most 3.5n-2 for agents with constant memory. To obtain our results, we introduce a new, fast graph decomposition technique called a three-layer partition that may also be useful for solving other graph problems in the future. Finally, we present the first non-trivial lower bound, 2.8n-2, on the period length for the oblivious case. © 2012 Elsevier B.V. All rights reserved.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/411752012-01-01T00:00:00Z
- Predicting RNA Secondary Structures with Arbitrary Pseudoknots by Maximizing the Number of Stacking Pairshttps://scholarbank.nus.edu.sg/handle/10635/39233Title: Predicting RNA Secondary Structures with Arbitrary Pseudoknots by Maximizing the Number of Stacking Pairs
Authors: Ieong, S.; Kao, M.-Y.; Lam, T.-W.; Sung, W.-K.; Yiu, S.-M.
Abstract: The paper investigates the computational problem of predicting RNA secondary structures. The general belief is that allowing pseudoknots makes the problem hard. Existing polynomial-time algorithms are heuristic algorithms with no performance guarantee and can handle only limited types of pseudoknots. In this paper, we initiate the study of predicting RNA secondary structures with a maximum number of stacking pairs while allowing arbitrary pseudoknots. We obtain two approximation algorithms with worst-case approximation ratios of 1/2 and 1/3 for planar and general secondary structures, respectively. For an RNA sequence of n bases, the approximation algorithm for planar secondary structures runs in O(n3) time while that for the general case runs in linear time. Furthermore, we prove that allowing pseudoknots makes it NP-hard to maximize the number of stacking pairs in a planar secondary structure. This result is in contrast with the recent NP-hard results on psuedoknots which are based on optimizing some general and complicated energy functions.
Wed, 01 Jan 2003 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/392332003-01-01T00:00:00Z
- Approximate string matching using compressed suffix arrayshttps://scholarbank.nus.edu.sg/handle/10635/39178Title: Approximate string matching using compressed suffix arrays
Authors: Huynh, T.N.D.; Hon, W.-K.; Lam, T.-W.; Sung, W.-K.
Abstract: Let T be a text of length n and P be a pattern of length m, both strings over a fixed finite alphabet A. The k-difference (k-mismatch, respectively) problem is to find all occurrences of P in T that have edit distance (Hamming distance, respectively) at most k from P. In this paper we investigate a well-studied case in which T is fixed and preprocessed into an indexing data structure so that any pattern query can be answered faster. We give a solution using an O(nlogn) bits indexing data structure with O(|A|kmk·max(k,logn) +occ) query time, where occ is the number of occurrences. The best previous result requires O(nlogn) bits indexing data structure and gives O(|A|kmk+2+occ) query time. Our solution also allows us to exploit compressed suffix arrays to reduce the indexing space to O(n) bits, while increasing the query time by an O(logn) factor only. © 2005 Elsevier B.V. All right reserved.
Sun, 01 Jan 2006 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/391782006-01-01T00:00:00Z
- Algorithms for combining rooted triplets into a galled phylogenetic networkhttps://scholarbank.nus.edu.sg/handle/10635/39305Title: Algorithms for combining rooted triplets into a galled phylogenetic network
Authors: Jansson, J.; Nguyen, N.B.; Sung, W.-K.
Abstract: This paper considers the problem of determining whether a given set Τ of rooted triplets can be merged without conflicts into a galled phylogenetic network and, if so, constructing such a network. When the input Τ is dense, we solve the problem in O(|Τ|) time, which is optimal since the size of the input is Θ(|Τ|). In comparison, the previously fastest algorithm for this problem runs in O(|Τ|2) time. We also develop an optimal O(|Τ|)-time algorithm for enumerating all simple phylogenetic networks leaf-labeled by L that are consistent with Τ, where L is the set of leaf labels in Τ, which is used by our main algorithm. Next, we prove that the problem becomes NP-hard if extended to nondense inputs, even for the special case of simple phylogenetic networks. We also show that for every positive integer n, there exists some set Τ of rooted triplets on n leaves such that any galled network can be consistent with at most 0.4883 ·|Τ| of the rooted triplets in Τ. On the other hand, we provide a polynomial-time approximation algorithm that always outputs a galled network consistent with at least a factor of 5/12 (> 0.4166) of the rooted triplets in Τ. © 2006 Society for Industrial and Applied Mathematics.
Sun, 01 Jan 2006 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/393052006-01-01T00:00:00Z
- LOMA: A fast method to generate efficient tagged-random primers despite amplification bias of random PCR on pathogenshttps://scholarbank.nus.edu.sg/handle/10635/39740Title: LOMA: A fast method to generate efficient tagged-random primers despite amplification bias of random PCR on pathogens
Authors: Lee, W.H.; Wong, C.W.; Leong, W.Y.; Miller, L.D.; Sung, W.K.
Abstract: Background: Pathogen detection using DNA microarrays has the potential to become a fast and comprehensive diagnostics tool. However, since pathogen detection chips currently utilize random primers rather than specific primers for the RT-PCR step, bias inherent in random PCR amplification becomes a serious problem that causes large inaccuracies in hybridization signals. Results: In this paper, we study how the efficiency of random PCR amplification affects hybridization signals. We describe a model that predicts the amplification efficiency of a given random primer on a target viral genome. The prediction allows us to filter false-negative probes of the genome that lie in regions of poor random PCR amplification and improves the accuracy of pathogen detection. Subsequently, we propose LOMA, an algorithm to generate random primers that have good amplification efficiency. Wet-lab validation showed that the generated random primers improve the amplification efficiency significantly. Conclusion: The blind use of a random primer with attached universal tag (random-tagged primer) in a PCR reaction on a pathogen sample may not lead to a successful amplification. Thus, the design of random-tagged primers is an important consideration when performing PCR. © 2008 Lee et al; licensee BioMed Central Ltd.
Tue, 01 Jan 2008 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/397402008-01-01T00:00:00Z
- G-PRIMER: Greedy algorithm for selecting minimal primer sethttps://scholarbank.nus.edu.sg/handle/10635/39838Title: G-PRIMER: Greedy algorithm for selecting minimal primer set
Authors: Wang, J.; Li, K.-B.; Sung, W.-K.
Abstract: Summary: G-PRIMER, a web-based primer design program, has been developed to compute a minimal primer set specifically annealed to all the open reading frames in a given microbial genome. This program has been successfully used in the microarray experiment for analyzing the expression of genes in the Xanthomonas campestris genome. © Oxford University Press 2004; all rights reserved.
Thu, 01 Jan 2004 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/398382004-01-01T00:00:00Z
- Compressed indexes for approximate string matchinghttps://scholarbank.nus.edu.sg/handle/10635/38987Title: Compressed indexes for approximate string matching
Authors: Chan, H.-L.; Lam, T.-W.; Sung, W.-K.; Tam, S.-L.; Wong, S.-S.
Abstract: We revisit the problem of indexing a string S[1..n] to support finding all substrings in S that match a given pattern P[1..m] with at most k errors. Previous solutions either require an index of size exponential in k or need Ω(m k ) time for searching. Motivated by the indexing of DNA, we investigate space efficient indexes that occupy only O(n) space. For k=1, we give an index to support matching in O(m+occ+log∈nlog∈log∈n) time. The previously best solution achieving this time complexity requires an index of O(nlog∈n) space. This new index can also be used to improve existing indexes for k≥2 errors. Among others, it can support 2-error matching in O(mlog∈nlog∈log∈n+occ) time, and k-error matching, for any k>2, in O(m k-1log∈nlog∈log∈n+occ) time. © 2008 Springer Science+Business Media, LLC.
Fri, 01 Jan 2010 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/389872010-01-01T00:00:00Z
- Learning gene network using time-delayed Bayesian networkhttps://scholarbank.nus.edu.sg/handle/10635/43061Title: Learning gene network using time-delayed Bayesian network
Authors: Liu, T.-F.; Sung, W.-K.; Mittal, A.
Abstract: Exact determination of a gene network is required to discover the higher-order structures of an organism and to interpret its behavior. Most research work in learning gene networks either assumes that there is no time delay in gene expression or that there is a constant time delay. This paper shows how Bayesian Networks can be applied to represent multi-time delay relationships as well as directed loops. The intractability of the network learning algorithm is handled by using an improved mutual information criterion. Also, a new structure learning algorithm, "Learning By Modification", is proposed to learn the sparse structure of a gene network. The experimental results on synthetic data and real data show that our method is more accurate in determining the gene structure as compared to the traditional methods. Even transcriptional loops spanning over the whole cell can be detected by our algorithm. © World Scientific Publishing Company.
Sun, 01 Jan 2006 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/430612006-01-01T00:00:00Z
- Multiplatform genome-wide identification and modeling of functional human estrogen receptor binding siteshttps://scholarbank.nus.edu.sg/handle/10635/143616Title: Multiplatform genome-wide identification and modeling of functional human estrogen receptor binding sites
Authors: Vega V.B.; Lin C.-Y.; Lai K.S.; Kong S.L.; Xie M.; Su X.; Teh H.F.; Thomsen J.S.; Yeo A.L.; Sung W.K.; Bourque G.; Liu E.T.
Sun, 01 Jan 2006 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1436162006-01-01T00:00:00Z
- Ultra-succinct representation of ordered trees with applicationshttps://scholarbank.nus.edu.sg/handle/10635/39449Title: Ultra-succinct representation of ordered trees with applications
Authors: Jansson, J.; Sadakane, K.; Sung, W.-K.
Abstract: There exist two well-known succinct representations of ordered trees: BP (balanced parenthesis) (Munro and Raman, 2001) [20] and DFUDS (depth first unary degree sequence) (Benoit et al., 2005) [1]. Both have size 2n+o(n) bits for n-node trees, which asymptotically matches the information-theoretic lower bound. Importantly, many fundamental operations on trees can be done in constant time on the word RAM when using BP or DFUDS, for example finding the parent, the first child, the next sibling, the number of descendants, etc. Although the space needed to store the BP or DFUDS representation of an ordered tree matches the lower bound, this is not optimal when we consider encodings for certain special classes of trees such as trees in which every internal node has exactly two children. In this paper, we introduce a new, conditional entropy for trees called the tree degree entropy, and give a succinct tree representation with matching size. We call such a representation an ultra-succinct data structure. We show how to modify the DFUDS representation to obtain a "compressed DFUDS", and as a consequence, a tree in which every internal node has exactly two children can be represented in n+o(n) bits. We also describe applications of the compressed DFUDS representation to ultra-succinct compressed suffix trees and labeled trees. © 2011 Elsevier Inc. All Rights Reserved.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/394492012-01-01T00:00:00Z
- Rooted maximum agreement supertreeshttps://scholarbank.nus.edu.sg/handle/10635/39450Title: Rooted maximum agreement supertrees
Authors: Jansson, J.; Ng, J.H.-K.; Sadakane, K.; Sung, W.-K.
Abstract: Given a set τ rooted, unordered trees, where each T i ∈ τ is distinctly leaf-labeled by a set Λ(T i) and where the sets Λ(T i) may overlap, the maximum agreement supertree problem(MASP) is to construct a distinctly leaf-labeled tree script Q sign with leaf set Λ(script Q sign) ⊆ ∪ Ti ∈ τ Λ(T i) such that |Λ(script Q sign)| is maximized and for each T i ∈ Τ, the topological restriction of T i to Λ(script Q sign) is isomorphic to the topological restriction of script Q sign to Λ(T i). Let n = |∪ Ti ∈ τ Λ(T i)|, k = |Τ|, and D = max Ti ∈ Τ {deg(T i)}. We first show that MASP with k = 2 can be solved in O(√Dn log (2n/D)) time, which is O(n log n) when D = O(1) and O(n 1.5) when D is unrestricted. We then present an algorithm for MASP with D = 2 whose running time is polynomial if k = O(1). On the other hand, we prove that MASP is NP-hard for any fixed k ≥ 3 when D is unrestricted, and also NP-hard for any fixed D ≥ 2 when k is unrestricted even if each input tree is required to contain at most three leaves. Finally, we describe a polynomial-time (n/log n)-approximation algorithm for MASP. © 2005 Springer Science+Business Media, Inc.
Sat, 01 Jan 2005 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/394502005-01-01T00:00:00Z
- Detection of chromosomal breakpoints in patients with developmental delay and speech disordershttps://scholarbank.nus.edu.sg/handle/10635/142165Title: Detection of chromosomal breakpoints in patients with developmental delay and speech disorders
Authors: Utami K.H.; Hillmer A.M.; Aksoy I.; Chew E.G.Y.; Teo A.S.M.; Zhang Z.; Lee C.W.H.; Chen P.J.; Seng C.C.; Ariyaratne P.N.; Rouam S.L.; Soo L.S.; Yousoof S.; Prokudin I.; Peters G.; Collins F.; Wilson M.; Kakakios A.; Haddad G.; Menuet A.; Perche O.; Tay S.K.H.; Sung K.W.K.; Ruan X.; Ruan Y.; Liu E.T.; Briault S.; Jamieson R.V.; Davila S.; Cacheux V.
Wed, 01 Jan 2014 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1421652014-01-01T00:00:00Z
- Deregulated direct targets of the hepatitis B virus (HBV) protein, HBx, identified through chromatin immunoprecipitation and expression microarray profilinghttps://scholarbank.nus.edu.sg/handle/10635/38989Title: Deregulated direct targets of the hepatitis B virus (HBV) protein, HBx, identified through chromatin immunoprecipitation and expression microarray profiling
Authors: Sung, W.-K.; Lu, Y.; Lee, C.W.H.; Zhang, D.; Ronaghi, M.; Lee, C.G.L.
Abstract: The hepatitis B-X (HBx) protein is strongly associated with hepatocellular carcinoma. It is implicated not to directly cause cancer but to play a role in hepatocellular carcinoma as a co-factor. The oncogenic potential of HBx primarily lies in its interaction with transcriptional regulators resulting in aberrant gene expression and deregulated cellular pathways. Utilizing ultraviolet irradiation to simulate a tumor-initiating event, we integrated chip-based chromatin immunoprecipitation (ChIP-chip) with expression microarray profiling and identified 184 gene targets directly deregulated by HBx. One-hundred forty-four transcription factors interacting with HBx were computationally inferred. We experimentally validated that HBx interacts with some of the predicted transcription factors (pTF) as well as the promoters of the deregulated target genes of these pTFs. Significantly, we demonstrated that the pTF interacts with the promoters of the deregulated HBx target genes and that deregulation by HBx of these HBx target genes carrying the pTF consensus sequences can be reversed using pTF small interfering RNAs. The roles of these deregulated direct HBx target genes and their relevance in cancer was inferred via querying against biogroup/cancer-related microarray databases using web-based NextBio™ software. Six pathways, including the Jak-STAT pathway, were predicted to be significantly deregulated when HBx binds indirectly to direct target gene promoters. In conclusion, this study represents the first ever demonstration of the utilization of ChIP-chip to identify deregulated direct gene targets from indirect protein-DNA binding as well as transcriptional factors directly interacting with HBx. Increased knowledge of the gene/transcriptional factor targets of HBx will enhance our understanding of the role of HBx in hepatocellular carcinogenesis and facilitate the design of better strategies in combating hepatitis B virus-associated hepatocellular carcinoma. © 2009 by The American Society for Biochemistry and Molecular Biology, Inc.
Thu, 01 Jan 2009 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/389892009-01-01T00:00:00Z
- Whole-Genome Mapping of Histone H3 Lys4 and 27 Trimethylations Reveals Distinct Genomic Compartments in Human Embryonic Stem Cellshttps://scholarbank.nus.edu.sg/handle/10635/43116Title: Whole-Genome Mapping of Histone H3 Lys4 and 27 Trimethylations Reveals Distinct Genomic Compartments in Human Embryonic Stem Cells
Authors: Zhao, X.D.; Han, X.; Chew, J.L.; Liu, J.; Chiu, K.P.; Choo, A.; Orlov, Y.L.; Sung, W.-K.; Shahab, A.; Kuznetsov, V.A.; Bourque, G.; Oh, S.; Ruan, Y.; Ng, H.-H.; Wei, C.-L.
Abstract: Epigenetic modifications are crucial for proper lineage specification and embryo development. To explore the chromatin modification landscapes in human ES cells, we profiled two histone modifications, H3K4me3 and H3K27me3, by ChIP coupled with the paired-end ditags sequencing strategy. H3K4me3 was found to be a prevalent mark and occurred in close proximity to the promoters of two-thirds of total human genes. Among the H3K27me3 loci identified, 56% are associated with promoters and the vast majority of them are comodified by H3K4me3. By deep-transcript digital counting, 80% of H3K4me3 and 36% of comodified promoters were found to be transcribed. Remarkably, we observed that different combinations of histone methylations are associated with genes from distinct functional categories. These global histone methylation maps provide an epigenetic framework that enables the discovery of novel transcriptional networks and delineation of different genetic compartments of the pluripotent cell genome. © 2007 Elsevier Inc. All rights reserved.
Mon, 01 Jan 2007 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/431162007-01-01T00:00:00Z
- ChIA-PET tool for comprehensive chromatin interaction analysis with paired-end tag sequencinghttps://scholarbank.nus.edu.sg/handle/10635/28810Title: ChIA-PET tool for comprehensive chromatin interaction analysis with paired-end tag sequencing
Authors: Li, G.; Xu, H.; Mulawadi, F.H.; Velkov, S.; Vega, V.; Ariyaratne, P.N.; Mohamed, Y.B.; Ooi, H.-S.; Sung, W.-K.; Fullwood, M.J.; Wei, C.-L.; Ruan, Y.; Tennakoon, C.
Fri, 01 Jan 2010 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/288102010-01-01T00:00:00Z
- Succinct data structures for searchable partial sums with optimal worst-case performancehttps://scholarbank.nus.edu.sg/handle/10635/39117Title: Succinct data structures for searchable partial sums with optimal worst-case performance
Authors: Hon, W.-K.; Sadakane, K.; Sung, W.-K.
Abstract: The notion of succinct indexes can be dated back from the debut of Jacobson's thesis (1988) [14], and has triggered many results in the last decade. In traditional indexing, some given data are preprocessed so as to support online queries (and updates) on the data as efficiently as possible. When succinctness is involved, we are restricted to index the data using only an informationtheoretically minimum number of bits. This paper concerns the succinct indexing schemes for a well-studied problem called Searchable Partial Sums (SPS). In SPS, an array A of n non-negative k-bit integers is preprocessed so as to support online sum and search queries, and possibly update operation of individual entry. A succinct indexing scheme would allow only kn+o(kn) bits to represent the array A. The only known result is that when k=1 (in this case, it is known as the Dynamic Bit Array Problem), we can support both queries in O(log bn) time and update in O(b) amortized time for any b with lgnlglgn≤b≤n. This paper shows that even for k=O(lglgn), we can index A succinctly such that both query and update operations can be supported using the same time complexities. Moreover, the time for update becomes the worst-case time. Furthermore, the tradeoff between the query times and the update time is optimal as implied by Patracu and Demaine's lower bound result (2006) [24]. In general when k=O(lgU), we show a lower bound of Ω(lgnlglgn) time for the search query irrespective of the update time. This gives a tighter lower bound as compared to that of Patracu and Demaine's, which is a consequence of the requirement of succinctness. On the other hand, we give a succinct index that can support sum in O(log bn) time, search in O(τlog bn) time, and update in O(b) time, where τ=minlglgnlglgUlglglgU,lgnlglgn. The query times are optimal when b= n. This paper also extends the Searchable Partial Sums with insert and delete operations, and provides a succinct data structure for some cases. © 2011 Elsevier B.V. All rights reserved.
Sat, 01 Jan 2011 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/391172011-01-01T00:00:00Z
- BatMeth: improved mapper for bisulfite sequencing reads on DNA methylationhttps://scholarbank.nus.edu.sg/handle/10635/77828Title: BatMeth: improved mapper for bisulfite sequencing reads on DNA methylation
Authors: Lim, J.-Q.; Tennakoon, C.; Li, G.; Wong, E.; Ruan, Y.; Wei, C.-L.; Sung, W.-K.
Abstract: DNA methylation plays a crucial role in higher organisms. Coupling bisulfite treatment with next generation sequencing enables the interrogation of 5-methylcytosine sites in the genome. However, bisulfite conversion introduces mismatches between the reads and the reference genome, which makes mapping of Illumina and SOLiD reads slow and inaccurate. BatMeth is an algorithm that integrates novel Mismatch Counting, List Filtering, Mismatch Stage Filtering and Fast Mapping onto Two Indexes components to improve unique mapping rate, speed and precision. Experimental results show that BatMeth is faster and more accurate than existing tools. BatMeth is freely available at http://code.google.com/p/batmeth/. © 2012 Lim et al.; licensee BioMed Central Ltd.
Sat, 10 Mar 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/778282012-03-10T00:00:00Z
- Spectrum-based de novo repeat detection in genomic sequenceshttps://scholarbank.nus.edu.sg/handle/10635/43115Title: Spectrum-based de novo repeat detection in genomic sequences
Authors: Do, H.H.; Choi, K.P.; Preparata, F.P.; Sung, W.K.; Zhang, L.
Abstract: A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg. © Mary Ann Liebert, Inc. 2008.
Tue, 01 Jan 2008 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/431152008-01-01T00:00:00Z
- Erratum: AP-2γ; 3 regulates oestrogen receptor-mediated long-range chromatin interaction and gene transcription (EMBO Journal (2011) 30 (2569-2581) DOI: 10.1038/emboj.2011.151)https://scholarbank.nus.edu.sg/handle/10635/42309Title: Erratum: AP-2γ; 3 regulates oestrogen receptor-mediated long-range chromatin interaction and gene transcription (EMBO Journal (2011) 30 (2569-2581) DOI: 10.1038/emboj.2011.151)
Authors: Tan, S.K.; Lin, Z.H.; Chang, C.W.; Varang, V.; Chng, K.R.; Pan, Y.F.; Yong, E.L.; Sung, W.K.; Cheung, E.
Sat, 01 Jan 2011 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/423092011-01-01T00:00:00Z
- Constructing phylogenetic supernetworks based on simulated annealinghttps://scholarbank.nus.edu.sg/handle/10635/39010Title: Constructing phylogenetic supernetworks based on simulated annealing
Authors: Hassanzadeh, R.; Eslahchi, C.; Sung, W.-K.
Abstract: Different partial phylogenetic trees can be derived from different sources of evidence and different methods. One important problem is to summarize these partial phylogenetic trees using a supernetwork. We propose a novel simulated annealing based method called SNSA which uses an optimization function to produce a simple network that still retains a great deal of phylogenetic information. We report the performance of this new method on real and simulated datasets. © 2012 Elsevier Inc.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/390102012-01-01T00:00:00Z
- Whole-genome reconstruction and mutational signatures in gastric cancerhttps://scholarbank.nus.edu.sg/handle/10635/144343Title: Whole-genome reconstruction and mutational signatures in gastric cancer
Authors: Nagarajan N.; Bertrand D.; Hillmer A.M.; Zang Z.J.; Yao F.; Jacques P.; Teo A.S.; Cutcutache I.; Zhang Z.; Lee W.H.; Sia Y.Y.; Gao S.; Ariyaratne P.N.; Ho A.; Woo X.Y.; Veeravali L.; Ong C.K.; Deng N.; Desai K.V.; Khor C.C.; Hibberd M.L.; Shahab A.; Rao J.; Wu M.; Teh M.; Zhu F.; Chin S.Y.; Pang B.; So J.B.; Bourque G.; Soong R.; Sung W.-K.; Tean Teh B.; Rozen S.; Ruan X.; Yeoh K.G.; Tan P.B.; Ruan Y.
Sun, 01 Jan 2012 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1443432012-01-01T00:00:00Z
- Next-generation sequencing of apoptotic DNA breakpoints reveals association with actively transcribed genes and gene translocationshttps://scholarbank.nus.edu.sg/handle/10635/142110Title: Next-generation sequencing of apoptotic DNA breakpoints reveals association with actively transcribed genes and gene translocations
Authors: Fullwood M.J.; Lee J.; Lin L.; Li G.; Huss M.; Ng P.; Sung W.-K.; Shenolikar S.
Sat, 01 Jan 2011 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1421102011-01-01T00:00:00Z
- Adjacent nucleotide dependence in ncRNA and order-1 SCFG for ncRNA identificationhttps://scholarbank.nus.edu.sg/handle/10635/39847Title: Adjacent nucleotide dependence in ncRNA and order-1 SCFG for ncRNA identification
Authors: Wong, T.K.F.; Lam, T.-W.; Sung, W.-K.; Yiu, S.-M.
Abstract: Background: Non-coding RNAs (ncRNAs) are known to be involved in many critical biological processes, and identification of ncRNAs is an important task in biological research. A popular software, Infernal, is the most successful prediction tool and exhibits high sensitivity. The application of Infernal has been mainly focused on small suspected regions. We tried to apply Infernal on a chromosome level; the results have high sensitivity, yet contain many false positives. Further enhancing Infernal for chromosome level or genome wide study is desirable. Methodology: Based on the conjecture that adjacent nucleotide dependence affects the stability of the secondary structure of an ncRNA, we first conduct a systematic study on human ncRNAs and find that adjacent nucleotide dependence in human ncRNA should be useful for identifying ncRNAs. We then incorporate this dependence in the SCFG model and develop a new order-1 SCFG model for identifying ncRNAs. Conclusions: With respect to our experiments on human chromosomes, the proposed new model can eliminate more than 50% false positives reported by Infernal while maintaining the same sensitivity. The executable and the source code of programs are freely available at http://i.cs.hku.hk/~kfwong/order1scfg. © 2010 Wong et al.
Fri, 01 Jan 2010 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/398472010-01-01T00:00:00Z
- Jmjd3 contributes to the control of gene expression in LPS-activated macrophageshttps://scholarbank.nus.edu.sg/handle/10635/39119Title: Jmjd3 contributes to the control of gene expression in LPS-activated macrophages
Authors: De Santa, F.; Narang, V.; Yap, Z.H.; Tusi, B.K.; Burgold, T.; Austenaa, L.; Bucci, G.; Caganova, M.; Notarbartolo, S.; Casola, S.; Testa, G.; Sung, W.-K.; Wei, C.-L.; Natoli, G.
Abstract: Jmjd3, a JmjC family histone demethylase, is induced by the transcription factor NF-kB in response to microbial stimuli. Jmjd3 erases H3K27me3, a histone mark associated with transcriptional repression and involved in lineage determination. However, the specific contribution of Jmjd3 induction and H3K27me3 demethylation to inflammatory gene expression remains unknown. Using chromatin immunoprecipitation-sequencing we found that Jmjd3 is preferentially recruited to transcription start sites characterized by high levels of H3K4me3, a marker of gene activity, and RNA polymerase II (Pol-II). Moreover, 70% of lipopolysaccharide (LPS)-inducible genes were found to be Jmjd3 targets. Although most Jmjd3 target genes were unaffected by its deletion, a few hundred genes, including inducible inflammatory genes, showed moderately impaired Pol-II recruitment and transcription. Importantly, most Jmjd3 target genes were not associated with detectable levels of H3K27me3, and transcriptional effects of Jmjd3 absence in the window of time analysed were uncoupled from measurable effects on this histone mark. These data show that Jmjd3 fine-tunes the transcriptional output of LPS-activated macrophages in an H3K27 demethylation-independent manner. © 2009 European Molecular Biology Organization | All Rights Reserved.
Thu, 01 Jan 2009 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/391192009-01-01T00:00:00Z
- Detection of generic spaced motifs using submotif pattern mininghttps://scholarbank.nus.edu.sg/handle/10635/39466Title: Detection of generic spaced motifs using submotif pattern mining
Authors: Wijaya, E.; Rajaraman, K.; Yiu, S.-M.; Sung, W.-K.
Abstract: Motivation: Identification of motifs is one of the critical stages in studying the regulatory interactions of genes. Motifs can have complicated patterns. In particular, spaced motifs, an important class of motifs, consist of several short segments separated by spacers of different lengths. Locating spaced motifs is not trivial. Existing motif-finding algorithms are either designed for monad motifs (short contiguous patterns with some mismatches) or have assumptions on the spacer lengths or can only handle at most two segments. An effective motif finder for generic spaced motifs is highly desirable. Results: This article proposes a novel approach for identifying spaced motifs with any number of spacers of different lengths. We introduce the notion of submotifs to capture the segments in the spaced motif and formulate the motif-finding problem as a frequent submotif mining problem. We provide an algorithm called SPACE to solve the problem. Based on experiments on real biological datasets, synthetic datasets and the motif assessment benchmarks by Tompa et al., we show that our algorithm performs better than existing tools for spaced motifs with improvements in both sensitivity and specificity and for monads, SPACE performs as good as other tools. © The Author 2007. Published by Oxford University Press. All rights reserved.
Mon, 01 Jan 2007 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/394662007-01-01T00:00:00Z
- Chromatin interaction networks and higher order architectures of eukaryotic genomeshttps://scholarbank.nus.edu.sg/handle/10635/43129Title: Chromatin interaction networks and higher order architectures of eukaryotic genomes
Authors: Singh Sandhu, K.; Li, G.; Sung, W.-K.; Ruan, Y.
Abstract: Eukaryotic genome is, not only linearly but also spatially, organized into non-random architecture. Though the linear organization of genes and their epigenetic descriptors are well characterized, the relevance of their spatial organization is beginning to unfold only recently. It is increasingly being recognized that physical interactions among distant genomic elements could serve as an important mean to eukaryotic genome regulation. With the advent of proximity ligation based techniques coupled with next generation sequencing, it is now possible to explore whole genome chromatin interactions at high resolution. Emerging data on genome-wide chromatin interactions suggest that distantly located genes are not independent entities and instead cross-talk with each other in an extensive manner, supporting the notion of "chromatin interaction networks". Moreover, the data also advance the field to "3-dimensional (3D) chromatin structure and dynamics", which would enable molecular biologists to explore the spatiotemporal regulation of genome. In this article, we introduce a stepwise topological transformation of genome from 1-dimension (1D, linear) to 2-dimension (2D, networks) to 3-dimension (3D, architecture) and discuss how such transformations could advance our understanding of genome biology. Copyright © 2011 Wiley-Liss, Inc.
Sat, 01 Jan 2011 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/431292011-01-01T00:00:00Z
- Brief overview of bioinformatics activities in Singaporehttps://scholarbank.nus.edu.sg/handle/10635/42331Title: Brief overview of bioinformatics activities in Singapore
Authors: Eisenhaber, F.; Kwoh, C.-K.; Ng, S.-K.; Sung, W.-K.; Wong, L.
Thu, 01 Jan 2009 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/423312009-01-01T00:00:00Z
- Breaking a time-and-space barrier in constructing full-text indiceshttps://scholarbank.nus.edu.sg/handle/10635/39472Title: Breaking a time-and-space barrier in constructing full-text indices
Authors: Hon, W.-K.; Sadakane, K.; Sung, W.-K.
Abstract: Suffix trees and suffix arrays are the most prominent full-text indices, and their construction algorithms are well studied. In the literature, the fastest algorithm runs in O (n) time, while it requires O (n log n)-bit working space, where n denotes the length of the text. On the other hand, the most space-efficient algorithm requires O (n)-bit working space while it runs in O (n log n) time. It was open whether these indices can be constructed in both o (n log n) time and o (n log n)-bit working space. This paper breaks the above time-and-space barrier under the unit-cost word RAM. We give an algorithm for constructing the suffix array, which takes O (n) time and O (n)-bit working space, for texts with constant-size alphabets. Note that both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires O (n log∈ n) time and O (n)-bit working space for any 0 < ∈ < 1. Apart from that, our algorithm can also be adopted to build other existing full-text indices, such as compressed suflix tree, compressed suffix arrays, and FM-index. We also study the general case where the size of the alphabet S is not constant. Our algorithm can construct a suffix array and a suffix tree using optimal O (n log |Σ|)-bit working space while running in O (n log log |Σ|) time and O (n (log∈ n + log|Σ|)) time, respectively. These are the first algorithms that achieve o (n log n) time with optimal working space. Moreover, for the special case where log |Σ| = O ((log log n)1-∈), we can speed up our suffix array construction algorithm to the optimal O (n). © 2009 Society for Industrial and Applied Mathematics.
Thu, 01 Jan 2009 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/394722009-01-01T00:00:00Z
- Subtree transfer distance for degree-D phylogenieshttps://scholarbank.nus.edu.sg/handle/10635/77927Title: Subtree transfer distance for degree-D phylogenies
Authors: Hon, W.-K.; Lam, T.-W.; Yiu, S.-M.; Kao, M.-Y.; Sung, W.-K.
Abstract: The subtree transfer (STT) distance is one of the distance metric for comparing phylogenies. Previous work on computing the STT distance considered degree-3 trees only. In this paper, we give an approximation algorithm for the STT distance for degree-d trees with arbitrary d and with generalized STT operations. Also, some NP-hardness results related to STT distance are presented. © 2004 World Scientific Publishing Company.
Thu, 01 Jan 2004 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/779272004-01-01T00:00:00Z
- MotifVoter: A novel ensemble method for fine-grained integration of generic motif findershttps://scholarbank.nus.edu.sg/handle/10635/39781Title: MotifVoter: A novel ensemble method for fine-grained integration of generic motif finders
Authors: Wijaya, E.; Yiu, S.-M.; Son, N.T.; Kanagasabai, R.; Sung, W.-K.
Abstract: Motivation: Locating transcription factor binding sites (motifs) is a key step in understanding gene regulation. Based on Tompa's benchmark study, the performance of current de novo motif finders is far from satisfactory (with sensitivity ≤0.222 and precision ≤0.307). The same study also shows that no motif finder performs consistently well over all datasets. Hence, it is not clear which finder one should use for a given dataset. To address this issue, a class of algorithms called ensemble methods have been proposed. Though the existing ensemble methods overall perform better than stand-alone motif finders, the improvement gained is not substantial. Our study reveals that these methods do not fully exploit the information obtained from the results of individual finders, resulting in minor improvement in sensitivity and poor precision. Results: In this article, we identify several key observations on how to utilize the results from individual finders and design a novel ensemble method, MotifVoter, to predict the motifs and binding sites. Evaluations on 186 datasets show that MotifVoter can locate more than 95% of the binding sites found by its component motif finders. In terms of sensitivity and precision, MotifVoter outperforms stand-alone motif finders and ensemble methods significantly on Tompa's benchmark, Escherichia coli, and ChIP-Chip datasets. MotifVoter is available online via a web server with several biologist-friendly features. © The Author 2008. Published by Oxford University Press. All rights reserved.
Tue, 01 Jan 2008 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/397812008-01-01T00:00:00Z
- An efficient strategy for extensive integration of diverse biological data for protein function predictionhttps://scholarbank.nus.edu.sg/handle/10635/39787Title: An efficient strategy for extensive integration of diverse biological data for protein function prediction
Authors: Chua, H.N.; Sung, W.-K.; Wong, L.
Abstract: Motivation: With the increasing availability of diverse biological information, protein function prediction approaches have converged towards integration of heterogeneous data. Many adapted existing techniques, such as machine-learning and probabilistic methods, which have proven successful on specific data types. However, the impact of these approaches is hindered by a couple of factors. First, there is little comparison between existing approaches. This is in part due to a divergence in the focus adopted by different works, which makes comparison difficult or even fuzzy. Second, there seems to be over-emphasis on the use of computationally demanding machine-learning methods, which runs counter to the surge in biological data. Analogous to the success of BLAST for sequence homology search, we believe that the ability to tap escalating quantity, quality and diversity of biological data is crucial to the success of automated function prediction as a useful instrument for the advancement of proteomic research. We address these problems by: (1) providing useful comparison between some prominent methods; (2) proposing Integrated Weighted Averaging (IWA) - a scalable, efficient and flexible function prediction framework that integrates diverse information using simple weighting strategies and a local prediction method. The simplicity of the approach makes it possible to make predictions based on on-the-fly information fusion. Results: In addition to its greater efficiency, IWA performs exceptionally well against existing approaches. In the presence of cross-genome information, which is overwhelming for existing approaches, IWA makes even better predictions. We also demonstrate the significance of appropriate weighting strategies in data integration. © The Author 2007. Published by Oxford University Press. All rights reserved.
Mon, 01 Jan 2007 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/397872007-01-01T00:00:00Z
- Inferring a level-1 phylogenetic network from a dense set of rooted tripletshttps://scholarbank.nus.edu.sg/handle/10635/39312Title: Inferring a level-1 phylogenetic network from a dense set of rooted triplets
Authors: Jansson, J.; Sung, W.-K.
Abstract: We consider the following problem: Given a set T of rooted triplets with leaf set L, determine whether there exists a phylogenetic network consistent with T, and if so, construct one. We show that if no restrictions are placed on the hybrid nodes in the solution, the problem is trivially solved in polynomial time by a simple sorting network-based construction. For the more interesting (and biologically more motivated) case where the solution is required to be a level-1 phylogenetic network, we present an algorithm solving the problem in O (| T |2) time when T is dense, i.e., when T contains at least one rooted triplet for each cardinality three subset of L. We also give an O (| T |5 / 3)-time algorithm for finding the set of all phylogenetic networks having a single hybrid node attached to exactly one leaf (and having no other hybrid nodes) that are consistent with a given dense set of rooted triplets. © 2006 Elsevier B.V. All rights reserved.
Sun, 01 Jan 2006 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/393122006-01-01T00:00:00Z
- Computing a smallest multilabeled phylogenetic tree from rooted tripletshttps://scholarbank.nus.edu.sg/handle/10635/39451Title: Computing a smallest multilabeled phylogenetic tree from rooted triplets
Authors: Guillemot, S.; Jansson, J.; Sung, W.-K.
Abstract: We investigate the computational complexity of inferring a smallest possible multilabeled phylogenetic tree (MUL tree) which is consistent with each of the rooted triplets in a given set. This problem has not been studied previously in the literature. We prove that even the very restricted case of determining if there exists a MUL tree consistent with the input and having just one leaf duplication is an NP-hard problem. Furthermore, we show that the general minimization problem is difficult to approximate, although a simple polynomial-time approximation algorithm achieves an approximation ratio close to our derived inapproximability bound. Finally, we provide an exact algorithm for the problem running in exponential time and space. As a by-product, we also obtain new, strong inapproximability results for two partitioning problems on directed graphs called ACYCLIC PARTITION and ACYCLIC TREE-PARTITION. © 2011 IEEE.
Sat, 01 Jan 2011 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/394512011-01-01T00:00:00Z