Title: Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness
Authors: Yiu, S.M.
Chan, P.Y.
Lam, T.W.
Sung, W.K. 
Ting, H.F.
Wong, P.W.H.
Issue Date: 2005
Source: Yiu, S.M., Chan, P.Y., Lam, T.W., Sung, W.K., Ting, H.F., Wong, P.W.H. (2005). Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness. Series on Advances in Bioinformatics and Computational Biology 1: 1-10. ScholarBank@NUS Repository.
Abstract: Recent work on whole genome alignment has resulted in efficient tools to locate (possibly) conserved regions of two genomic sequences. Most such tools start by locating a set of short and highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues for the conserved regions, and the effectiveness of the tools is highly related to the quality of the anchors. Some popular software tools use exact-match maximal unique substrings (EM-MUM) as anchors. However, the results are not satisfactory, especially for genomes with high mutation rates (e.g. viruses). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following. Based on experiments on 35 pairs of virus genomes using three software tools (MUMmer-3, MaxMinCluster, MSS), we show that using anchors with mismatches does increase the effectiveness of locating conserved regions (about 10% more conserved gene regions are located, while maintaining a high sensitivity). Generating a more comprehensive set of anchors with mismatches is not trivial for long sequences because of time and memory limitations. We propose two practical algorithms for generating this anchor set. One aims at speeding up the process; the other aims at saving memory. Experimental results show that both algorithms are faster (6 times and 5 times, respectively) than a straightforward suffix-tree-based approach.
Source Title: Series on Advances in Bioinformatics and Computational Biology
ISBN: 1860944779
ISSN: 17516404
Appears in Collections: Staff Publications
