Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/42144

DC Field	Value
dc.title	Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness
dc.contributor.author	Yiu, S.M.
dc.contributor.author	Chan, P.Y.
dc.contributor.author	Lam, T.W.
dc.contributor.author	Sung, W.K.
dc.contributor.author	Ting, H.F.
dc.contributor.author	Wong, P.W.H.
dc.date.accessioned	2013-07-04T08:44:28Z
dc.date.available	2013-07-04T08:44:28Z
dc.date.issued	2005
dc.identifier.citation	Yiu, S.M., Chan, P.Y., Lam, T.W., Sung, W.K., Ting, H.F., Wong, P.W.H. (2005). Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness. Series on Advances in Bioinformatics and Computational Biology 1: 1-10. ScholarBank@NUS Repository.
dc.identifier.isbn	1860944779
dc.identifier.issn	17516404
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/42144
dc.description.abstract	Recent work on whole genome alignment has resulted in efficient tools to locate (possibly) conserved regions of two genomic sequences. Most such tools start by locating a set of short and highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues to the conserved regions, and the effectiveness of the tools depends heavily on the quality of the anchors. Some popular software tools use exact-match maximal unique substrings (EM-MUM) as anchors. However, the result is not satisfactory, especially for genomes with high mutation rates (e.g. viruses). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following. Based on experiments on 35 pairs of virus genomes using three software tools (MUMmer-3, MaxMinCluster, MSS), we show that using anchors with mismatches does increase the effectiveness of locating conserved regions (about 10% more conserved gene regions are located, while maintaining a high sensitivity). Generating a more comprehensive set of anchors with mismatches is not trivial for long sequences because of time and memory limitations. We propose two practical algorithms for generating this anchor set. One aims at speeding up the process; the other aims at saving memory. Experimental results show that both algorithms are faster (6 times and 5 times, respectively) than a straightforward suffix-tree-based approach.
dc.source	Scopus
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.sourcetitle	Series on Advances in Bioinformatics and Computational Biology
dc.description.volume	1
dc.description.page	1-10
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications
