Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/42144
DC Field: Value
dc.title: Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness
dc.contributor.author: Yiu, S.M.
dc.contributor.author: Chan, P.Y.
dc.contributor.author: Lam, T.W.
dc.contributor.author: Sung, W.K.
dc.contributor.author: Ting, H.F.
dc.contributor.author: Wong, P.W.H.
dc.date.accessioned: 2013-07-04T08:44:28Z
dc.date.available: 2013-07-04T08:44:28Z
dc.date.issued: 2005
dc.identifier.citation: Yiu, S.M., Chan, P.Y., Lam, T.W., Sung, W.K., Ting, H.F., Wong, P.W.H. (2005). Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness. Series on Advances in Bioinformatics and Computational Biology 1: 1-10. ScholarBank@NUS Repository.
dc.identifier.isbn: 1860944779
dc.identifier.issn: 17516404
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/42144
dc.description.abstract: Recent work on whole genome alignment has resulted in efficient tools to locate (possibly) conserved regions of two genomic sequences. Most such tools start by locating a set of short and highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues to the conserved regions, and the effectiveness of the tools is highly related to the quality of the anchors. Some popular software tools use exact-match maximal unique substrings (EM-MUM) as anchors. However, the results are not satisfactory, especially for genomes with high mutation rates (e.g. viruses). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following. Based on experiments on 35 pairs of virus genomes using three software tools (MUMmer-3, MaxMinCluster, MSS), we show that using anchors with mismatches does increase the effectiveness of locating conserved regions (about 10% more conserved gene regions are located, while maintaining a high sensitivity). Generating a more comprehensive set of anchors with mismatches is not trivial for long sequences because of time and memory limitations. We propose two practical algorithms for generating this anchor set. One aims at speeding up the process; the other aims at saving memory. Experimental results show that both algorithms are faster (6 times and 5 times, respectively) than a straightforward suffix-tree-based approach.
dc.source: Scopus
dc.type: Conference Paper
dc.contributor.department: COMPUTER SCIENCE
dc.description.sourcetitle: Series on Advances in Bioinformatics and Computational Biology
dc.description.volume: 1
dc.description.page: 1-10
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.