Please use this identifier to cite or link to this item: https://doi.org/10.1145/2382936.2382985
Title: Alignment seeding strategies using contiguous pyrimidine purine matches
Authors: Hou, M.
Zhang, L. 
Harris, R.S.
Keywords: Alignment
Genomic sequence
Matches
Model
Seeding
Issue Date: 2012
Citation: Hou, M., Zhang, L., Harris, R.S. (2012). Alignment seeding strategies using contiguous pyrimidine purine matches. 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012: 384-391. ScholarBank@NUS Repository. https://doi.org/10.1145/2382936.2382985
Abstract: Large-scale genomic pairwise aligners usually start with a seeding procedure, which scans two sequences to obtain base matches (called hits) that follow a certain pattern (called a seed). The seed pattern and size determine the sensitivity and specificity of the seeding procedure and greatly affect the alignment accuracy and computational efficiency. Much effort has been focused on obtaining an optimal (set of) spaced seed(s) to improve sensitivity. However, specificity also becomes a big concern when aligning very long genomic sequences. We present a seeding strategy that identifies contiguous pyrimidine purine (py · pu) matches. This model may improve sensitivity and specificity simultaneously compared to a contiguous base match model. We further present a seeding strategy that identifies contiguous py · pu matches with at least a certain number of contiguous base matches. This model significantly improves sensitivity and specificity simultaneously compared to the base match model. It can also achieve better sensitivity than an optimal spaced seed without loss of specificity when the ratio of transition to transversion is high. Our examination of the 2M-base CFTR region between human and mouse shows that this new model can have very high specificity without much loss of sensitivity compared to an optimal spaced seed. Based on the characteristics (e.g., the sequence similarity, the ratio between transition and transversion, and the lengths of gapless alignments) of alignments between human and other mammals, the new seeding strategies are promising for improving the alignment quality of a wide selection of species pairs. This paper also lays the groundwork for future advancement of applying spaced patterns in these seeding strategies. Copyright © 2012 ACM.
Source Title: 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
URI: http://scholarbank.nus.edu.sg/handle/10635/104530
ISBN: 9781450316705
DOI: 10.1145/2382936.2382985
Appears in Collections: Staff Publications
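
The two seeding models described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a py · pu match means both bases are purines (A, G) or both are pyrimidines (C, T), so identities and transitions match while transversions do not. It also assumes that "at least a certain number of contiguous base matches" refers to the longest run of identical bases inside the window. The function names and the parameters span and min_exact are hypothetical, and the scan covers only a single gapless diagonal; a practical seeder would index one sequence and search all diagonals.

```python
# Illustrative sketch only: names, parameters, and the reading of the
# second model are assumptions, not taken from the paper.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def base_class(base):
    """Return 'R' for a purine, 'Y' for a pyrimidine, None otherwise."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    return None


def pypu_match(a, b):
    """True when both bases fall in the same purine/pyrimidine class."""
    cls = base_class(a)
    return cls is not None and cls == base_class(b)


def longest_exact_run(s, t, start, end):
    """Length of the longest run of identical bases in s[start:end] vs t[start:end]."""
    best = run = 0
    for j in range(start, end):
        if s[j] == t[j] and base_class(s[j]) is not None:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def seed_hits(s, t, span, min_exact=0):
    """Start positions of windows of length `span` along one gapless diagonal
    in which every position is a py.pu match and the longest run of exact
    base matches is at least `min_exact` (0 gives the plain py.pu model)."""
    n = min(len(s), len(t))
    hits = []
    for i in range(n - span + 1):
        window = range(i, i + span)
        if all(pypu_match(s[j], t[j]) for j in window):
            if longest_exact_run(s, t, i, i + span) >= min_exact:
                hits.append(i)
    return hits


if __name__ == "__main__":
    s = "ACGTACGT"
    t = "GCGTATGT"  # differs from s by two transitions (A/G and C/T)
    print(seed_hits(s, t, span=4))               # plain py.pu model
    print(seed_hits(s, t, span=4, min_exact=3))  # py.pu plus exact-run filter
```

In this toy pair both differences are transitions, so every window passes the plain py · pu model, while the exact-run filter discards the windows whose identical bases are interrupted by a transition. Under these assumptions, raising min_exact trades sensitivity for specificity.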
