Indexing similar DNA sequences

Please use this identifier to cite or link to this item: https://doi.org/10.1007/978-3-642-14355-7_19

DC Field	Value
dc.title	Indexing similar DNA sequences
dc.contributor.author	Huang, S.
dc.contributor.author	Lam, T.W.
dc.contributor.author	Sung, W.K.
dc.contributor.author	Tam, S.L.
dc.contributor.author	Yiu, S.M.
dc.date.accessioned	2013-07-04T08:14:59Z
dc.date.available	2013-07-04T08:14:59Z
dc.date.issued	2010
dc.identifier.citation	Huang, S.,Lam, T.W.,Sung, W.K.,Tam, S.L.,Yiu, S.M. (2010). Indexing similar DNA sequences. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6124 LNCS : 180-190. ScholarBank@NUS Repository. <a href="https://doi.org/10.1007/978-3-642-14355-7_19" target="_blank">https://doi.org/10.1007/978-3-642-14355-7_19</a>
dc.identifier.isbn	3642143547
dc.identifier.issn	03029743
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/40901
dc.description.abstract	To study the genetic variations of a species, one basic operation is to search for occurrences of patterns in a large number of very similar genomic sequences. To build an indexing data structure on the concatenation of all sequences may require a lot of memory. In this paper, we propose a new scheme to index highly similar sequences by taking advantage of the similarity among the sequences. To store r sequences with k common segments, our index requires only O(n + N logN) bits of memory, where n is the total length of the common segments and N is the total length of the distinct regions in all texts. The total length of all sequences is rn + N, and any scheme to store these sequences requires Ω(n + N) bits. Searching for a pattern P of length m takes O(m+mlogN +mlog(rk)psc(P)+occ log n), where psc(P) is the number of prefixes of P that appear as a suffix of some common segments and occ is the number of occurrences of P in all sequences. In practice, rk ≤ N, and psc(P) is usually a small constant. We have implemented our solution1 and evaluated our solution using real DNA sequences. The experiments show that the memory requirement of our solution is much less than that required by BWT built on the concatenation of all sequences. When compared to the other existing solution (RLCSA), we use less memory with faster searching time. © Springer-Verlag Berlin Heidelberg 2010.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1007/978-3-642-14355-7_19
dc.source	Scopus
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1007/978-3-642-14355-7_19
dc.description.sourcetitle	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.description.volume	6124 LNCS
dc.description.page	180-190
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Show simple item record

Files in This Item:

There are no files associated with this item.

Google Scholar^TM

Check

Files in This Item:

Google ScholarTM

Altmetric

Google Scholar^TM