Please use this identifier to cite or link to this item: https://doi.org/10.1093/bioinformatics/btn032
Title: Compressed indexing and local alignment of DNA
Authors: Lam, T.W.
Sung, W.K. 
Tam, S.L.
Wong, C.K.
Yiu, S.M.
Issue Date: 2008
Source: Lam, T.W., Sung, W.K., Tam, S.L., Wong, C.K., Yiu, S.M. (2008). Compressed indexing and local alignment of DNA. Bioinformatics 24 (6) : 791-797. ScholarBank@NUS Repository. https://doi.org/10.1093/bioinformatics/btn032
Abstract: Motivation: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing very long strings such as the human genome in the main memory. For example, a BWT index for the human genome (with about 3 billion characters) occupies just around 1 G bytes. However, these indexes are designed for exact pattern matching, which is too stringent for biological applications. The demand is often on finding local alignments (pairs of similar substrings with gaps allowed). Without indexing, one can use dynamic programming to find all the local alignments between a text T and a pattern P in O ( T P ) time, but this would be too slow when the text is of genome scale (e.g. aligning a gene with the human genome would take tens to hundreds of hours). In practice, biologists use heuristic-based software such as BLAST, which is very efficient but does not guarantee to find all local alignments. Results: In this article, we show how to build a software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments. Experiments reveal that BWT-SW is very efficient (e.g. aligning a pattern of length 3 000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically for a simpler similarity model (with gaps disallowed), and we show that the expected running time is O ( T 0.628 P ) for random strings. As far as we know, BWT-SW is the first practical tool that can find all local alignments. Yet BWT-SW is not meant to be a replacement of BLAST, as BLAST is still several times faster than BWT-SW for long patterns and BLAST is indeed accurate enough in most cases (we have used BWT-SW to check against the accuracy of BLAST and found that only rarely BLAST would miss some significant alignments). © The Author 2008. Published by Oxford University Press. All rights reserved.
Source Title: Bioinformatics
URI: http://scholarbank.nus.edu.sg/handle/10635/39850
ISSN: 13674803
DOI: 10.1093/bioinformatics/btn032
Appears in Collections:Staff Publications

Show full item record
Files in This Item:
There are no files associated with this item.

SCOPUSTM   
Citations

71
checked on Dec 13, 2017

WEB OF SCIENCETM
Citations

55
checked on Dec 13, 2017

Page view(s)

37
checked on Dec 9, 2017

Google ScholarTM

Check

Altmetric


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.