Please use this identifier to cite or link to this item:
Title: Compressed indexing data structures for biological sequences
Keywords: indexing, compression, data structure, algorithm, biological sequences, similarity
Issue Date: 29-Nov-2012
Source: DO HUY HOANG (2012-11-29). Compressed indexing data structures for biological sequences. ScholarBank@NUS Repository.
Abstract: As more and more information are generated in the text format from sources like biological research and the internet, the problem of storing and searching within text collections (a.k.a. text indexing) becomes more and more important. This thesis introduces three different indexing data structures which focus on utilizing the similarity characteristic in biological sequences to improve performance and reduce space usage. The first topic is about a compressed version of directed acyclic word graph and its application in similarity measurement. This data structure can be considered as a variant of suffix tree; however, its succinct structure helps to improve the time complexity of local alignment measurement. The second topic introduced a data structure to support the rank and select queries in the delta compressed sequences. Based on this result, we built an indexing data structure for delta compressed sequences called multi-version FM-index. The third topic is an index for a compression scheme called relative Lempel-Ziv. Our structure offers two trade-offs between the index space and the query time: the first structure has asymptotically optimal space; the second one has nearly optimal time.
Appears in Collections:Ph.D Theses (Open)

Show full item record
Files in This Item:
File Description SizeFormatAccess SettingsVersion 
DoHH.pdf3.59 MBAdobe PDF



Page view(s)

checked on Dec 11, 2017


checked on Dec 11, 2017

Google ScholarTM


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.