Compressed indexing data structures for biological sequences | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/38748

Title:	Compressed indexing data structures for biological sequences
Authors:	DO HUY HOANG
Keywords:	indexing, compression, data structure, algorithm, biological sequences, similarity
Issue Date:	29-Nov-2012
Citation:	DO HUY HOANG (2012-11-29). Compressed indexing data structures for biological sequences. ScholarBank@NUS Repository.
Abstract:	As more and more information are generated in the text format from sources like biological research and the internet, the problem of storing and searching within text collections (a.k.a. text indexing) becomes more and more important. This thesis introduces three different indexing data structures which focus on utilizing the similarity characteristic in biological sequences to improve performance and reduce space usage. The first topic is about a compressed version of directed acyclic word graph and its application in similarity measurement. This data structure can be considered as a variant of suffix tree; however, its succinct structure helps to improve the time complexity of local alignment measurement. The second topic introduced a data structure to support the rank and select queries in the delta compressed sequences. Based on this result, we built an indexing data structure for delta compressed sequences called multi-version FM-index. The third topic is an index for a compression scheme called relative Lempel-Ziv. Our structure offers two trade-offs between the index space and the query time: the first structure has asymptotically optimal space; the second one has nearly optimal time.
URI:	http://scholarbank.nus.edu.sg/handle/10635/38748
Appears in Collections:	Ph.D Theses (Open)

Show full item record

Files in This Item:

File	Description	Size	Format	Access Settings	Version
DoHH.pdf		3.59 MB	Adobe PDF	OPEN	None	View/Download

Google Scholar^TM

Check

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.