Publication

String matching and indexing with suffix data structures

WONG SWEE SEONG
Citations
Altmetric:
Alternative Title
Abstract
This thesis studies methods for indexing a text so that the occurrences of any given query string in the text can be located efficiently. An occurrence or match may be imprecise, allowing some deviations from the actual query. This gives rise to a family of interesting string matching problems like exact and approximate string matching, and sequence alignment. We consider two different computing models to handle the problem. The first is to compress the index so that it is small enough to be stored in the main memory. Another computing model is to make use of secondary disk, where the index resides on the hard disk. Blocks or chunks of the index are fetched into memory upon request. In this case, we are concern with the number of IO accesses to perform string search on the index. In both scenarios, it is essential to have efficient computation algorithms to support various string search. Mixed computing model is also possible with multiple levels of indexing, combining both in-memory and disk-based indices.
Keywords
suffix tree, suffix array, compressed suffix data structures, string indexing, string matching, sequence alignment search
Source Title
Publisher
Series/Report No.
Organizational Units
Organizational Unit
COMPUTER SCIENCE
dept
Rights
Date
2008-02-07
DOI
Type
Thesis
Additional Links
Related Datasets
Related Publications