Please use this identifier to cite or link to this item:
https://doi.org/10.1007/s00453-007-9104-8
Title: Improved approximate string matching using compressed suffix data structures
Authors: Lam, T.-W.; Sung, W.-K.; Wong, S.-S.
Issue Date: 2008
Citation: Lam, T.-W., Sung, W.-K., Wong, S.-S. (2008). Improved approximate string matching using compressed suffix data structures. Algorithmica (New York) 51 (3) : 298-314. ScholarBank@NUS Repository. https://doi.org/10.1007/s00453-007-9104-8
Abstract: Approximate string matching is about finding a given string pattern in a text by allowing some degree of errors. In this paper we present a space efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text $T$ of length $n$ over an alphabet $A$, we can preprocess $T$ and give an $O(n \sqrt{\log n} \log |A|)$-bit space data structure so that, for any query pattern $P$ of length $m$, we can find all 1-mismatch (or 1-difference) occurrences of $P$ in $O(|A| m \log \log n + occ)$ time, where $occ$ is the number of occurrences. This is the fastest known query time given that the space of the data structure is $o(n \log^2 n)$ bits. The space of our data structure can be further reduced to $O(n \log |A|)$ with the query time increasing by a factor of $\log^{\epsilon} n$, for $0 < \epsilon \le 1$. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in $O(|A|^k m^k (k + \log \log n) + occ)$ and $O(\log^{\epsilon} n \, (|A|^k m^k (k + \log \log n) + occ))$ time using an $O(n \sqrt{\log n} \log |A|)$-bit and an $O(n \log |A|)$-bit indexing data structures, respectively. We assume that the alphabet size $|A|$ is bounded by $O(2^{\sqrt{\log n}})$ for the $O(n \sqrt{\log n} \log |A|)$-bit space data structure. © 2007 Springer Science+Business Media, LLC.
Source Title: Algorithmica (New York)
URI: http://scholarbank.nus.edu.sg/handle/10635/39928
ISSN: 01784617
DOI: 10.1007/s00453-007-9104-8
Appears in Collections: Staff Publications
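Illustration (not part of the repository record): to make the problem statement in the abstract concrete, the following is a minimal brute-force sketch of the k-mismatch problem, where an occurrence is a position at which the pattern matches the text with at most k substituted characters. It is a naive O(nm) baseline and is not the compressed suffix data structure from the paper. The function name `k_mismatch_occurrences` and its parameters are hypothetical, and the k-difference variant (which also allows insertions and deletions) is not covered.

```python
def k_mismatch_occurrences(text, pattern, k=1):
    """Return the start positions where `pattern` matches `text`
    with at most `k` character substitutions (Hamming distance <= k).

    Naive baseline for illustration only: it scans every alignment
    and does not build any index over the text.
    """
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # too many substitutions at this alignment
        else:
            # Inner loop finished without exceeding k mismatches.
            hits.append(i)
    return hits


if __name__ == "__main__":
    # "abca" differs from "abra" in exactly one position.
    print(k_mismatch_occurrences("abracadabra", "abca", k=1))
```

With k=0 the same function reduces to exact matching, which is a quick way to sanity-check it against `str.find`.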