Please use this identifier to cite or link to this item: https://doi.org/10.1007/s00453-007-9104-8
Title: Improved approximate string matching using compressed suffix data structures
Authors: Lam, T.-W.
Sung, W.-K. 
Wong, S.-S. 
Issue Date: 2008
Citation: Lam, T.-W., Sung, W.-K., Wong, S.-S. (2008). Improved approximate string matching using compressed suffix data structures. Algorithmica (New York) 51 (3) : 298-314. ScholarBank@NUS Repository. https://doi.org/10.1007/s00453-007-9104-8
Abstract: Approximate string matching is about finding a given string pattern in a text by allowing some degree of errors. In this paper we present a space efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over an alphabet A, we can preprocess T and give an O(n √log n log |A|)-bit space data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in O(|A|m log log n+occ) time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is o(n log 2 n) bits. The space of our data structure can be further reduced to O(n log |A|) with the query time increasing by a factor of log ε n, for 0 < ε ≤ 1. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in O(|A| k m k (k + log log n) + occ) and O(log ε n(|A| k m k (k + log log n) + occ)) time using an O(n √log n log |A|)-bit and an O(n log |A|)-bit indexing data structures, respectively. We assume that the alphabet size |A| is bounded by O(2 √log n) for the O(n√log n log |A|)-bit space data structure. © 2007 Springer Science+Business Media, LLC.
Source Title: Algorithmica (New York)
URI: http://scholarbank.nus.edu.sg/handle/10635/39928
ISSN: 01784617
DOI: 10.1007/s00453-007-9104-8
Appears in Collections:Staff Publications

Show full item record
Files in This Item:
There are no files associated with this item.

Google ScholarTM

Check

Altmetric


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.