Please use this identifier to cite or link to this item:
Title: Bed-tree: An all-purpose index structure for string similarity search based on edit distance
Authors: Zhang, Z. 
Hadjieleftheriou, M.
Ooi, B.C. 
Srivastava, D.
Keywords: approximate join
edit distance
range query
similarity search
top-k query
Issue Date: 2010
Citation: Zhang, Z.,Hadjieleftheriou, M.,Ooi, B.C.,Srivastava, D. (2010). Bed-tree: An all-purpose index structure for string similarity search based on edit distance. Proceedings of the ACM SIGMOD International Conference on Management of Data : 915-926. ScholarBank@NUS Repository.
Abstract: Strings are ubiquitous in computer systems and hence string processing has attracted extensive research effort from computer scientists in diverse areas. One of the most important problems in string processing is to efficiently evaluate the similarity between two strings based on a specified similarity measure. String similarity search is a fundamental problem in information retrieval, database cleaning, biological sequence analysis, and more. While a large number of dissimilarity measures on strings have been proposed, edit distance is the most popular choice in a wide spectrum of applications. Existing indexing techniques for similarity search queries based on edit distance, e.g., approximate selection and join queries, rely mostly on n-gram signatures coupled with inverted list structures. These techniques are tailored for specific query types only, and their performance remains unsatisfactory especially in scenarios with strict memory constraints or frequent data updates. In this paper we propose the Bed-tree, a B+-tree based index structure for evaluating all types of similarity queries on edit distance and normalized edit distance. We identify the necessary properties of a mapping from the string space to the integer space for supporting searching and pruning for these queries. Three transformations are proposed that capture different aspects of information inherent in strings, enabling efficient pruning during the search process on the tree. Compared to state-of-the-art methods on string similarity search, the Bed-tree is a complete solution that meets the requirements of all applications, providing high scalability and fast response time. © 2010 ACM.
Source Title: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISBN: 9781450300322
ISSN: 07308078
DOI: 10.1145/1807167.1807266
Appears in Collections:Staff Publications

Show full item record
Files in This Item:
There are no files associated with this item.


checked on Oct 11, 2019

Page view(s)

checked on Oct 14, 2019

Google ScholarTM



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.