Please use this identifier to cite or link to this item:
|Title:||Bed-tree: An all-purpose index structure for string similarity search based on edit distance|
|Authors:||Zhang, Z. |
|Source:||Zhang, Z.,Hadjieleftheriou, M.,Ooi, B.C.,Srivastava, D. (2010). Bed-tree: An all-purpose index structure for string similarity search based on edit distance. Proceedings of the ACM SIGMOD International Conference on Management of Data : 915-926. ScholarBank@NUS Repository. https://doi.org/10.1145/1807167.1807266|
|Abstract:||Strings are ubiquitous in computer systems and hence string processing has attracted extensive research effort from computer scientists in diverse areas. One of the most important problems in string processing is to efficiently evaluate the similarity between two strings based on a specified similarity measure. String similarity search is a fundamental problem in information retrieval, database cleaning, biological sequence analysis, and more. While a large number of dissimilarity measures on strings have been proposed, edit distance is the most popular choice in a wide spectrum of applications. Existing indexing techniques for similarity search queries based on edit distance, e.g., approximate selection and join queries, rely mostly on n-gram signatures coupled with inverted list structures. These techniques are tailored for specific query types only, and their performance remains unsatisfactory especially in scenarios with strict memory constraints or frequent data updates. In this paper we propose the Bed-tree, a B+-tree based index structure for evaluating all types of similarity queries on edit distance and normalized edit distance. We identify the necessary properties of a mapping from the string space to the integer space for supporting searching and pruning for these queries. Three transformations are proposed that capture different aspects of information inherent in strings, enabling efficient pruning during the search process on the tree. Compared to state-of-the-art methods on string similarity search, the Bed-tree is a complete solution that meets the requirements of all applications, providing high scalability and fast response time. © 2010 ACM.|
|Source Title:||Proceedings of the ACM SIGMOD International Conference on Management of Data|
|Appears in Collections:||Staff Publications|
Show full item record
Files in This Item:
There are no files associated with this item.
checked on Jan 23, 2018
checked on Jan 20, 2018
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.