Please use this identifier to cite or link to this item:
|Title:||Indexing mixed types for approximate retrieval|
|Source:||Jin, L.,Li, C.,Koudas, N.,Tung, A.K.H. (2005). Indexing mixed types for approximate retrieval. VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases 2 : 793-804. ScholarBank@NUS Repository.|
|Abstract:||In various applications such as data cleansing, being able to retrieve categorical or numerical attributes based on notions of approximate match (e.g., edit distance, numerical distance) is of profound importance. Commonly, approximate match predicates are specified on combinations of attributes in conjunction. Existing database techniques for approximate retrieval, however, limit their applicability to single attribute retrieval through B-trees and their variants. In this paper, we propose a methodology that utilizes known multidimensional indexing structures for the problem of approximate multi-attribute retrieval. Our method enables indexing of a collection of string and/or numeric attributes to facilitate approximate retrieval using edit distance as an approximate match predicate for strings and numeric distance for numeric attributes. The approach presented is based on representing sets of strings at higher levels of the index structure as tries suitably compressed in a way that reasoning about edit distance between a query string and a compressed trie at index nodes is still feasible. We propose and evaluate various techniques to generate the compressed trie representation and fully specify our indexing methodology. Our experimental results show the benefits of our proposal when compared with various alternate strategies for the same problem.|
|Source Title:||VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases|
|Appears in Collections:||Staff Publications|
Show full item record
Files in This Item:
There are no files associated with this item.
checked on Jan 19, 2018
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.