Indexing high-dimensional data for efficient in-memory similarity search

Please use this identifier to cite or link to this item: https://doi.org/10.1109/TKDE.2005.46

DC Field	Value
dc.title	Indexing high-dimensional data for efficient in-memory similarity search
dc.contributor.author	Cui, B.
dc.contributor.author	Ooi, B.C.
dc.contributor.author	Su, U.
dc.contributor.author	Tan, K.-L.
dc.date.accessioned	2013-07-04T07:35:38Z
dc.date.available	2013-07-04T07:35:38Z
dc.date.issued	2005
dc.identifier.citation	Cui, B., Ooi, B.C., Su, U., Tan, K.-L. (2005). Indexing high-dimensional data for efficient in-memory similarity search. IEEE Transactions on Knowledge and Data Engineering 17 (3) : 339-352. ScholarBank@NUS Repository. https://doi.org/10.1109/TKDE.2005.46
dc.identifier.issn	10414347
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/39174
dc.description.abstract	In main memory systems, the L2 cache typically employs cache line sizes of 32-128 bytes. These values are relatively small compared to high-dimensional data, e.g., > 32D. The consequence is that existing techniques (on low-dimensional data) that minimize cache misses are no longer effective. In this paper, we present a novel index structure, called A-tree, to speed up the high-dimensional query in main memory environment. The Δ-tree is a multilevel structure where each level represents the data space at different dimensionalities: the number of dimensions increases toward the leaf level. The remaining dimensions are obtained using Principal Component Analysis. Each level of the tree serves to prune the search space more efficiently as the lower dimensions can reduce the distance computation and better exploit the small cache line size. Additionally, the top-down clustering scheme can capture the feature of the data set and, hence, reduces the search space. We also propose an extension, called Δ 1-tree, that globally clusters the data space and then partitions clusters into small regions. The Δ +- tree can further reduce the computational cost and cache misses. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of data sets. Our results show that the Δ +-tree is superior in most cases. © 2005 IEEE.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1109/TKDE.2005.46
dc.source	Scopus
dc.subject	High-dimensional index
dc.subject	Main memory
dc.subject	Similarity query
dc.type	Article
dc.contributor.department	COMPUTER SCIENCE
dc.contributor.department	SINGAPORE-MIT ALLIANCE
dc.description.doi	10.1109/TKDE.2005.46
dc.description.sourcetitle	IEEE Transactions on Knowledge and Data Engineering
dc.description.volume	17
dc.description.issue	3
dc.description.page	339-352
dc.description.coden	ITKEE
dc.identifier.isiut	000226358200004
Appears in Collections:	Staff Publications

Show simple item record

Files in This Item:

There are no files associated with this item.

Google Scholar^TM

Check

Files in This Item:

Google ScholarTM

Altmetric

Google Scholar^TM