Please use this identifier to cite or link to this item:
https://doi.org/10.1007/s00778-011-0258-2
Title: | Efficient and effective similarity search over probabilistic data based on Earth Mover's Distance | Authors: | Xu, J. Zhang, Z. Tung, A.K.H. Yu, G. |
Keywords: | Earth mover's distance Probabilistic data management Similarity search Tree-based indexing |
Issue Date: | 2012 | Citation: | Xu, J., Zhang, Z., Tung, A.K.H., Yu, G. (2012). Efficient and effective similarity search over probabilistic data based on Earth Mover's Distance. VLDB Journal 21 (4) : 535-559. ScholarBank@NUS Repository. https://doi.org/10.1007/s00778-011-0258-2 | Abstract: | Advances in geographical tracking, multimedia processing, information extraction, and sensor networks have created a deluge of probabilistic data. While similarity search is an important tool to support the manipulation of probabilistic data, it raises new challenges to traditional relational databases. The problem stems from the limited effectiveness of the distance metrics employed by existing database systems. On the other hand, several more complicated distance operators have proven their values for better distinguishing ability in specific probabilistic domains. In this paper, we discuss the similarity search problem with respect to Earth Mover's Distance (EMD). EMD is the most successful distance metric for probability distribution comparison but is an expensive operator as it has cubic time complexity. We present a new database indexing approach to answer EMD-based similarity queries, including range queries and k-nearest neighbor queries on probabilistic data. Our solution utilizes primal-dual theory from linear programming and employs a group of B + trees for effective candidate pruning. We also apply our filtering technique to the processing of continuous similarity queries, especially with applications to frame copy detection in real-time videos. Extensive experiments show that our proposals dramatically improve the usefulness and scalability of probabilistic data management. © 2011 Springer-Verlag. | Source Title: | VLDB Journal | URI: | http://scholarbank.nus.edu.sg/handle/10635/39870 | ISSN: | 10668888 | DOI: | 10.1007/s00778-011-0258-2 |
Appears in Collections: | Staff Publications |
Show full item record
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.