Title: Web based linkage
Authors: Elmacioglu, E.
Kan, M.-Y. 
Lee, D.
Zhang, Y.
Keywords: Entity resolution
Record linkage
Issue Date: 2007
Citation: Elmacioglu, E., Kan, M.-Y., Lee, D., Zhang, Y. (2007). Web based linkage. International Conference on Information and Knowledge Management, Proceedings: 121-128. ScholarBank@NUS Repository.
Abstract: When a variety of names are used for the same real-world entity, the problem of detecting all such variants has been known as the (record) linkage or entity resolution problem. In this paper, toward this problem, we propose a novel approach that uses the Web as the collective knowledge source in addition to contents of entities. Our hypothesis is that if an entity e1 is a duplicate of another entity e2, and if e1 frequently appears together with information I on the Web, then e2 may appear frequently with I on the Web. By using search engines, we analyze the frequency, URLs, or contents of the returned web pages to capture the information I of an entity. Extensive experiments verify that our hypothesis holds in many real settings, and the idea of using the Web as the additional source for the linkage problem is promising. Our proposal shows 51% (on average) and 193% (at best) improvement in precision/recall compared to a baseline approach. Copyright 2007 ACM.
Source Title: International Conference on Information and Knowledge Management, Proceedings
ISBN: 9781595938299
DOI: 10.1145/1316902.1316922
Appears in Collections: Staff Publications
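
The hypothesis stated in the abstract (if e1 and e2 are duplicates and e1 often appears with information I on the Web, then e2 may also appear often with I) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy page list stands in for a search engine, the entity names are invented, and the scoring function `web_similarity` is a hypothetical measure, not the one used in the paper.

```python
from typing import Sequence

# Toy stand-in for a search engine index: each string plays the role of one
# web page. Names and contents are invented for illustration only.
TOY_PAGES = [
    "J. Doe publishes on database systems",
    "Jane Doe gives a lecture on database indexing",
    "Jane Doe joins the database group",
    "J. Doe and database theory",
    "J. Smith shares gardening tips",
    "J. Smith writes about roses and gardening",
]


def hit_count(pages: Sequence[str], *terms: str) -> int:
    """Count pages that contain every term (case-insensitive substring match)."""
    return sum(
        all(term.lower() in page.lower() for term in terms) for page in pages
    )


def cooccurrence_rate(pages: Sequence[str], entity: str, info: str) -> float:
    """Fraction of the entity's pages that also mention the information term."""
    total = hit_count(pages, entity)
    return hit_count(pages, entity, info) / total if total else 0.0


def web_similarity(pages: Sequence[str], e1: str, e2: str, info: str) -> float:
    """Hypothetical score: high only if BOTH entities co-occur often with info."""
    return min(
        cooccurrence_rate(pages, e1, info),
        cooccurrence_rate(pages, e2, info),
    )


if __name__ == "__main__":
    # Two name variants that share the same context term score high.
    print(web_similarity(TOY_PAGES, "J. Doe", "Jane Doe", "database"))
    # An unrelated entity does not share that context and scores low.
    print(web_similarity(TOY_PAGES, "J. Doe", "J. Smith", "database"))
```

In a real setting, `hit_count` would be backed by search engine result counts, and the abstract notes that URLs or page contents can also be used to capture the information I.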

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.