Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/39303
Title: Clustering web pages about persons and organizations
Authors: Ye, S.; Chua, T.-S.; Kei, J.R.
Keywords: Information retrieval; Machine learning; Named entity; Persons and organizations; Text classification; Web clustering
Issue Date: 2005
Citation: Ye, S., Chua, T.-S., Kei, J.R. (2005). Clustering web pages about persons and organizations. Web Intelligence and Agent Systems 3 (4): 203-216. ScholarBank@NUS Repository.
Abstract: One of the most frequent Web surfing tasks is to search for persons and organizations by their names. Such names are often common and not unique, so a single name may map to several distinct target entities. This paper describes a new methodology to cluster web pages returned by a search engine so that pages referring to different entities fall into different groups. The algorithm uses a combination of named entities, link-based information, and structure-based information as features to partition the document set into direct and indirect pages by means of a decision-tree model. It then chooses the appropriate distinctive direct pages as seeds and uses them to group the document set into clusters. The algorithm has been found to be effective for web-based information retrieval applications. © 2005 IOS Press and the authors. All rights reserved.
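Illustrative sketch (not taken from the paper): the two-stage pipeline outlined in the abstract could look roughly like the Python below. Everything in it is an assumption made for illustration: the `Page` fields, the hand-written `is_direct` rule that stands in for the trained decision-tree model, the Jaccard overlap used to judge distinctiveness, the `max_overlap` threshold, and the example URLs. The abstract does not specify the features, the similarity measure, or the seed selection criteria in this detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    entities: frozenset   # named entities on the page (hypothetical feature)
    name_in_title: bool   # structure-based cue (hypothetical feature)
    inlinks: int          # link-based cue (hypothetical feature)


def is_direct(page: Page) -> bool:
    """Hand-written stand-in for a learned decision tree (assumption)."""
    if page.name_in_title:
        return True
    return page.inlinks >= 5 and len(page.entities) >= 3


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap of two entity sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_seeds(direct_pages, max_overlap=0.2):
    """Keep direct pages whose entity sets barely overlap with earlier seeds."""
    seeds = []
    for page in direct_pages:
        if all(jaccard(page.entities, s.entities) <= max_overlap for s in seeds):
            seeds.append(page)
    return seeds


def cluster(pages, max_overlap=0.2):
    """Assign every page to the seed whose entity set it overlaps most."""
    direct = [p for p in pages if is_direct(p)]
    seeds = pick_seeds(direct, max_overlap)
    clusters = {s.url: [] for s in seeds}
    if not seeds:
        return clusters
    for page in pages:
        best = max(seeds, key=lambda s: jaccard(page.entities, s.entities))
        clusters[best.url].append(page.url)
    return clusters


# Made-up search results for one ambiguous name (all values hypothetical).
pages = [
    Page("a.example/prof", frozenset({"NUS", "Singapore", "Computer Science"}), True, 10),
    Page("b.example/ceo", frozenset({"Acme Corp", "London", "Finance"}), True, 7),
    Page("c.example/news", frozenset({"NUS", "Singapore", "Conference"}), False, 1),
    Page("d.example/blog", frozenset({"London", "Finance", "Bank"}), False, 0),
]
result = cluster(pages)
```

In this toy input the two pages with the name in the title are treated as direct pages and become seeds, and each remaining page joins the seed with which it shares the most named entities. A real system would replace `is_direct` with a classifier trained on labelled pages.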
Source Title: Web Intelligence and Agent Systems
URI: http://scholarbank.nus.edu.sg/handle/10635/39303
ISSN: 15701263
Appears in Collections: Staff Publications
