Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/39303
Title: Clustering web pages about persons and organizations
Authors: Ye, S. 
Chua, T.-S. 
Kei, J.R. 
Keywords: Information retrieval
Machine learning
Named entity
Persons and organizations
Text classification
Web clustering
Issue Date: 2005
Source: Ye, S., Chua, T.-S., Kei, J.R. (2005). Clustering web pages about persons and organizations. Web Intelligence and Agent Systems 3 (4): 203-216. ScholarBank@NUS Repository.
Abstract: One of the most frequent Web surfing tasks is to search for persons and organizations by their names. Such names are often common and not unique, so a single name may map to several target entities. This paper describes a new methodology to cluster web pages returned by a search engine so that pages belonging to different entities fall into different groups. The algorithm uses a combination of named entities, link-based information, and structure-based information as features to partition the document set into direct and indirect pages by means of a decision-tree model. It then chooses appropriate distinctive direct pages as seeds to group the document set into clusters. The algorithm has been found to be effective for web-based information retrieval applications. © 2005 IOS Press and the authors. All rights reserved.
Source Title: Web Intelligence and Agent Systems
URI: http://scholarbank.nus.edu.sg/handle/10635/39303
ISSN: 1570-1263
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.