Please use this identifier to cite or link to this item: https://doi.org/10.1108/17440080680000102
Title: Clustering Web documents using co-citation, coupling, incoming, and outgoing hyperlinks: A comparative performance analysis of algorithms
Authors: Wijaya, D.T.; Bressan, S.
Keywords: Clustering; Co-citation; Coupling; Hyperlinks; Search engines
Issue Date: 2006
Citation: Wijaya, D.T., Bressan, S. (2006). Clustering Web documents using co-citation, coupling, incoming, and outgoing hyperlinks: A comparative performance analysis of algorithms. International Journal of Web Information Systems 2 (2) : 69-76. ScholarBank@NUS Repository. https://doi.org/10.1108/17440080680000102
Abstract: Querying search engines with the keyword "jaguars" returns results as diverse as web sites about cars, computer games, attack planes, American football, and animals. More and more search engines offer options to organize query results by categories or, given a document, to return a list of links to topically related documents. While information retrieval traditionally defines similarity of documents in terms of contents, it seems natural to expect that the very structure of the Web carries important information about the topical similarity of documents. Here we study the role of a matrix constructed from weighted co-citations (documents referenced by the same document), weighted couplings (documents referencing the same document), incoming, and outgoing links for the clustering of documents on the Web. We present and discuss three methods of clustering based on this matrix construction using three clustering algorithms, K-means, Markov and Maximum Spanning Tree, respectively. Our main contribution is a clustering technique based on the Maximum Spanning Tree technique and an evaluation of its effectiveness comparatively to the two most robust alternatives: K-means and Markov clustering. © TROUBADOR PUBLISHING LTD.
Source Title: International Journal of Web Information Systems
URI: http://scholarbank.nus.edu.sg/handle/10635/77831
ISSN: 1744-0084
DOI: 10.1108/17440080680000102
Appears in Collections: Staff Publications
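
Illustrative sketch (not taken from the paper): the abstract describes a matrix built from co-citations, couplings, incoming links, and outgoing links, and a clustering based on a maximum spanning tree. The Python below is a minimal sketch of that idea under stated assumptions. The function names `link_similarity` and `mst_clusters`, the equal weighting of the four link signals, and the rule of cutting the lightest tree edges until `k` clusters remain are all hypothetical choices made for illustration; the record does not give the authors' actual weighting or cut criterion.

```python
from itertools import combinations


def link_similarity(links, n):
    """Build a symmetric similarity matrix from directed links (src, dst)."""
    out = [set() for _ in range(n)]  # out[i]: pages that i links to
    inc = [set() for _ in range(n)]  # inc[i]: pages that link to i
    for src, dst in links:
        out[src].add(dst)
        inc[dst].add(src)
    sim = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        direct = (j in out[i]) + (i in out[j])  # links between i and j, either direction
        cocitation = len(inc[i] & inc[j])       # pages that link to both i and j
        coupling = len(out[i] & out[j])         # pages that both i and j link to
        # Equal weights are an assumption; the paper uses weighted terms.
        sim[i][j] = sim[j][i] = direct + cocitation + coupling
    return sim


def mst_clusters(sim, k):
    """Grow a maximum spanning forest (Kruskal, heaviest edges first) and
    stop at k components. On a connected graph this matches building the
    maximum spanning tree and removing its k - 1 lightest edges."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        ((sim[i][j], i, j) for i, j in combinations(range(n), 2) if sim[i][j] > 0),
        reverse=True,
    )
    components = n
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Hypothetical toy Web graph: pages 0-2 and pages 3-5 form two topics,
# joined by a single cross link from page 2 to page 3.
links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = mst_clusters(link_similarity(links, 6), 2)
```

On this toy graph the single cross link carries less weight than the links inside each topic, so the two topics should end up in separate clusters. K-means or Markov clustering, the two alternatives named in the abstract, could be run on the same matrix for comparison.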

Files in This Item:
There are no files associated with this item.

Scopus™ citations: 6 (checked on Jun 21, 2019)
Web of Science™ citations: 4 (checked on Jun 21, 2019)
Page view(s): 71 (checked on May 24, 2019)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.