Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/39401
Title: Innovating web page classification through reducing noise
Authors: Li, X. 
Shi, Z.
Keywords: Classification algorithm without noise
Similarity measure
Web page classification
Issue Date: 2002
Citation: Li, X., Shi, Z. (2002). Innovating web page classification through reducing noise. Journal of Computer Science and Technology 17(1): 9-17. ScholarBank@NUS Repository.
Abstract: This paper presents a new method that eliminates noise in Web page classification. It first describes the presentation of a Web page based on HTML tags. Then through a novel distance formula, it eliminates the noise in similarity measure. After carefully analyzing Web pages, we design an algorithm that can distinguish related hyperlinks from noisy ones. We can utilize non-noisy hyperlinks to improve the performance of Web page classification (the CAWN algorithm). For any page, we can classify it through the text and category of neighbor pages related to the page. The experimental results show that our approach improved classification accuracy.
Source Title: Journal of Computer Science and Technology
URI: http://scholarbank.nus.edu.sg/handle/10635/39401
ISSN: 1000-9000
Appears in Collections: Staff Publications



