Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/39401
Title: Innovating web page classification through reducing noise
Authors: Li, X.; Shi, Z.
Keywords: Classification algorithm without noise; Similarity measure; Web page classification
Issue Date: 2002
Citation: Li, X., Shi, Z. (2002). Innovating web page classification through reducing noise. Journal of Computer Science and Technology 17(1): 9-17. ScholarBank@NUS Repository.
Abstract: This paper presents a new method that eliminates noise in Web page classification. It first describes the presentation of a Web page based on HTML tags. Then through a novel distance formula, it eliminates the noise in similarity measure. After carefully analyzing Web pages, we design an algorithm that can distinguish related hyperlinks from noisy ones. We can utilize non-noisy hyperlinks to improve the performance of Web page classification (the CAWN algorithm). For any page, we can classify it through the text and category of neighbor pages related to the page. The experimental results show that our approach improved classification accuracy.
Source Title: Journal of Computer Science and Technology
URI: http://scholarbank.nus.edu.sg/handle/10635/39401
ISSN: 1000-9000
Appears in Collections: Staff Publications