Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/41312
Title: Stylistic and lexical co-training for Web block classification
Authors: Lee, C.H.
Kan, M.-Y. 
Lai, S.
Keywords: Co-training
Lexical and stylistic learners
PARCELS
Web page block classification
Web page division
Issue Date: 2004
Citation: Lee, C.H., Kan, M.-Y., Lai, S. (2004). Stylistic and lexical co-training for Web block classification. Proceedings of the International Workshop on Web Information and Data Management: 136-143. ScholarBank@NUS Repository.
Abstract: Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classification have become a preprocessing step. We introduce PARCELS, an open-source, co-trained approach that performs classification based on separate stylistic and lexical views of the web page. Unlike previous work, PARCELS performs classification on fine-grained blocks. In addition to table-based layout, the system handles real-world pages which feature layout based on divisions and spans as well as stylistic inference for pages using cascaded style sheets. Our evaluation shows that the co-training process results in a reduction of 28.5% in error rate over a single-view classifier and that our approach is comparable to other state-of-the-art systems. Copyright 2004 ACM.
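Illustration: the co-training idea named in the abstract can be sketched as two classifiers, one per view, that take turns labeling the unlabeled block they are most confident about and adding it to a shared training set. The sketch below is not the PARCELS implementation. The naive Bayes learner, the feature strings (such as "tag:ul"), the labels "navigation" and "content", and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes over lists of string features (illustrative)."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(samples, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict_proba(self, feats):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            # Add-one smoothing, with one extra slot for unseen features.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n / total)
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            log_scores[label] = score
        peak = max(log_scores.values())
        exp = {k: math.exp(v - peak) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


def train_view(labeled, view):
    """Train one classifier on a single view (0 = lexical, 1 = stylistic)."""
    return NaiveBayes().fit([x[view] for x, _ in labeled], [y for _, y in labeled])


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: [((lexical_feats, style_feats), label)]; unlabeled: [(lexical_feats, style_feats)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            clf = train_view(labeled, view)
            scored = []
            for i, block in enumerate(pool):
                proba = clf.predict_proba(block[view])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            # Move this view's most confident predictions into the shared training set.
            chosen = sorted(scored, reverse=True)[:per_round]
            for _, i, label in chosen:
                labeled.append((pool[i], label))
            for i in sorted((i for _, i, _ in chosen), reverse=True):
                del pool[i]
    return train_view(labeled, 0), train_view(labeled, 1)


def classify(lexical_clf, style_clf, block):
    """Combine both views by multiplying their class probabilities."""
    p_lex = lexical_clf.predict_proba(block[0])
    p_sty = style_clf.predict_proba(block[1])
    combined = {label: p_lex[label] * p_sty[label] for label in p_lex}
    return max(combined, key=combined.get)


# Hypothetical toy data: each block is (lexical features, stylistic features).
seed = [
    ((["home", "about", "contact"], ["tag:ul", "links:many"]), "navigation"),
    ((["the", "study", "shows", "results"], ["tag:p", "links:few"]), "content"),
]
unlabeled = [
    (["home", "products", "contact"], ["tag:ul", "links:many"]),
    (["results", "of", "the", "experiment"], ["tag:p", "links:few"]),
]
lexical_clf, style_clf = co_train(seed, unlabeled)
print(classify(lexical_clf, style_clf, (["about", "products"], ["tag:ul", "links:many"])))
```

Multiplying the two views' probabilities at prediction time is one simple way to combine them; the abstract does not say how the two learners' outputs are merged.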
Source Title: Proceedings of the International Workshop on Web Information and Data Management
URI: http://scholarbank.nus.edu.sg/handle/10635/41312
Appears in Collections: Staff Publications
