Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/41312
Title: Stylistic and lexical co-training for Web block classification
Authors: Lee, C.H.; Kan, M.-Y.; Lai, S.
Keywords: Co-training; Lexical and stylistic learners; PARCELS; Web page block classification; Web page division
Issue Date: 2004
Source: Lee, C.H., Kan, M.-Y., Lai, S. (2004). Stylistic and lexical co-training for Web block classification. Proceedings of the International Workshop on Web Information and Data Management: 136-143. ScholarBank@NUS Repository.
Abstract: Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classification have become a preprocessing step. We introduce PARCELS, an open-source, co-trained approach that performs classification based on separate stylistic and lexical views of the web page. Unlike previous work, PARCELS performs classification on fine-grained blocks. In addition to table-based layout, the system handles real-world pages which feature layout based on divisions and spans as well as stylistic inference for pages using cascaded style sheets. Our evaluation shows that the co-training process results in a reduction of 28.5% in error rate over a single-view classifier and that our approach is comparable to other state-of-the-art systems. Copyright 2004 ACM.
Source Title: Proceedings of the International Workshop on Web Information and Data Management
URI: http://scholarbank.nus.edu.sg/handle/10635/41312
Appears in Collections: Staff Publications
