Please use this identifier to cite or link to this item:
Title: Stylistic and lexical co-training for Web block classification
Authors: Lee, C.H.
Kan, M.-Y. 
Lai, S.
Keywords: Co-training
Lexical and stylistic learners
Web page block classification
Web page division
Issue Date: 2004
Citation: Lee, C.H.,Kan, M.-Y.,Lai, S. (2004). Stylistic and lexical co-training for Web block classification. Proceedings of the Interntational Workshop on Web Information and Data Management : 136-143. ScholarBank@NUS Repository.
Abstract: Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classification have become a preprocessing step. We introduce PARCELS, an open-source, co-trained approach that performs classification based on separate stylistic and lexical views of the web page. Unlike previous work, PARCELS performs classification on fine-grained blocks. In addition to table-based layout, the system handles real-world pages which feature layout based on divisions and spans as well as stylistic inference for pages using cascaded style sheets. Our evaluation shows that the co-training process results in a reduction of 28.5% in error rate over a single-view classifier and that our approach is comparable to other state-of-the-art systems. Copyright 2004 ACM.
Source Title: Proceedings of the Interntational Workshop on Web Information and Data Management
Appears in Collections:Staff Publications

Show full item record
Files in This Item:
There are no files associated with this item.

Page view(s)

checked on Jul 9, 2021

Google ScholarTM


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.