|Title:||Stylistic and lexical co-training for Web block classification|
|Keywords:||Lexical and stylistic learners; Web page block classification; Web page division|
|Source:||Lee, C.H., Kan, M.-Y., Lai, S. (2004). Stylistic and lexical co-training for Web block classification. Proceedings of the International Workshop on Web Information and Data Management: 136-143. ScholarBank@NUS Repository.|
|Abstract:||Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classification have become a preprocessing step. We introduce PARCELS, an open-source, co-trained approach that performs classification based on separate stylistic and lexical views of the web page. Unlike previous work, PARCELS performs classification on fine-grained blocks. In addition to table-based layout, the system handles real-world pages which feature layout based on divisions and spans as well as stylistic inference for pages using cascaded style sheets. Our evaluation shows that the co-training process results in a reduction of 28.5% in error rate over a single-view classifier and that our approach is comparable to other state-of-the-art systems. Copyright 2004 ACM.|
|Source Title:||Proceedings of the International Workshop on Web Information and Data Management|
|Appears in Collections:||Staff Publications|