Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/41312
Title: Stylistic and lexical co-training for Web block classification
Authors: Lee, C.H.; Kan, M.-Y.; Lai, S.
Keywords: Co-training; Lexical and stylistic learners; PARCELS; Web page block classification; Web page division
Issue Date: 2004
Citation: Lee, C.H., Kan, M.-Y., Lai, S. (2004). Stylistic and lexical co-training for Web block classification. Proceedings of the International Workshop on Web Information and Data Management: 136-143. ScholarBank@NUS Repository.
Abstract: Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classification have become a preprocessing step. We introduce PARCELS, an open-source, co-trained approach that performs classification based on separate stylistic and lexical views of the web page. Unlike previous work, PARCELS performs classification on fine-grained blocks. In addition to table-based layout, the system handles real-world pages which feature layout based on divisions and spans as well as stylistic inference for pages using cascaded style sheets. Our evaluation shows that the co-training process results in a reduction of 28.5% in error rate over a single-view classifier and that our approach is comparable to other state-of-the-art systems. Copyright 2004 ACM.
Source Title: Proceedings of the International Workshop on Web Information and Data Management
URI: http://scholarbank.nus.edu.sg/handle/10635/41312
Appears in Collections: Staff Publications
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.