Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/41312
DC Field | Value | |
---|---|---|
dc.title | Stylistic and lexical co-training for Web block classification | |
dc.contributor.author | Lee, C.H. | |
dc.contributor.author | Kan, M.-Y. | |
dc.contributor.author | Lai, S. | |
dc.date.accessioned | 2013-07-04T08:24:34Z | |
dc.date.available | 2013-07-04T08:24:34Z | |
dc.date.issued | 2004 | |
dc.identifier.citation | Lee, C.H., Kan, M.-Y., Lai, S. (2004). Stylistic and lexical co-training for Web block classification. Proceedings of the International Workshop on Web Information and Data Management: 136-143. ScholarBank@NUS Repository. | |
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/41312 | |
dc.description.abstract | Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classification have become a preprocessing step. We introduce PARCELS, an open-source, co-trained approach that performs classification based on separate stylistic and lexical views of the web page. Unlike previous work, PARCELS performs classification on fine-grained blocks. In addition to table-based layout, the system handles real-world pages which feature layout based on divisions and spans as well as stylistic inference for pages using cascaded style sheets. Our evaluation shows that the co-training process results in a reduction of 28.5% in error rate over a single-view classifier and that our approach is comparable to other state-of-the-art systems. Copyright 2004 ACM. | |
dc.source | Scopus | |
dc.subject | Co-training | |
dc.subject | Lexical and stylistic learners | |
dc.subject | PARCELS | |
dc.subject | Web page block classification | |
dc.subject | Web page division | |
dc.type | Conference Paper | |
dc.contributor.department | COMPUTER SCIENCE | |
dc.description.sourcetitle | Proceedings of the International Workshop on Web Information and Data Management | |
dc.description.page | 136-143 | |
dc.identifier.isiut | NOT_IN_WOS | |
Appears in Collections: | Staff Publications |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.