Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/41312
DC Field: Value
dc.title: Stylistic and lexical co-training for Web block classification
dc.contributor.author: Lee, C.H.
dc.contributor.author: Kan, M.-Y.
dc.contributor.author: Lai, S.
dc.date.accessioned: 2013-07-04T08:24:34Z
dc.date.available: 2013-07-04T08:24:34Z
dc.date.issued: 2004
dc.identifier.citation: Lee, C.H., Kan, M.-Y., Lai, S. (2004). Stylistic and lexical co-training for Web block classification. Proceedings of the International Workshop on Web Information and Data Management: 136-143. ScholarBank@NUS Repository.
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/41312
dc.description.abstract: Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classification have become a preprocessing step. We introduce PARCELS, an open-source, co-trained approach that performs classification based on separate stylistic and lexical views of the web page. Unlike previous work, PARCELS performs classification on fine-grained blocks. In addition to table-based layout, the system handles real-world pages which feature layout based on divisions and spans as well as stylistic inference for pages using cascaded style sheets. Our evaluation shows that the co-training process results in a reduction of 28.5% in error rate over a single-view classifier and that our approach is comparable to other state-of-the-art systems. Copyright 2004 ACM.
dc.source: Scopus
dc.subject: Co-training
dc.subject: Lexical and stylistic learners
dc.subject: PARCELS
dc.subject: Web page block classification
dc.subject: Web page division
dc.type: Conference Paper
dc.contributor.department: COMPUTER SCIENCE
dc.description.sourcetitle: Proceedings of the International Workshop on Web Information and Data Management
dc.description.page: 136-143
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.