Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/41605
DC Field                    Value
dc.title                    Learning to separate text content and style for classification
dc.contributor.author       Zhang, D.
dc.contributor.author       Lee, W.S.
dc.date.accessioned         2013-07-04T08:31:26Z
dc.date.available           2013-07-04T08:31:26Z
dc.date.issued              2006
dc.identifier.citation      Zhang, D., Lee, W.S. (2006). Learning to separate text content and style for classification. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4182 LNCS : 79-91. ScholarBank@NUS Repository.
dc.identifier.isbn          3540457801
dc.identifier.issn          0302-9743
dc.identifier.uri           http://scholarbank.nus.edu.sg/handle/10635/41605
dc.description.abstract     Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as "student" or "faculty", or according to the source universities, such as "Cornell" or "Texas". We call one kind of label the content and the other kind the style. Given a set of documents, each with both content and style labels, we seek to effectively learn to classify a set of documents in a new style with no content labels into their content classes. Assuming that every document is generated using words drawn from a mixture of two multinomial component models, one content model and one style model, we propose a method named Cartesian EM that constructs content models and style models through Expectation Maximization and performs classification of the unknown content classes transductively. Our experiments on real-world datasets show the proposed method to be effective for style independent text content classification. © Springer-Verlag Berlin Heidelberg 2006.
dc.source                   Scopus
dc.type                     Conference Paper
dc.contributor.department   COMPUTER SCIENCE
dc.description.sourcetitle  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.description.volume       4182 LNCS
dc.description.page         79-91
dc.identifier.isiut         NOT_IN_WOS
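
The abstract describes the generative assumption only in prose. As a rough illustration, here is a minimal sketch of EM for a two-component word mixture. Each word of a document with content class c and style s is drawn from the content model theta[c] with probability lam and from the style model phi[s] otherwise. This is not the authors' Cartesian EM. The fixed mixing weight `lam`, the uniform initialisation, the additive smoothing `alpha`, and the function name `fit_content_style_em` are all assumptions made for illustration. Documents in the new style carry `None` as their content label and take part in every EM iteration, which is what makes the classification transductive.

```python
import math
from typing import List, Optional, Tuple


def normalize(row: List[float]) -> List[float]:
    """Scale a list of non-negative numbers so it sums to 1."""
    total = sum(row)
    return [v / total for v in row]


def fit_content_style_em(
    counts: List[List[int]],          # word-count vector per document
    content: List[Optional[int]],     # content label, or None if unknown
    style: List[int],                 # style label (always known)
    n_content: int,
    n_style: int,
    lam: float = 0.5,                 # ASSUMED fixed weight of the content model
    alpha: float = 0.1,               # ASSUMED additive smoothing
    n_iter: int = 30,
) -> Tuple[List[List[float]], List[List[float]], List[float], List[List[float]]]:
    """Hypothetical sketch: EM for p(w | c, s) = lam*theta[c][w] + (1-lam)*phi[s][w]."""
    vocab = len(counts[0])
    theta = [[1.0 / vocab] * vocab for _ in range(n_content)]  # content models
    phi = [[1.0 / vocab] * vocab for _ in range(n_style)]      # style models
    prior = [1.0 / n_content] * n_content
    post: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior over content classes for every document.
        post = []
        for x, c, s in zip(counts, content, style):
            if c is not None:  # labeled document: posterior is a point mass
                post.append([1.0 if k == c else 0.0 for k in range(n_content)])
                continue
            logp = []
            for k in range(n_content):
                lp = math.log(prior[k])
                for w, n in enumerate(x):
                    if n:
                        lp += n * math.log(lam * theta[k][w] + (1 - lam) * phi[s][w])
                logp.append(lp)
            top = max(logp)  # subtract the max before exponentiating, for stability
            expd = [math.exp(v - top) for v in logp]
            z = sum(expd)
            post.append([v / z for v in expd])
        # M-step: split each word count between the content and style
        # components in proportion to their share of the mixture probability.
        t_acc = [[alpha] * vocab for _ in range(n_content)]
        p_acc = [[alpha] * vocab for _ in range(n_style)]
        for x, s, q in zip(counts, style, post):
            for k in range(n_content):
                if q[k] == 0.0:
                    continue
                for w, n in enumerate(x):
                    if n:
                        a = lam * theta[k][w]
                        b = (1 - lam) * phi[s][w]
                        r = a / (a + b)  # P(word came from the content model)
                        t_acc[k][w] += q[k] * n * r
                        p_acc[s][w] += q[k] * n * (1 - r)
        theta = [normalize(row) for row in t_acc]
        phi = [normalize(row) for row in p_acc]
        prior = normalize([alpha + sum(q[k] for q in post) for k in range(n_content)])
    return theta, phi, prior, post


if __name__ == "__main__":
    # Toy vocabulary: words 0-1 = content 0, words 2-3 = content 1,
    # word 4 = style 0, word 5 = style 1, word 6 = the new style 2.
    counts = [
        [3, 3, 0, 0, 4, 0, 0], [0, 0, 3, 3, 4, 0, 0],
        [3, 3, 0, 0, 0, 4, 0], [0, 0, 3, 3, 0, 4, 0],
        [2, 2, 0, 0, 0, 0, 4], [0, 0, 2, 2, 0, 0, 4],
    ]
    content = [0, 1, 0, 1, None, None]  # last two: new style, unlabeled
    style = [0, 0, 1, 1, 2, 2]
    _, _, _, post = fit_content_style_em(counts, content, style, 2, 3)
    print([max(range(2), key=lambda k: q[k]) for q in post[4:]])
```

Taking the argmax of each row of `post` for the unlabeled documents gives their predicted content classes. Holding `lam` fixed keeps the sketch short; a fuller treatment could also estimate the mixing weight in the M-step.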
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.