Please use this identifier to cite or link to this item: https://doi.org/10.1145/3331184.3331254
DC Field: Value
dc.title: Personalized Fashion Recommendation with Visual Explanations based on Multimodal Attention Network: Towards Visually Explainable Recommendation
dc.contributor.author: Xu Chen
dc.contributor.author: Hanxiong Chen
dc.contributor.author: Hongteng Xu
dc.contributor.author: Yongfeng Zhang
dc.contributor.author: Yixin Cao
dc.contributor.author: Zheng Qin
dc.contributor.author: Hongyuan Zha
dc.date.accessioned: 2020-05-06T04:14:27Z
dc.date.available: 2020-05-06T04:14:27Z
dc.date.issued: 2019-07-21
dc.identifier.citation: Xu Chen, Hanxiong Chen, Hongteng Xu, Yongfeng Zhang, Yixin Cao, Zheng Qin, Hongyuan Zha (2019-07-21). Personalized Fashion Recommendation with Visual Explanations based on Multimodal Attention Network: Towards Visually Explainable Recommendation. SIGIR 2019: 765-774. ScholarBank@NUS Repository. https://doi.org/10.1145/3331184.3331254
dc.identifier.isbn: 9781450361729
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/167767
dc.description.abstract: Fashion recommendation has attracted increasing attention from both industry and academic communities. This paper proposes a novel neural architecture for fashion recommendation based on both image region-level features and user review information. Our basic intuition is that: for a fashion image, not all the regions are equally important for the users, i.e., people usually care about a few parts of the fashion image. To model such human sense, we learn an attention model over many pre-segmented image regions, based on which we can understand where a user is really interested in on the image, and correspondingly, represent the image in a more accurate manner. In addition, by discovering such fine-grained visual preference, we can visually explain a recommendation by highlighting some regions of its image. For better learning the attention model, we also introduce user review information as a weak supervision signal to collect more comprehensive user preference. In our final framework, the visual and textual features are seamlessly coupled by a multimodal attention network. Based on this architecture, we can not only provide accurate recommendation, but also can accompany each recommended item with novel visual explanations. We conduct extensive experiments to demonstrate the superiority of our proposed model in terms of Top-N recommendation, and also we build a collectively labeled dataset for evaluating our provided visual explanations in a quantitative manner. © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
dc.type: Conference Paper
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi: 10.1145/3331184.3331254
dc.description.sourcetitle: SIGIR 2019
dc.description.page: 765-774
dc.grant.id: R-252-300-002-490
dc.grant.fundingagency: Infocomm Media Development Authority
dc.grant.fundingagency: National Research Foundation
Appears in Collections: Staff Publications
Files in This Item:
File: 3331184.3331254.pdf
Size: 4.03 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.