Please use this identifier to cite or link to this item: https://doi.org/10.1145/3240508.3240605
DC Field: Value
dc.title: Knowledge-aware Multimodal Dialog Systems
dc.contributor.author: Lizi Liao
dc.contributor.author: Yunshan Ma
dc.contributor.author: Xiangnan He
dc.contributor.author: Richang Hong
dc.contributor.author: Tat-Seng Chua
dc.date.accessioned: 2020-04-28T02:07:37Z
dc.date.available: 2020-04-28T02:07:37Z
dc.date.issued: 2018-10-26
dc.identifier.citation: Lizi Liao, Yunshan Ma, Xiangnan He, Richang Hong, Tat-Seng Chua (2018-10-26). Knowledge-aware Multimodal Dialog Systems. ACM Multimedia Conference 2018: 801-809. ScholarBank@NUS Repository. https://doi.org/10.1145/3240508.3240605
dc.identifier.isbn: 9781450356657
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/167282
dc.description.abstract: By offering a natural way of seeking information, multimodal dialogue systems are attracting increasing attention in domains such as retail and travel. However, most existing dialogue systems are limited to the textual modality and cannot be easily extended to capture the rich semantics of visual modalities such as product images. For example, in the fashion domain, the visual appearance of clothes and matching styles play a crucial role in understanding the user's intention. Without considering these factors, the dialogue agent may fail to generate desirable responses for users. In this paper, we present a Knowledge-aware Multimodal Dialogue (KMD) model to address the limitation of text-based dialogue systems. It gives special consideration to the semantics and domain knowledge revealed in visual content, and features three key components. First, we build a taxonomy-based learning module to capture the fine-grained semantics in images (e.g., the category and attributes of a product). Second, we propose an end-to-end neural conversational model to generate responses based on the conversation history, visual semantics, and domain knowledge. Lastly, to avoid inconsistent dialogues, we adopt a deep reinforcement learning method which accounts for future rewards to optimize the neural conversational model. We perform extensive evaluation on a multi-turn task-oriented dialogue dataset in the fashion domain. Experimental results show that our method significantly outperforms state-of-the-art methods, demonstrating the efficacy of modeling visual modality and domain knowledge for dialogue systems. © 2018 Association for Computing Machinery.
dc.publisher: Association for Computing Machinery, Inc
dc.subject: Domain Knowledge
dc.subject: Fashion
dc.subject: Multimodal Dialogue
dc.type: Conference Paper
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi: 10.1145/3240508.3240605
dc.description.sourcetitle: ACM Multimedia Conference 2018
dc.description.page: 801-809
dc.grant.id: R-252-300-002-490
dc.grant.fundingagency: Infocomm Media Development Authority
dc.grant.fundingagency: National Research Foundation
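
The abstract above describes three components: a taxonomy-based module for fine-grained visual semantics, a response generator conditioned on dialogue history, visual semantics, and domain knowledge, and a deep reinforcement learning step that accounts for future rewards. The following is a minimal, hypothetical PyTorch sketch of how such a pipeline could be wired together; it is not the authors' implementation, and all module names, dimensions, the knowledge vector, and the toy reward values are illustrative assumptions.

```python
# Minimal sketch of the three components named in the abstract (not the authors' code):
# a taxonomy-based image encoder, a knowledge-aware response generator, and a
# REINFORCE-style policy-gradient update. Dimensions and toy data are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaxonomyEncoder(nn.Module):
    """Maps pre-extracted image features to category/attribute semantics."""
    def __init__(self, img_dim=512, n_categories=20, n_attributes=50, sem_dim=128):
        super().__init__()
        self.category_head = nn.Linear(img_dim, n_categories)   # e.g. dress, shoe
        self.attribute_head = nn.Linear(img_dim, n_attributes)  # e.g. color, sleeve length
        self.project = nn.Linear(n_categories + n_attributes, sem_dim)

    def forward(self, img_feat):
        cat_probs = self.category_head(img_feat).softmax(-1)
        attr_probs = self.attribute_head(img_feat).sigmoid()
        # Fine-grained visual-semantics vector fed to the response generator.
        return self.project(torch.cat([cat_probs, attr_probs], dim=-1))

class ResponseGenerator(nn.Module):
    """Decodes a response conditioned on dialogue history, visual semantics,
    and a domain-knowledge vector (all fused into the decoder's initial state)."""
    def __init__(self, vocab_size=1000, emb_dim=128, hid_dim=256, sem_dim=128, kg_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.history_enc = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.fuse = nn.Linear(hid_dim + sem_dim + kg_dim, hid_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, history_ids, visual_sem, kg_vec, response_ids):
        _, h = self.history_enc(self.embed(history_ids))           # encode text history
        state = torch.tanh(self.fuse(torch.cat([h[-1], visual_sem, kg_vec], dim=-1)))
        dec_out, _ = self.decoder(self.embed(response_ids), state.unsqueeze(0))
        return self.out(dec_out)                                    # per-token logits

# Toy usage: one REINFORCE-style update with a made-up scalar reward per dialogue.
enc, gen = TaxonomyEncoder(), ResponseGenerator()
opt = torch.optim.Adam(list(enc.parameters()) + list(gen.parameters()), lr=1e-4)

img_feat = torch.randn(2, 512)                 # pretend CNN features for 2 dialogues
history = torch.randint(0, 1000, (2, 12))      # token ids of the conversation so far
response = torch.randint(0, 1000, (2, 8))      # sampled system-response tokens
kg_vec = torch.randn(2, 64)                    # embedded domain-knowledge context
reward = torch.tensor([0.7, 0.2])              # stand-in for a future-reward estimate

logits = gen(history, enc(img_feat), kg_vec, response)
logp = F.log_softmax(logits, dim=-1).gather(-1, response.unsqueeze(-1)).squeeze(-1)
loss = -(reward.unsqueeze(1) * logp).mean()    # policy gradient: reward-weighted log-likelihood
opt.zero_grad(); loss.backward(); opt.step()
```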
Appears in Collections: Staff Publications; Elements

Files in This Item:
File: Knowledge-aware Multimodal Dialogue Systems.pdf
Size: 5.63 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

