Title: Knowledge-aware Multimodal Dialog Systems
Authors: Lizi Liao 
Yunshan Ma 
Xiangnan He 
Richang Hong 
Tat-Seng Chua 
Keywords: Domain Knowledge
Multimodal Dialogue
Issue Date: 26-Oct-2018
Publisher: Association for Computing Machinery, Inc
Citation: Lizi Liao, Yunshan Ma, Xiangnan He, Richang Hong, Tat-Seng Chua (2018-10-26). Knowledge-aware Multimodal Dialog Systems. ACM Multimedia Conference 2018 : 801-809. ScholarBank@NUS Repository.
Abstract: By offering a natural way to seek information, multimodal dialogue systems are attracting increasing attention in domains such as retail and travel. However, most existing dialogue systems are limited to the textual modality and cannot be easily extended to capture the rich semantics of the visual modality, such as product images. For example, in the fashion domain, the visual appearance of clothes and matching styles play a crucial role in understanding the user's intention. Without considering these, the dialogue agent may fail to generate desirable responses for users. In this paper, we present a Knowledge-aware Multimodal Dialogue (KMD) model to address the limitations of text-based dialogue systems. It gives special consideration to the semantics and domain knowledge revealed in visual content, and has three key components. First, we build a taxonomy-based learning module to capture the fine-grained semantics in images (e.g., the category and attributes of a product). Second, we propose an end-to-end neural conversational model to generate responses based on the conversation history, visual semantics, and domain knowledge. Lastly, to avoid inconsistent dialogues, we adopt a deep reinforcement learning method that accounts for future rewards to optimize the neural conversational model. We perform an extensive evaluation on a multi-turn task-oriented dialogue dataset in the fashion domain. Experimental results show that our method significantly outperforms state-of-the-art methods, demonstrating the efficacy of modeling the visual modality and domain knowledge for dialogue systems. © 2018 Association for Computing Machinery.
Source Title: ACM Multimedia Conference 2018
ISBN: 9781450356657
DOI: 10.1145/3240508.3240605
Appears in Collections: Staff Publications

Files in This Item:
File: Knowledge-aware Multimodal Dialogue Systems.pdf
Size: 5.63 MB
Format: Adobe PDF




Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.