Title: Visual relationship detection with visual-linguistic knowledge from multimodal representations
Authors: Chiou, Meng-Jiun
Zimmermann, Roger 
Feng, Jiashi 
Keywords: Computer vision
image analysis
multimodal representation
scene graph generation
visual relationship detection
Issue Date: 1-Jan-2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Citation: Chiou, Meng-Jiun, Zimmermann, Roger, Feng, Jiashi (2021-01-01). Visual relationship detection with visual-linguistic knowledge from multimodal representations. IEEE Access 9 : 50441-50451. ScholarBank@NUS Repository.
Rights: Attribution 4.0 International
Abstract: Visual relationship detection aims to reason over relationships among salient objects in images, and it has drawn increasing attention over the past few years. Inspired by human reasoning mechanisms, it is believed that external visual commonsense knowledge is beneficial for reasoning about visual relationships of objects in images; however, such knowledge is rarely considered in existing methods. In this paper, we propose a novel approach named Relational Visual-Linguistic Bidirectional Encoder Representations from Transformers (RVL-BERT), which performs relational reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training with multimodal representations. RVL-BERT also uses an effective spatial module and a novel mask attention module to explicitly capture spatial information among the objects. Moreover, our model decouples object detection from visual relationship recognition by taking in object names directly, enabling it to be used on top of any object detection system. We show through quantitative and qualitative experiments that, with the transferred knowledge and novel modules, RVL-BERT achieves competitive results on two challenging visual relationship detection datasets. The source code is publicly available. © 2013 IEEE.
Source Title: IEEE Access
ISSN: 2169-3536
DOI: 10.1109/access.2021.3069041
Appears in Collections: Elements
Staff Publications

Files in This Item:
File: 10_1109_access_2021_3069041.pdf
Size: 2.14 MB
Format: Adobe PDF



This item is licensed under a Creative Commons License.