Title: 3-D Relation Network for visual relation recognition in videos
Authors: Qianwen Cao; Heyan Huang; Xindi Shang; Boran Wang; Tat-Seng Chua
Keywords: Computer vision; Deep neural network; Video visual relation recognition; Visual relation detection
Issue Date: 10-Dec-2020
Publisher: Elsevier B.V.
Citation: Qianwen Cao, Heyan Huang, Xindi Shang, Boran Wang, Tat-Seng Chua (2020-12-10). 3-D Relation Network for visual relation recognition in videos. Neurocomputing 432 : 91-100. ScholarBank@NUS Repository.
Abstract: Video visual relation recognition aims at mining the dynamic relation instances between objects in the form of 〈subject, predicate, object〉, such as “person1-towards-person2” and “person-ride-bicycle”. Existing solutions treat the problem as several independent sub-tasks, i.e., image object detection, video object tracking, and trajectory-based relation prediction. We argue that this separation blocks the flow of information between the sub-models: the sub-tasks cannot share a common set of task-specific features, so each learns its own redundant representation. To this end, we connect the three sub-tasks in an end-to-end manner by proposing the 3-D relation proposal, which serves as a bridge for relation feature learning. Specifically, we put forward a novel deep neural network, named 3DRN, that fuses spatio-temporal visual characteristics, object label features, and spatial interactive features to learn relation instances from multi-modal cues. In addition, we provide a three-stage training strategy to facilitate large-scale parameter optimization. We conduct extensive experiments on two public datasets with different emphases to demonstrate the effectiveness of the proposed end-to-end feature learning method for visual relation recognition in videos. Furthermore, we verify the potential of our approach by tackling the video relation detection task. © 2020 Elsevier B.V.
Source Title: Neurocomputing
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2020.12.029
Appears in Collections: Staff Publications

Files in This Item:
File: 3-D Relation Network for visual relation recognition in videos.pdf
Size: 2.36 MB
Format: Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.