Please use this identifier to cite or link to this item:
https://doi.org/10.1016/j.neucom.2020.12.029
DC Field | Value
---|---
dc.title | 3-D Relation Network for visual relation recognition in videos
dc.contributor.author | Qianwen Cao
dc.contributor.author | Heyan Huang
dc.contributor.author | Xindi Shang
dc.contributor.author | Boran Wang
dc.contributor.author | Tat-Seng Chua
dc.date.accessioned | 2021-05-07T02:42:02Z
dc.date.available | 2021-05-07T02:42:02Z
dc.date.issued | 2020-12-10
dc.identifier.citation | Qianwen Cao, Heyan Huang, Xindi Shang, Boran Wang, Tat-Seng Chua (2020-12-10). 3-D Relation Network for visual relation recognition in videos. Neurocomputing 432: 91-100. ScholarBank@NUS Repository. https://doi.org/10.1016/j.neucom.2020.12.029
dc.identifier.issn | 0925-2312
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/190980
dc.description.abstract | Video visual relation recognition aims at mining the dynamic relation instances between objects in the form of 〈subject, predicate, object〉, such as "person1-towards-person2" and "person-ride-bicycle". Existing solutions treat the problem as several independent sub-tasks, i.e., image object detection, video object tracking, and trajectory-based relation prediction. We argue that this separation blocks the information flow between the sub-models, producing redundant representations and preventing the sub-tasks from sharing a common set of task-relevant features. To this end, we connect the three sub-tasks in an end-to-end manner through the proposed 3-D relation proposal, which serves as a bridge for relation feature learning. Specifically, we put forward a novel deep neural network, named 3DRN, that fuses spatio-temporal visual features, object label features, and spatial interactive features to learn relation instances from multi-modal cues. In addition, a three-stage training strategy is provided to facilitate large-scale parameter optimization. We conduct extensive experiments on two public datasets with different emphases to demonstrate the effectiveness of the proposed end-to-end feature learning method for visual relation recognition in videos. Furthermore, we verify the potential of our approach by tackling the video relation detection task. © 2020 Elsevier B.V.
dc.publisher | Elsevier B.V.
dc.subject | Computer vision
dc.subject | Deep neural network
dc.subject | Video visual relation recognition
dc.subject | Visual relation detection
dc.type | Article
dc.contributor.department | INSTITUTE OF SYSTEMS SCIENCE
dc.description.doi | 10.1016/j.neucom.2020.12.029
dc.description.sourcetitle | Neurocomputing
dc.description.volume | 432
dc.description.page | 91-100
dc.published.state | Published
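
The abstract describes 3DRN as fusing spatio-temporal visual features, object label features, and spatial interactive features to classify relation predicates. As a rough illustration only, the minimal PyTorch sketch below shows one way such a multi-modal fusion head could be wired up; every module name, feature dimension, and pooling assumption here is hypothetical and does not reproduce the paper's actual 3DRN architecture.

```python
# Hypothetical sketch of the multi-modal fusion idea from the abstract.
# All layer sizes, feature definitions, and names are illustrative
# assumptions, not the authors' 3DRN implementation.
import torch
import torch.nn as nn

class RelationFusionHead(nn.Module):
    def __init__(self, num_classes: int, num_predicates: int,
                 visual_dim: int = 512, label_dim: int = 64,
                 spatial_dim: int = 32, hidden_dim: int = 256):
        super().__init__()
        # Object label features: learned embeddings of detected categories.
        self.label_embed = nn.Embedding(num_classes, label_dim)
        # Fused input = visual + subject label + object label + spatial.
        fused_dim = visual_dim + 2 * label_dim + spatial_dim
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_predicates),
        )

    def forward(self, visual_feat, subj_label, obj_label, spatial_feat):
        # visual_feat:  (B, visual_dim)  pooled spatio-temporal feature of the pair
        # subj_label:   (B,) long        subject category index
        # obj_label:    (B,) long        object category index
        # spatial_feat: (B, spatial_dim) interactive geometry feature
        #               (e.g., relative box positions over time, pooled upstream)
        fused = torch.cat([visual_feat,
                           self.label_embed(subj_label),
                           self.label_embed(obj_label),
                           spatial_feat], dim=-1)
        return self.classifier(fused)  # (B, num_predicates) predicate logits

# Usage on random inputs:
head = RelationFusionHead(num_classes=35, num_predicates=132)
logits = head(torch.randn(4, 512),
              torch.randint(0, 35, (4,)),
              torch.randint(0, 35, (4,)),
              torch.randn(4, 32))
print(logits.shape)  # torch.Size([4, 132])
```

Concatenation followed by an MLP is only the simplest fusion choice; the paper's network learns from 3-D relation proposals end-to-end, which this sketch does not attempt to capture.
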
Appears in Collections: Staff Publications Elements
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
3-D Relation Network for visual relation recognition in videos.pdf | | 2.36 MB | Adobe PDF | CLOSED | None