Please use this identifier to cite or link to this item:
https://doi.org/10.1016/j.neucom.2020.12.029
DC Field | Value
---|---
dc.title | 3-D Relation Network for visual relation recognition in videos
dc.contributor.author | Qianwen Cao
dc.contributor.author | Heyan Huang
dc.contributor.author | Xindi Shang
dc.contributor.author | Boran Wang
dc.contributor.author | Tat-Seng Chua
dc.date.accessioned | 2021-05-07T02:42:02Z
dc.date.available | 2021-05-07T02:42:02Z
dc.date.issued | 2020-12-10
dc.identifier.citation | Qianwen Cao, Heyan Huang, Xindi Shang, Boran Wang, Tat-Seng Chua (2020-12-10). 3-D Relation Network for visual relation recognition in videos. Neurocomputing 432: 91-100. ScholarBank@NUS Repository. https://doi.org/10.1016/j.neucom.2020.12.029
dc.identifier.issn | 0925-2312
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/190980
dc.description.abstract | Video visual relation recognition aims at mining the dynamic relation instances between objects in the form of 〈subject, predicate, object〉, such as "person1-towards-person2" and "person-ride-bicycle". Existing solutions treat the problem as several independent sub-tasks, i.e., image object detection, video object tracking, and trajectory-based relation prediction. We argue that this separation blocks the information flow between the sub-models, producing redundant representations and preventing the sub-tasks from sharing a common set of task-relevant features. To this end, we connect the three sub-tasks in an end-to-end manner through the proposed 3-D relation proposal, which serves as a bridge for relation feature learning. Specifically, we put forward a novel deep neural network, named 3DRN, that fuses spatio-temporal visual features, object label features, and spatial interactive features to learn relation instances from multi-modal cues. In addition, a three-stage training strategy is provided to facilitate large-scale parameter optimization. We conduct extensive experiments on two public datasets with different emphases to demonstrate the effectiveness of the proposed end-to-end feature learning method for visual relation recognition in videos. Furthermore, we verify the potential of our approach by tackling the video relation detection task. © 2020 Elsevier B.V.
dc.publisher | Elsevier B.V.
dc.subject | Computer vision
dc.subject | Deep neural network
dc.subject | Video visual relation recognition
dc.subject | Visual relation detection
dc.type | Article
dc.contributor.department | INSTITUTE OF SYSTEMS SCIENCE
dc.description.doi | 10.1016/j.neucom.2020.12.029
dc.description.sourcetitle | Neurocomputing
dc.description.volume | 432
dc.description.page | 91-100
dc.published.state | Published
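
The abstract describes 3DRN as fusing spatio-temporal visual features, object label features, and spatial interactive features to classify relation predicates. As a rough illustration only, the minimal PyTorch sketch below shows one way such a multi-modal fusion head could be wired up; every module name, feature dimension, and pooling assumption here is hypothetical and does not reproduce the paper's actual 3DRN architecture.

```python
# Hypothetical sketch of the multi-modal fusion idea from the abstract.
# All layer sizes, feature definitions, and names are illustrative
# assumptions, not the authors' 3DRN implementation.
import torch
import torch.nn as nn

class RelationFusionHead(nn.Module):
    def __init__(self, num_classes: int, num_predicates: int,
                 visual_dim: int = 512, label_dim: int = 64,
                 spatial_dim: int = 32, hidden_dim: int = 256):
        super().__init__()
        # Object label features: learned embeddings of detected categories.
        self.label_embed = nn.Embedding(num_classes, label_dim)
        # Fused input = visual + subject label + object label + spatial.
        fused_dim = visual_dim + 2 * label_dim + spatial_dim
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_predicates),
        )

    def forward(self, visual_feat, subj_label, obj_label, spatial_feat):
        # visual_feat:  (B, visual_dim)  pooled spatio-temporal feature of the pair
        # subj_label:   (B,) long        subject category index
        # obj_label:    (B,) long        object category index
        # spatial_feat: (B, spatial_dim) interactive geometry feature
        #               (e.g., relative box positions over time, pooled upstream)
        fused = torch.cat([visual_feat,
                           self.label_embed(subj_label),
                           self.label_embed(obj_label),
                           spatial_feat], dim=-1)
        return self.classifier(fused)  # (B, num_predicates) predicate logits

# Usage on random inputs:
head = RelationFusionHead(num_classes=35, num_predicates=132)
logits = head(torch.randn(4, 512),
              torch.randint(0, 35, (4,)),
              torch.randint(0, 35, (4,)),
              torch.randn(4, 32))
print(logits.shape)  # torch.Size([4, 132])
```

Concatenation followed by an MLP is only the simplest fusion choice; the paper's network learns from 3-D relation proposals end-to-end, which this sketch does not attempt to capture.
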
Appears in Collections: Staff Publications Elements
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
3-D Relation Network for visual relation recognition in videos.pdf | | 2.36 MB | Adobe PDF | CLOSED | None