Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/239063
Title: VISUAL RELATION DRIVEN VIDEO QUESTION ANSWERING
Authors: XIAO JUNBIN
ORCID iD: orcid.org/0000-0001-5573-6195
Keywords: Video Visual Relation, Video Question Answering, Video Graph Representation
Issue Date: 11-Nov-2022
Citation: XIAO JUNBIN (2022-11-11). VISUAL RELATION DRIVEN VIDEO QUESTION ANSWERING. ScholarBank@NUS Repository.
Abstract: Video question answering (VideoQA) has attracted significant attention but remains underexplored. While progress has been made in answering questions about video content recognition, questions that demand visual relation reasoning have not been adequately addressed. To close this research gap, this thesis studies visual relation driven VideoQA. We first contributed a manually annotated VideoQA dataset that is rich in real-world visual relations and object interactions. We then developed three effective video graph models that capture visual relations at various scopes (spatial, temporal at both local and global ranges, hierarchical, and compositional) for answering relation-aware questions. Our proposed methods cover learning from limited data with strong supervision by incorporating structural priors, as well as learning from vast amounts of Web data via self-supervised pretraining. With these efforts, we hope this thesis provides a foundation for future studies of relation-aware VideoQA.
URI: https://scholarbank.nus.edu.sg/handle/10635/239063
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: VideoQA-Junbin-PhD.pdf | Size: 10.19 MB | Format: Adobe PDF | Access: OPEN | Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.