VISUAL RELATION DRIVEN VIDEO QUESTION ANSWERING
XIAO JUNBIN
Abstract
Video question answering (VideoQA) has received significant attention but remains underexplored. Although progress has been made in answering questions about video content recognition, questions that require visual relation reasoning have not been adequately addressed. To fill this research gap, this thesis studies visual relation driven VideoQA. We first contribute a manually annotated VideoQA dataset that is rich in real-world visual relations and object interactions. We then develop three effective video graph models that capture visual relations across spatial, temporal (both local and global), hierarchical, and compositional scopes for answering relation-aware questions. Our proposed methods cover learning from limited data with strong supervision by incorporating structural priors, as well as learning from vast amounts of Web data via self-supervised pretraining. With these efforts, we hope this thesis provides a foundation for future studies in relation-aware VideoQA.
Keywords
Video Visual Relation, Video Question Answering, Video Graph Representation
Date
2022-11-11
Type
Thesis