Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/249508
Title: VISUAL CAUSAL INFERENCE
Authors: YICONG LI
ORCID iD: orcid.org/0000-0002-5659-793X
Keywords: Video-Language Model, Multimodal Understanding
Issue Date: 2-Jan-2024
Citation: YICONG LI (2024-01-02). VISUAL CAUSAL INFERENCE. ScholarBank@NUS Repository.
Abstract: After a decade of prosperity, video understanding has reached a critical juncture: relying solely on massive data and complex architectures is no longer a one-size-fits-all solution. Ubiquitous data imbalance hampers deep neural networks (DNNs) in learning the underlying causal mechanisms, leading to significant performance drops under distribution shifts such as long-tail imbalances and perturbed imbalances. This realization has prompted researchers to seek alternative methodologies for capturing causal patterns in video data. To tackle these challenges and increase the robustness of DNNs, causal modeling has emerged as a principle for discovering the true causal patterns behind observed correlations. This thesis focuses on semantic video understanding and explores the potential of causal modeling to advance fundamental video understanding tasks such as Video Relation Detection and Video Question Answering.
URI: https://scholarbank.nus.edu.sg/handle/10635/249508
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: Thesis_final_submission.pdf
Size: 36.44 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.