VISUAL CAUSAL INFERENCE | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/249508

Title:	VISUAL CAUSAL INFERENCE
Authors:	YICONG LI
ORCID iD:	orcid.org/0000-0002-5659-793X
Keywords:	Video-Language Model, Multimodal Understanding
Issue Date:	2-Jan-2024
Citation:	YICONG LI (2024-01-02). VISUAL CAUSAL INFERENCE. ScholarBank@NUS Repository.
Abstract:	After a decade of rapid progress, video understanding has reached a critical juncture: relying solely on massive data and complex architectures is no longer a one-size-fits-all solution. Ubiquitous data imbalance prevents deep neural networks (DNNs) from effectively learning the underlying causal mechanisms, leading to significant performance drops under distribution shifts, such as long-tail and perturbed imbalances. This realization has prompted researchers to seek alternative methodologies for capturing causal patterns in video data. To tackle these challenges and increase the robustness of DNNs, causal modeling has emerged as a principle for discovering the true causal patterns behind observed correlations. This thesis focuses on semantic video understanding and explores the potential of causal modeling to advance fundamental video understanding tasks such as Video Relation Detection and Video Question Answering.
URI:	https://scholarbank.nus.edu.sg/handle/10635/249508
Appears in Collections:	Ph.D Theses (Open)

File	Description	Size	Format	Access Settings	Version
Thesis_final_submission.pdf		36.44 MB	Adobe PDF	OPEN	None
