Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/249508
Title: | VISUAL CAUSAL INFERENCE |
Authors: | YICONG LI |
ORCID iD: | orcid.org/0000-0002-5659-793X |
Keywords: | Video-Language Model, Multimodal Understanding |
Issue Date: | 2-Jan-2024 |
Citation: | YICONG LI (2024-01-02). VISUAL CAUSAL INFERENCE. ScholarBank@NUS Repository. |
Abstract: | After a decade of prosperity, the development of video understanding has reached a critical juncture, where the sole reliance on massive data and complex architectures is no longer a one-size-fits-all solution to all situations. The presence of ubiquitous data imbalance hampers DNNs from effectively learning the underlying causal mechanisms, leading to significant performance drops when encountering distribution shifts, such as long-tail imbalances and perturbed imbalances. This realization has prompted researchers to seek alternative methodologies to capture causal patterns in video data. To tackle these challenges and increase the robustness of DNNs, causal modeling emerged as a principle to discover the true causal patterns behind the observed correlations. This thesis focuses on the domain of semantic video understanding and explores the potential of causal modeling to advance some fundamental video understanding tasks, such as Video Relation Detection and Video Question Answering. |
URI: | https://scholarbank.nus.edu.sg/handle/10635/249508 |
Appears in Collections: | Ph.D Theses (Open) |
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
Thesis_final_submission.pdf | | 36.44 MB | Adobe PDF | OPEN | None | View/Download |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.