Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/246269
Title: DEEP REPRESENTATION LEARNING FOR VIDEO FOUNDATION MODELS
Authors: HUANG ZIYUAN
ORCID iD: orcid.org/0000-0002-4544-0427
Keywords: Foundation Models, Representation Learning, Video Understanding, Transformers, Contrastive Learning, Object Tracking
Issue Date: 3-Aug-2023
Citation: HUANG ZIYUAN (2023-08-03). DEEP REPRESENTATION LEARNING FOR VIDEO FOUNDATION MODELS. ScholarBank@NUS Repository.
Abstract: In this thesis, we focus on video foundation models. Specifically, we investigate approaches for learning deep representations from videos, a central topic for video foundation models. Three challenges are identified that potentially impede the advancement of foundation models for video understanding. (i) Current model structures for processing videos are inefficient at extracting video features. (ii) Frameworks for learning video representations from unannotated data are mostly inherited from the image domain; they fail to leverage the motion between frames and are suboptimal for learning representations from untrimmed videos. (iii) The adaptation of pre-trained video models is limited to spatiotemporal understanding tasks, while many spatial understanding tasks could potentially benefit from incorporating the temporal context between consecutive frames.
URI: https://scholarbank.nus.edu.sg/handle/10635/246269
Appears in Collections: Ph.D Theses (Open)
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
Thesis-final-signed.pdf | | 15.81 MB | Adobe PDF | OPEN | None