Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/246269
Title: DEEP REPRESENTATION LEARNING FOR VIDEO FOUNDATION MODELS
Authors: HUANG ZIYUAN
ORCID iD: orcid.org/0000-0002-4544-0427
Keywords: Foundation Models, Representation Learning, Video Understanding, Transformers, Contrastive Learning, Object Tracking
Issue Date: 3-Aug-2023
Citation: HUANG ZIYUAN (2023-08-03). DEEP REPRESENTATION LEARNING FOR VIDEO FOUNDATION MODELS. ScholarBank@NUS Repository.
Abstract: In this thesis, we focus on video foundation models. Specifically, we investigate approaches for learning deep representations from videos, a central topic for video foundation models. Three challenges are identified that potentially impede the advancement of foundation models for video understanding. (i) Current model structures for processing videos are inefficient at extracting video features. (ii) Frameworks for learning video representations from unannotated data are mostly inherited from the image domain; they fail to leverage the motion between frames and are suboptimal for learning representations from untrimmed videos. (iii) The adaptation of pre-trained video models is limited to spatiotemporal understanding tasks, while many spatial understanding tasks could potentially benefit from incorporating the temporal context between consecutive frames.
URI: https://scholarbank.nus.edu.sg/handle/10635/246269
Appears in Collections: Ph.D Theses (Open)
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
Thesis-final-signed.pdf | | 15.81 MB | Adobe PDF | OPEN | None