Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/246269
Title: DEEP REPRESENTATION LEARNING FOR VIDEO FOUNDATION MODELS
Authors: HUANG ZIYUAN
ORCID iD: orcid.org/0000-0002-4544-0427
Keywords: Foundation Models, Representation Learning, Video Understanding, Transformers, Contrastive Learning, Object Tracking
Issue Date: 3-Aug-2023
Citation: HUANG ZIYUAN (2023-08-03). DEEP REPRESENTATION LEARNING FOR VIDEO FOUNDATION MODELS. ScholarBank@NUS Repository.
Abstract: In this thesis, we focus on video foundation models. Specifically, we investigate approaches for learning deep representations from videos, one of the most important topics for video foundation models. We identify three challenges that potentially impede the advancement of foundation models for video understanding. (i) Current model structures for processing videos are inefficient at extracting video features. (ii) The frameworks for learning video representations from unannotated data are mostly inherited from images; they fail to leverage the motion between frames and are suboptimal for learning representations from untrimmed videos. (iii) The adaptation of pre-trained video models is limited to spatiotemporal understanding tasks, while many spatial understanding tasks could benefit from incorporating the temporal context between consecutive frames.
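As context for challenge (ii), here is a minimal, illustrative sketch (not taken from the thesis) of the image-style contrastive objective (InfoNCE) that such video frameworks typically inherit: two augmented clips of the same video form a positive pair, and motion between frames plays no explicit role in the loss. The function name, shapes, and values below are assumptions for illustration only.

import torch
import torch.nn.functional as F

# Illustrative only: image-style InfoNCE loss transferred to video clips.
# z1, z2 are (batch, dim) embeddings of two augmented clips drawn from the
# same set of videos; matching rows are positives, all other rows negatives.
def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    z1 = F.normalize(z1, dim=1)                 # unit-normalize embeddings
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Usage sketch with random stand-in embeddings (a real pipeline would produce
# z1 and z2 by applying a video encoder to two clips of each video):
loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))

Because this objective treats a clip much like a static image crop, it gives the encoder no explicit incentive to model inter-frame motion, which is one way to read the gap the abstract points to.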
URI: https://scholarbank.nus.edu.sg/handle/10635/246269
Appears in Collections:Ph.D Theses (Open)

Files in This Item:
File: Thesis-final-signed.pdf
Size: 15.81 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.