Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/248157
Title: TRANSFORMER TECHNIQUES FOR HUMAN ACTION RECOGNITION AND LOCALIZATION
Authors: CHANG SHUNING
ORCID iD: orcid.org/0000-0001-5752-0128
Keywords: Transformer, Deep Learning, Action Recognition, Action Localization, Video Understanding
Issue Date: 15-Aug-2023
Citation: CHANG SHUNING (2023-08-15). TRANSFORMER TECHNIQUES FOR HUMAN ACTION RECOGNITION AND LOCALIZATION. ScholarBank@NUS Repository.
Abstract: The Transformer has gained considerable attention for its ability to capture long-range dependencies through the use of attention. Its success in language modeling has motivated researchers to explore its potential for computer vision, where it has demonstrated promising results in tasks such as image classification and joint vision-language modeling. Notably, the Transformer holds great promise for video tasks: its all-to-all relationship modeling can help capture motion cues, long-range temporal interactions, and dynamic appearance changes in video data. This thesis focuses on applying the Transformer to human action recognition and localization in video. The research begins by developing an efficient vision transformer backbone for action classification, the fundamental task of action understanding. The thesis then moves beyond action recognition to improving temporal action localization. Finally, the research culminates in adopting the Transformer for efficient one-stage spatio-temporal action localization in video.
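
Note: The abstract's central point is that attention models all-to-all relationships among tokens, which is what lets a Transformer relate distant video frames. As a rough illustration only (not the architecture proposed in this thesis), the following minimal NumPy sketch shows scaled dot-product self-attention over a toy token sequence; all names, shapes, and values are illustrative assumptions.

import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a token sequence.

    x: (T, d) array of token embeddings (e.g., flattened video patches).
    w_q, w_k, w_v: (d, d_k) projection matrices (toy, single head).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])           # all-to-all affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over tokens
    return weights @ v                                # attended features

# Toy usage: 8 tokens with 16-dim embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
w_q, w_k, w_v = (rng.standard_normal((16, 16)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # shape (8, 16)

Because the score matrix is T x T, every token attends to every other token, which is the property the abstract credits for capturing long-range temporal interactions (and also the quadratic cost that motivates efficient backbone designs).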
URI: https://scholarbank.nus.edu.sg/handle/10635/248157
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: ChangSN.pdf
Size: 16.47 MB
Format: Adobe PDF
Access Settings: OPEN

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.