Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/247650
Title: SELF-SUPERVISED MODELING FOR MULTI-MODAL UNDERSTANDING
Authors: YUE XIANGHU
ORCID iD: orcid.org/0000-0003-3527-6034
Keywords: Self-supervised learning; multimodal; unsupervised learning; pre-training
Issue Date: 29-Sep-2023
Citation: YUE XIANGHU (2023-09-29). SELF-SUPERVISED MODELING FOR MULTI-MODAL UNDERSTANDING. ScholarBank@NUS Repository.
Abstract: Humans perceive information from the surrounding environment through multiple mediums and use it to understand and interact with the world. These multimodal cues offer distinct but complementary information. Self-supervised learning has emerged as a promising approach for learning meaningful representations from individual modalities, including text, speech, and vision. In this thesis, we aim to leverage self-supervised pre-training techniques for multimodal processing, and we pursue this goal step by step through four works. Starting from a traditional unimodal understanding task, speech recognition, the first work focuses on remedying the code-switching problem. Because learning purely from labeled examples does not resemble language acquisition in humans, the second work focuses on learning speech representations from unlabeled speech data. The third work takes the universality of self-supervised pre-training one step further by unifying speech and text pre-training within a single model. Finally, the fourth work builds a unified audio-visual-text model that enables various multimodal understanding tasks.
URI: https://scholarbank.nus.edu.sg/handle/10635/247650
Appears in Collections:Ph.D Theses (Open)

Files in This Item:
File: YueXianghu.pdf | Size: 3.33 MB | Format: Adobe PDF | Access: Open
