Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/247289
DC Field: Value
dc.title: CROSS-MODALITY COMPLEMENTARITY FOR AUDIO-VISUAL SPEECH RECOGNITION
dc.contributor.author: WANG JIADONG
dc.date.accessioned: 2024-02-29T18:00:43Z
dc.date.available: 2024-02-29T18:00:43Z
dc.date.issued: 2023-07-04
dc.identifier.citation: WANG JIADONG (2023-07-04). CROSS-MODALITY COMPLEMENTARITY FOR AUDIO-VISUAL SPEECH RECOGNITION. ScholarBank@NUS Repository.
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/247289
dc.description.abstract: Speech recognition is an indispensable tool for human-robot interaction. Inspired by human speech perception, integrating the audio and visual modalities makes transcription more robust. However, corruption may occur in either modality or in both, degrading speech recognition. Therefore, given the unique properties of these two modalities, utilizing audio-visual complementarity to mitigate corruption is essential. To achieve this goal, I address three types of corruption through audio-visual complementarity. Mimicking human speech perception, the first part employs the visual modality to complement speech corrupted by acoustic noise. The second part tackles the issue of speakers missing from the camera's field of view through a novel sound source localization method. Finally, the third part aims to reconstruct occluded lips with the assistance of the audio modality.
dc.language.iso: en
dc.subject: Multi-modality, speech recognition, modality corruption, audio-visual fusion, lip generation
dc.type: Thesis
dc.contributor.department: ELECTRICAL & COMPUTER ENGINEERING
dc.contributor.supervisor: Robby Tantowi Tan
dc.contributor.supervisor: Haizhou Li
dc.description.degree: Ph.D
dc.description.degreeconferred: DOCTOR OF PHILOSOPHY (CDE-ENG)
dc.identifier.orcid: 0000-0001-9372-3133
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: Jiadong_Thesis.pdf
Size: 10.92 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.