Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/245525
Title: AUDIO-VISUAL ACTIVE SPEAKER DETECTION AND RECOGNITION
Authors: TAO RUIJIE
ORCID iD: orcid.org/0000-0003-0021-5661
Keywords: Audio-visual, speaker recognition, active speaker detection, self-supervised learning, cross-modality, noisy label
Issue Date: 24-Mar-2023
Citation: TAO RUIJIE (2023-03-24). AUDIO-VISUAL ACTIVE SPEAKER DETECTION AND RECOGNITION. ScholarBank@NUS Repository.
Abstract: Audio-visual speech processing aims to solve speech-related problems using both audio and visual information. Research in biology has shown that humans perceive the world through multiple modalities, with the speech and face modalities providing complementary information. In this thesis, we focus on audio-visual speaker signal processing and make the following contributions: 1) We exploit long-term temporal information and handle in-the-wild videos to detect the talking person. 2) We filter noisy speaker recognition data in a large-scale audio-visual dataset and achieve cross-modal speaker recognition. 3) We automatically remove unreliable speech data during self-supervised speaker recognition, then search for and utilise diverse positive pairs for audio-visual self-supervised speaker recognition. According to our research, speaker information and characteristics can be significant cues to assist multi-modal signal processing and self-supervised learning.
URI: https://scholarbank.nus.edu.sg/handle/10635/245525
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: TaoRJ.pdf
Size: 9.95 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.