Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/245525
Title: | AUDIO-VISUAL ACTIVE SPEAKER DETECTION AND RECOGNITION |
Authors: | TAO RUIJIE |
ORCID iD: | orcid.org/0000-0003-0021-5661 |
Keywords: | Audio-visual, speaker recognition, active speaker detection, self-supervised learning, cross-modality, noisy label |
Issue Date: | 24-Mar-2023 |
Citation: | TAO RUIJIE (2023-03-24). AUDIO-VISUAL ACTIVE SPEAKER DETECTION AND RECOGNITION. ScholarBank@NUS Repository. |
Abstract: | Audio-visual speech processing aims to solve speech-related problems using audio and visual information. Research in biology has shown that humans perceive the world through multiple modalities, since speech and face modalities provide complementary information. In this thesis, we focus on audio-visual speaker signal processing and make the following contributions: 1) We apply long-term temporal information and handle videos in the wild to detect the talking person. 2) We filter the noisy speaker recognition data in a large-scale audio-visual dataset and achieve cross-modal speaker recognition. 3) We automatically remove unreliable speech data during self-supervised speaker recognition, then search for and utilise diverse positive pairs for audio-visual self-supervised speaker recognition. According to our research, speaker information and characteristics can be significant cues to assist multi-modal signal processing and self-supervised learning. |
URI: | https://scholarbank.nus.edu.sg/handle/10635/245525 |
Appears in Collections: | Ph.D Theses (Open) |
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
TaoRJ.pdf | | 9.95 MB | Adobe PDF | OPEN | None | View/Download |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.