Authors: ZHOU YI
Keywords: cross-lingual voice conversion, speech synthesis, PPG, multi-task learning
Issue Date: 8-Oct-2021
Citation: ZHOU YI (2021-10-08). CROSS-LINGUAL VOICE CONVERSION. ScholarBank@NUS Repository.
Abstract: This thesis focuses on cross-lingual voice conversion (XVC), which aims to modify the speaker identity of source speech towards that of a target speaker while preserving the source linguistic content. First, we study the use of bilingual information in XVC, including a bilingual linguistic representation and an average modeling approach in which the model is trained on a multi-speaker database in two languages. Second, we propose Multi-Task WaveRNN, an integrated architecture for XVC that maps the input linguistic information directly to the speech waveform; the network is developed in two steps with multi-task learning. Third, we propose a language-agnostic speaker embedding that uses an encoder-decoder architecture to disentangle language information from speaker embeddings via multi-task learning. The proposed speaker embedding is verified on both XVC and text-to-speech synthesis tasks. Finally, we investigate the problem of non-native accent in current XVC techniques and propose incorporating an additional linguistic consistency loss into the XVC network so that the converted speech sounds native.
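As a rough illustration of how an auxiliary linguistic consistency term can be added to a conversion objective, the sketch below combines a frame-level spectral reconstruction loss with a cross-entropy between the source PPGs and PPGs re-extracted from the converted speech. The abstract does not give the thesis's exact formulation, so the mean squared error, the cross-entropy form, the weight `lambda_ling`, the "frozen recognizer" that produces `converted_ppg`, and all function names here are hypothetical choices for illustration only.

```python
import math
from typing import Sequence

# One vector per frame: spectral features, or phonetic posteriors (PPGs).
Frames = Sequence[Sequence[float]]


def reconstruction_loss(pred: Frames, target: Frames) -> float:
    """Mean squared error over all frames and dimensions (assumed spectral loss)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


def linguistic_consistency_loss(
    source_ppg: Frames, converted_ppg: Frames, eps: float = 1e-8
) -> float:
    """Frame-averaged cross-entropy between source PPGs and converted-speech PPGs.

    source_ppg: posteriors of the source utterance, used as soft targets.
    converted_ppg: posteriors re-extracted from the converted speech by a
    (hypothetical) frozen speech recognizer.
    """
    if len(source_ppg) != len(converted_ppg):
        raise ValueError("PPG sequences must have the same number of frames")
    total = 0.0
    for p_frame, q_frame in zip(source_ppg, converted_ppg):
        # eps guards against log(0) when a converted posterior is exactly zero.
        total -= sum(p * math.log(q + eps) for p, q in zip(p_frame, q_frame))
    return total / len(source_ppg)


def total_loss(
    mel_pred: Frames,
    mel_target: Frames,
    source_ppg: Frames,
    converted_ppg: Frames,
    lambda_ling: float = 1.0,
) -> float:
    """Reconstruction term plus a weighted linguistic consistency term."""
    return reconstruction_loss(mel_pred, mel_target) + lambda_ling * (
        linguistic_consistency_loss(source_ppg, converted_ppg)
    )


if __name__ == "__main__":
    src = [[0.9, 0.1], [0.2, 0.8]]      # toy 2-class PPG, 2 frames
    drifted = [[0.6, 0.4], [0.5, 0.5]]  # converted speech drifts phonetically
    print(total_loss([[1.0, 2.0]], [[1.0, 2.5]], src, src, lambda_ling=0.5))
    print(total_loss([[1.0, 2.0]], [[1.0, 2.5]], src, drifted, lambda_ling=0.5))
```

In this toy run the second call returns a larger value than the first, because the drifted posteriors move away from the source's. Note that the cross-entropy is minimized, not zeroed, when the two PPGs match (it then equals the entropy of the source PPG); a KL divergence would remove that constant offset without changing the gradient with respect to the converted posteriors.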
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File         Size       Format
ZHOUY.pdf    12.29 MB   Adobe PDF





Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.