Please use this identifier to cite or link to this item:
https://doi.org/10.1017/ATSIP.2020.9
Title: | An analysis of speaker dependent models in replay detection |
Authors: | Suthokumar, G.; Sriskandaraja, K.; Sethu, V.; Ambikairajah, E.; Li, H. |
Keywords: | Replay Attack; Speaker Adapted Neural Networks; Speaker Dependent Models; Speaker Verification; Spoofing Detection |
Issue Date: | 2020 |
Publisher: | Cambridge University Press |
Citation: | Suthokumar, G., Sriskandaraja, K., Sethu, V., Ambikairajah, E., Li, H. (2020). An analysis of speaker dependent models in replay detection. APSIPA Transactions on Signal and Information Processing 9: e14. ScholarBank@NUS Repository. https://doi.org/10.1017/ATSIP.2020.9 |
Rights: | Attribution 4.0 International |
Abstract: | Most research on replay detection has focused on developing a stand-alone countermeasure that runs independently of a speaker verification system by training a single spoofed model and a single genuine model for all speakers. In this paper, we explore the potential benefits of adapting the back-end of a spoofing detection system towards the claimed target speaker. Specifically, we characterize and quantify speaker variability by comparing speaker-dependent and speaker-independent (SI) models of feature distributions for both genuine and spoofed speech. Following this, we develop an approach for implementing speaker-dependent spoofing detection using a Gaussian mixture model (GMM) back-end, where both the genuine and spoofed models are adapted to the claimed speaker. Finally, we also develop and evaluate a speaker-specific neural network-based spoofing detection system in addition to the GMM based back-end. Evaluations of the proposed approaches on replay corpora BTAS2016 and ASVspoof2017 v2.0 reveal that the proposed speaker-dependent spoofing detection outperforms equivalent SI replay detection baselines on both datasets. Our experimental results show that the use of speaker-specific genuine models leads to a significant improvement (around 4% in terms of equal error rate (EER)) as previously shown and the addition of speaker-specific spoofed models adds a small improvement on top (less than 1% in terms of EER). © 2020 The Author(s). Published by Cambridge University Press in association with Asia Pacific Signal and Information Processing Association. |
Source Title: | APSIPA Transactions on Signal and Information Processing |
URI: | https://scholarbank.nus.edu.sg/handle/10635/196802 |
ISSN: | 2048-7703 |
DOI: | 10.1017/ATSIP.2020.9 |
Appears in Collections: | Staff Publications; Elements |
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
10_1017_ATSIP_2020_9.pdf | | 2.86 MB | Adobe PDF | OPEN | None | View/Download |
This item is licensed under a Creative Commons License