An analysis of speaker dependent models in replay detection | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.1017/ATSIP.2020.9

Title:	An analysis of speaker dependent models in replay detection
Authors:	Suthokumar, G.; Sriskandaraja, K.; Sethu, V.; Ambikairajah, E.; Li, H.
Keywords:	Replay Attack; Speaker Adapted Neural Networks; Speaker Dependent Models; Speaker Verification; Spoofing Detection
Issue Date:	2020
Publisher:	Cambridge University Press
Citation:	Suthokumar, G., Sriskandaraja, K., Sethu, V., Ambikairajah, E., Li, H. (2020). An analysis of speaker dependent models in replay detection. APSIPA Transactions on Signal and Information Processing 9 : e14. ScholarBank@NUS Repository. https://doi.org/10.1017/ATSIP.2020.9
Rights:	Attribution 4.0 International
Abstract:	Most research on replay detection has focused on developing a stand-alone countermeasure that runs independently of a speaker verification system by training a single spoofed model and a single genuine model for all speakers. In this paper, we explore the potential benefits of adapting the back-end of a spoofing detection system towards the claimed target speaker. Specifically, we characterize and quantify speaker variability by comparing speaker-dependent and speaker-independent (SI) models of feature distributions for both genuine and spoofed speech. Following this, we develop an approach for implementing speaker-dependent spoofing detection using a Gaussian mixture model (GMM) back-end, where both the genuine and spoofed models are adapted to the claimed speaker. Finally, we develop and evaluate a speaker-specific neural network-based spoofing detection system in addition to the GMM-based back-end. Evaluations of the proposed approaches on the replay corpora BTAS2016 and ASVspoof2017 v2.0 reveal that the proposed speaker-dependent spoofing detection outperforms equivalent SI replay detection baselines on both datasets. Our experimental results show that the use of speaker-specific genuine models leads to a significant improvement (around 4% in terms of equal error rate (EER)), as previously shown, and that the addition of speaker-specific spoofed models adds a small improvement on top (less than 1% in terms of EER). © 2020 The Author(s). Published by Cambridge University Press in association with Asia Pacific Signal and Information Processing Association.
Source Title:	APSIPA Transactions on Signal and Information Processing
URI:	https://scholarbank.nus.edu.sg/handle/10635/196802
ISSN:	2048-7703
DOI:	10.1017/ATSIP.2020.9
Appears in Collections:	Staff Publications Elements

Files in This Item:

File	Description	Size	Format	Access Settings	Version
10_1017_ATSIP_2020_9.pdf		2.86 MB	Adobe PDF	OPEN	None

This item is licensed under a Creative Commons License