Please use this identifier to cite or link to this item: https://doi.org/10.1017/ATSIP.2020.9
DC Field: Value
dc.title: An analysis of speaker dependent models in replay detection
dc.contributor.author: Suthokumar, G.
dc.contributor.author: Sriskandaraja, K.
dc.contributor.author: Sethu, V.
dc.contributor.author: Ambikairajah, E.
dc.contributor.author: Li, H.
dc.date.accessioned: 2021-08-13T02:54:11Z
dc.date.available: 2021-08-13T02:54:11Z
dc.date.issued: 2020
dc.identifier.citation: Suthokumar, G., Sriskandaraja, K., Sethu, V., Ambikairajah, E., Li, H. (2020). An analysis of speaker dependent models in replay detection. APSIPA Transactions on Signal and Information Processing 9 : e14. ScholarBank@NUS Repository. https://doi.org/10.1017/ATSIP.2020.9
dc.identifier.issn: 20487703
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/196802
dc.description.abstract: Most research on replay detection has focused on developing a stand-alone countermeasure that runs independently of a speaker verification system by training a single spoofed model and a single genuine model for all speakers. In this paper, we explore the potential benefits of adapting the back-end of a spoofing detection system towards the claimed target speaker. Specifically, we characterize and quantify speaker variability by comparing speaker-dependent and speaker-independent (SI) models of feature distributions for both genuine and spoofed speech. Following this, we develop an approach for implementing speaker-dependent spoofing detection using a Gaussian mixture model (GMM) back-end, where both the genuine and spoofed models are adapted to the claimed speaker. Finally, we also develop and evaluate a speaker-specific neural network-based spoofing detection system in addition to the GMM based back-end. Evaluations of the proposed approaches on replay corpora BTAS2016 and ASVspoof2017 v2.0 reveal that the proposed speaker-dependent spoofing detection outperforms equivalent SI replay detection baselines on both datasets. Our experimental results show that the use of speaker-specific genuine models leads to a significant improvement (around 4% in terms of equal error rate (EER)) as previously shown and the addition of speaker-specific spoofed models adds a small improvement on top (less than 1% in terms of EER). © 2020 The Author(s). Published by Cambridge University Press in association with Asia Pacific Signal and Information Processing Association.
dc.publisher: Cambridge University Press
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.source: Scopus OA2020
dc.subject: Replay Attack
dc.subject: Speaker Adapted Neural Networks
dc.subject: Speaker Dependent Models
dc.subject: Speaker Verification
dc.subject: Spoofing Detection
dc.type: Article
dc.contributor.department: ELECTRICAL AND COMPUTER ENGINEERING
dc.description.doi: 10.1017/ATSIP.2020.9
dc.description.sourcetitle: APSIPA Transactions on Signal and Information Processing
dc.description.volume: 9
dc.description.page: e14
Appears in Collections: Staff Publications
Files in This Item:
File: 10_1017_ATSIP_2020_9.pdf
Size: 2.86 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

This item is licensed under a Creative Commons License.