Please use this identifier to cite or link to this item: https://doi.org/10.21437/Interspeech.2017-303
Title: Multi-stage DNN training for automatic recognition of dysarthric speech
Authors: Emre Yilmaz
Mario Ganzeboom
Catia Cucchiarini
Helmer Strik
Keywords: Automatic speech recognition, Deep neural networks, Dysarthria, Pathological speech
Issue Date: 1-Aug-2017
Publisher: International Speech Communication Association
Citation: Yilmaz, E., Ganzeboom, M., Cucchiarini, C., & Strik, H. (2017). Multi-stage DNN training for automatic recognition of dysarthric speech. Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), pp. 2685-2689. ScholarBank@NUS Repository. https://doi.org/10.21437/Interspeech.2017-303
Abstract: Incorporating automatic speech recognition (ASR) in individualized speech training applications is becoming more viable thanks to the improved generalization capabilities of neural network-based acoustic models. The main problem in developing applications for dysarthric speech is the relative scarcity of in-domain data. Collecting representative amounts of dysarthric speech data is difficult due to rigorous ethical and medical permission requirements, difficulties in accessing patients, who are generally vulnerable and often subject to changing health conditions, and, last but not least, the high variability in speech resulting from different pathological conditions. Developing such applications is even more challenging for languages that in general have fewer resources, fewer speakers and, consequently, also fewer patients than English, as in the case of a mid-sized language like Dutch. In this paper, we investigate a multi-stage deep neural network (DNN) training scheme aimed at obtaining better modeling of dysarthric speech using only a small amount of in-domain training data. The results show that a system employing the proposed training scheme considerably improves the recognition of Dutch dysarthric speech compared to baseline systems trained in a single stage on either a large amount of normal speech or a small amount of in-domain data.
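The abstract only sketches the multi-stage training idea at a high level. Below is a minimal, illustrative sketch of a generic two-stage scheme of this kind (pretrain the DNN acoustic model on plentiful normal speech, then continue training on the small dysarthric in-domain set), written in PyTorch with random tensors standing in for acoustic features and senone labels. The network layout, hyperparameters, and the `make_dataset`/`train` helpers are assumptions for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for frame-level acoustic data; in practice these would be
# MFCC/filterbank frames paired with senone targets from forced alignment.
def make_dataset(n_frames, feat_dim=40, n_senones=500, seed=0):
    g = torch.Generator().manual_seed(seed)
    feats = torch.randn(n_frames, feat_dim, generator=g)
    labels = torch.randint(0, n_senones, (n_frames,), generator=g)
    return TensorDataset(feats, labels)

# Simple feed-forward DNN acoustic model (hybrid DNN-HMM style).
model = nn.Sequential(
    nn.Linear(40, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 500),  # senone posteriors
)
criterion = nn.CrossEntropyLoss()

def train(model, dataset, epochs, lr):
    loader = DataLoader(dataset, batch_size=256, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for feats, labels in loader:
            opt.zero_grad()
            loss = criterion(model(feats), labels)
            loss.backward()
            opt.step()

# Stage 1: train on a large amount of "normal" (non-pathological) speech.
train(model, make_dataset(50_000, seed=1), epochs=2, lr=0.01)

# Stage 2: continue training on the small in-domain dysarthric set,
# typically with a lower learning rate so the model adapts to dysarthric
# speech without forgetting what it learned from normal speech.
train(model, make_dataset(5_000, seed=2), epochs=2, lr=0.001)
```

The key design choice such a scheme exploits is that the large out-of-domain corpus provides robust low-level acoustic representations, while the second stage adapts the model to the atypical phonetic realizations of dysarthric speakers with only a small amount of data.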
Source Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
URI: http://scholarbank.nus.edu.sg/handle/10635/145522
ISSN: 2308-457X
DOI: 10.21437/Interspeech.2017-303
Appears in Collections: Staff Publications

Files in This Item:
File: Interspeech2017_2.pdf
Description: Preprint version
Size: 96.46 kB
Format: Adobe PDF
Access: Open (pre-print, view/download)
