Exploiting untranscribed broadcast data for improved code-switching detection

Please use this identifier to cite or link to this item: https://doi.org/10.21437/Interspeech.2017-391

DC Field	Value
dc.title	Exploiting untranscribed broadcast data for improved code-switching detection
dc.contributor.author	Yilmaz E.
dc.contributor.author	Henk van den Heuvel
dc.contributor.author	David van Leeuwen
dc.date.accessioned	2018-08-02T04:58:55Z
dc.date.available	2018-08-02T04:58:55Z
dc.date.issued	2017-01-01
dc.identifier.citation	Yilmaz E., Henk van den Heuvel, David van Leeuwen (2017-01-01). Exploiting untranscribed broadcast data for improved code-switching detection. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017-August : 42-46. ScholarBank@NUS Repository. https://doi.org/10.21437/Interspeech.2017-391
dc.identifier.issn	2308-457X
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/145521
dc.description.abstract	We have recently presented an automatic speech recognition (ASR) system operating on Frisian-Dutch code-switched speech. This type of speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we extend this work by using some raw broadcast data to improve multilingually trained deep neural networks (DNN) that have been trained on 11.5 hours of manually annotated bilingual speech. For this purpose, we apply the initial ASR to the untranscribed broadcast data and automatically create transcriptions based on the recognizer output using different language models for rescoring. Then, we train new acoustic models on the combined data, i.e., the manually and automatically transcribed bilingual broadcast data, and investigate the automatic transcription quality based on the recognition accuracies on a separate set of development and test data. Finally, we report code-switching detection performance elaborating on the correlation between the ASR and the code-switching detection performance.
dc.language.iso	en
dc.publisher	ISCA
dc.subject	Bilingual ASR, Code-switching, Frisian language, Under-resourced languages
dc.type	Conference Paper
dc.contributor.department	ELECTRICAL & COMPUTER ENGINEERING
dc.description.doi	10.21437/Interspeech.2017-391
dc.description.sourcetitle	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.description.volume	2017-August
dc.description.page	42-46
dc.published.state	Published
dc.grant.id	NWO Project 314-99-119 (Frisian Audio Mining Enterprise)
dc.grant.fundingagency	Nederlandse Organisatie voor Wetenschappelijk Onderzoek
Appears in Collections:	Elements Staff Publications

Files in This Item:

File	Description	Size	Format	Access Settings	Version
Interspeech2017_1.pdf		283.18 kB	Adobe PDF	OPEN	Post-print