Context dependent acoustic keyword spotting using deep neural network | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.1109/APSIPA.2013.6694175

DC Field	Value
dc.title	Context dependent acoustic keyword spotting using deep neural network
dc.contributor.author	Wang, G.
dc.contributor.author	Sim, K.C.
dc.date.accessioned	2014-07-04T03:12:03Z
dc.date.available	2014-07-04T03:12:03Z
dc.date.issued	2013
dc.identifier.citation	Wang, G., Sim, K.C. (2013). Context dependent acoustic keyword spotting using deep neural network. 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013 : -. ScholarBank@NUS Repository. https://doi.org/10.1109/APSIPA.2013.6694175
dc.identifier.isbn	9789869000604
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/78071
dc.description.abstract	A language model is an essential component of a speech recogniser. It provides additional linguistic information to constrain the search space and guide the decoding. In this paper, a language model is incorporated into the keyword spotting system to provide contexts for the keyword models under the weighted finite state transducer framework. A context independent deep neural network is trained as the acoustic model. Three keyword contexts are investigated: the phone-to-keyword context, the fixed length word context and the arbitrary length word context. To provide these contexts, a hybrid language model with both word and phone tokens is trained using only the word n-gram counts. Three different spotting graphs are studied, depending on the contexts involved: the keyword loop graph, the word fillers graph and the word loop fillers graph. These graphs are referred to as the context dependent (CD) keyword spotting graphs. The CD keyword spotting systems are evaluated on the Broadcasting News Hub4-97 F0 evaluation set. Experimental results reveal that, for all three CD graphs, incorporating language model information provides a performance gain over the baseline context independent graph, which uses no contexts. The best system, using the arbitrary length word context, performs comparably to full decoding while tripling the spotting speed. In addition, error analysis demonstrates that the language model information is essential for reducing both insertion and deletion errors. © 2013 APSIPA.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1109/APSIPA.2013.6694175
dc.source	Scopus
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1109/APSIPA.2013.6694175
dc.description.sourcetitle	2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013
dc.description.page	-
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.