Please use this identifier to cite or link to this item: https://doi.org/10.1109/APSIPA.2013.6694175
DC Field | Value
dc.title | Context dependent acoustic keyword spotting using deep neural network
dc.contributor.author | Wang, G.
dc.contributor.author | Sim, K.C.
dc.date.accessioned | 2014-07-04T03:12:03Z
dc.date.available | 2014-07-04T03:12:03Z
dc.date.issued | 2013
dc.identifier.citation | Wang, G., Sim, K.C. (2013). Context dependent acoustic keyword spotting using deep neural network. 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013. ScholarBank@NUS Repository. https://doi.org/10.1109/APSIPA.2013.6694175
dc.identifier.isbn | 9789869000604
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/78071
dc.description.abstract | The language model is an essential component of a speech recogniser: it provides additional linguistic information that constrains the search space and guides decoding. In this paper, a language model is incorporated into the keyword spotting system, within the weighted finite state transducer framework, to provide contexts for the keyword models. A context-independent deep neural network is trained as the acoustic model. Three keyword contexts are investigated: the phone-to-keyword context, the fixed-length word context and the arbitrary-length word context. To provide these contexts, a hybrid language model containing both word and phone tokens is trained using only word n-gram counts. Depending on the contexts involved, three spotting graphs are studied: the keyword loop graph, the word fillers graph and the word loop fillers graph. These are referred to as context dependent (CD) keyword spotting graphs. The CD keyword spotting systems are evaluated on the Broadcast News Hub4-97 F0 evaluation set. Experimental results show that, for all three CD graphs, incorporating language model information improves performance over the baseline context independent graph, which uses no context. The best system, which uses the arbitrary-length word context, achieves performance comparable to full decoding while spotting three times as fast. In addition, error analysis shows that language model information is essential for reducing both insertion and deletion errors. © 2013 APSIPA.
dc.source | Scopus
dc.type | Conference Paper
dc.contributor.department | COMPUTER SCIENCE
dc.description.doi | 10.1109/APSIPA.2013.6694175
dc.description.sourcetitle | 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013
dc.description.page | -
dc.identifier.isiut | NOT_IN_WOS
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.