Please use this identifier to cite or link to this item: https://doi.org/10.1109/ASRU.2013.6707753
Title: Context-dependent modelling of deep neural network using logistic regression
Authors: Wang, G.; Sim, K.C.
Keywords: Articulatory Features; Canonical State Modelling; Context-Dependent Modelling; Deep Neural Network; Logistic Regression
Issue Date: 2013
Source: Wang, G., Sim, K.C. (2013). Context-dependent modelling of deep neural network using logistic regression. 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings: 338-343. ScholarBank@NUS Repository. https://doi.org/10.1109/ASRU.2013.6707753
Abstract: The data sparsity problem of context-dependent acoustic modelling in automatic speech recognition is addressed by using the decision tree state clusters as the training targets in the standard context-dependent (CD) deep neural network (DNN) systems. As a result, the CD states within a cluster cannot be distinguished during decoding. This problem, referred to as the clustering problem, is not explicitly addressed in the current literature. In this paper, we formulate the CD DNN as an instance of the canonical state modelling technique based on a set of broad phone classes to address both the data sparsity and the clustering problems. The triphone is clustered into multiple sets of shorter biphones using broad phone contexts to address the data sparsity issue. A DNN is trained to discriminate the biphones within each set. The canonical states are represented by the concatenated log posteriors of all the broad phone DNNs. Logistic regression is used to transform the canonical states into the triphone state output probability. Clustering of the regression parameters is used to reduce model complexity while still achieving unique acoustic scores for all possible triphones. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD DNN significantly outperforms the standard CD DNN. The best system provides a 2.7% absolute WER reduction compared to the best standard CD DNN system. © 2013 IEEE.
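The pipeline outlined in the abstract (per-set DNN log posteriors, concatenation into a canonical state, then logistic regression to triphone state probabilities) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: it assumes a multinomial (softmax) form of logistic regression, and the function names, the two toy "DNN" score vectors, and the weight values are all hypothetical.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def canonical_state(dnn_outputs):
    """Concatenate the log posteriors of every broad phone DNN into one vector."""
    vec = []
    for scores in dnn_outputs:
        vec.extend(log_softmax(scores))
    return vec

def triphone_state_probs(canonical, weights, biases):
    """Assumed multinomial logistic regression: softmax(W x + b)."""
    logits = [sum(w * x for w, x in zip(row, canonical)) + b
              for row, b in zip(weights, biases)]
    return [math.exp(v) for v in log_softmax(logits)]

# Hypothetical raw outputs of two broad phone DNNs (3 and 2 biphone targets).
dnn_outputs = [[2.0, 0.5, -1.0], [0.3, 0.1]]
x = canonical_state(dnn_outputs)

# Hypothetical regression parameters for 4 triphone states (5 inputs each).
weights = [
    [1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0, 1.0],
]
biases = [0.0, 0.0, 0.0, 0.0]
probs = triphone_state_probs(x, weights, biases)
```

In this sketch every triphone state has its own weight row, so each state receives a distinct score. The parameter clustering mentioned in the abstract is not modelled here.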
Source Title: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings
URI: http://scholarbank.nus.edu.sg/handle/10635/78072
ISBN: 9781479927562
DOI: 10.1109/ASRU.2013.6707753
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.