Title: Audio-visual modeling for bimodal speech recognition
Authors: Kaynak, M.N.
Zhi, Q.
Cheok, A.D. 
Sengupta, K. 
Chung, K.C. 
Keywords: Audio-visual speech recognition
Feature fusion
Hidden Markov models
Visual features
Issue Date: 2001
Citation: Kaynak, M.N., Zhi, Q., Cheok, A.D., Sengupta, K., Chung, K.C. (2001). Audio-visual modeling for bimodal speech recognition. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics 1: 181-186. ScholarBank@NUS Repository.
Abstract: Audio-visual speech recognition is a novel extension of acoustic speech recognition and has received a lot of attention in the last few decades. The main motivation behind bimodal speech recognition is the bimodal nature of human speech perception and production. In this paper, the effect of the modeling parameters of hidden Markov models (HMMs) on the recognition accuracy of the bimodal speech recognizer is analyzed, a comparative analysis of the different HMMs that can be used in bimodal speech recognition is presented, and finally a novel model, which has been experimentally verified to perform better than the others, is proposed. The geometric visual features are also compared and analyzed for their importance in bimodal speech recognition. One of the unique characteristics of our bimodal speech recognition system is its novel strategy for fusing the acoustic and visual features, which takes into account the different sampling rates of the two signals. Compared to acoustic-only recognition, the audio-visual speech recognition scheme has much higher recognition accuracy, especially in the presence of noise.
Source Title: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics
ISSN: 0884-3627
Appears in Collections: Staff Publications
