Please use this identifier to cite or link to this item:
https://doi.org/10.1016/S0167-6393(03)00099-2
Title: | Speech emotion recognition using hidden Markov models |
Authors: | Nwe, T.L.; Foo, S.W.; De Silva, L.C. |
Keywords: | Emotional speech; Hidden Markov model; Human communication; Log frequency power coefficients; Recognition of emotion |
Issue Date: | 2003 |
Citation: | Nwe, T.L., Foo, S.W., De Silva, L.C. (2003). Speech emotion recognition using hidden Markov models. Speech Communication 41 (4): 603-623. ScholarBank@NUS Repository. https://doi.org/10.1016/S0167-6393(03)00099-2 |
Abstract: | In emotion classification of speech signals, the popular features employed are statistics of fundamental frequency, energy contour, duration of silence and voice quality. However, the performance of systems employing these features degrades substantially when more than two categories of emotion are to be classified. In this paper, a text independent method of emotion classification of speech is proposed. The proposed method makes use of short time log frequency power coefficients (LFPC) to represent the speech signals and a discrete hidden Markov model (HMM) as the classifier. The emotions are classified into six categories. The category labels used are, the archetypal emotions of Anger, Disgust, Fear, Joy, Sadness and Surprise. A database consisting of 60 emotional utterances, each from twelve speakers is constructed and used to train and test the proposed system. Performance of the LFPC feature parameters is compared with that of the linear prediction Cepstral coefficients (LPCC) and mel-frequency Cepstral coefficients (MFCC) feature parameters commonly used in speech recognition systems. Results show that the proposed system yields an average accuracy of 78% and the best accuracy of 96% in the classification of six emotions. This is beyond the 17% chances by a random hit for a sample set of 6 categories. Results also reveal that LFPC is a better choice as feature parameters for emotion classification than the traditional feature parameters. © 2003 Elsevier B.V. All rights reserved. |
Source Title: | Speech Communication |
URI: | http://scholarbank.nus.edu.sg/handle/10635/43043 |
ISSN: | 0167-6393 |
DOI: | 10.1016/S0167-6393(03)00099-2 |
Appears in Collections: | Staff Publications |