Title: Learning and fusing multimodal deep features for acoustic scene categorization
Authors: Yin, Y; Shah, RR; Zimmermann, R
Issue Date: 15-Oct-2018
Publisher: ACM
Citation: Yin, Y, Shah, RR, Zimmermann, R (2018-10-15). Learning and fusing multimodal deep features for acoustic scene categorization. MM '18: ACM Multimedia Conference: 1892-1900. ScholarBank@NUS Repository.
Abstract: Convolutional Neural Networks (CNNs) have recently been widely applied to audio classification, with promising results. Previous CNN-based systems mostly learn from two-dimensional time-frequency representations such as MFCC and spectrograms, which may tend to place more emphasis on the background noise of the scene. To learn the key acoustic events, we introduce a three-dimensional CNN that emphasizes the different spectral characteristics of neighboring regions in the spatial-temporal domain. This paper proposes a novel acoustic scene classification system based on multimodal deep feature fusion, in which three CNNs perform 1D raw waveform modeling, 2D time-frequency image modeling, and 3D spatial-temporal dynamics modeling, respectively. The learnt features are shown to be highly complementary to each other and are then combined in a feature fusion network to obtain significantly improved classification predictions. Comprehensive experiments were conducted on two large-scale acoustic scene datasets, namely the DCASE16 dataset and the LITIS Rouen dataset. The experimental results demonstrate the effectiveness of the proposed approach: our solution achieves state-of-the-art classification rates and improves the average classification accuracy by 1.5% to 8.2% compared with the top-ranked systems in the DCASE16 challenge.
Source Title: MM '18: ACM Multimedia Conference
ISBN: 9781450356657
DOI: 10.1145/3240508.3240631
Appears in Collections: Staff Publications
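
Illustrative note: the abstract describes combining features from three CNN branches (1D waveform, 2D time-frequency, 3D spatial-temporal) in a feature fusion network. This record does not specify how that network is built, so the following is only a minimal sketch of one common form of feature-level fusion: concatenating the branch features and applying a single linear layer with softmax. The function names, feature sizes, class count, and random weights are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(feat_1d, feat_2d, feat_3d, weights, biases):
    """Concatenate three branch feature vectors and apply one linear layer.

    This is an assumed, simplified stand-in for a fusion network, not the
    architecture used by the authors.
    """
    fused = list(feat_1d) + list(feat_2d) + list(feat_3d)
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("each weight row must match the fused feature size")
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical placeholder features standing in for the three CNN outputs.
rng = random.Random(0)
feat_1d = [rng.random() for _ in range(4)]  # from a raw-waveform branch
feat_2d = [rng.random() for _ in range(4)]  # from a time-frequency branch
feat_3d = [rng.random() for _ in range(4)]  # from a spatial-temporal branch

num_classes = 3  # arbitrary; real scene datasets have more classes
weights = [[rng.uniform(-1.0, 1.0) for _ in range(12)] for _ in range(num_classes)]
biases = [0.0] * num_classes

probs = fuse_and_classify(feat_1d, feat_2d, feat_3d, weights, biases)
predicted_class = probs.index(max(probs))
```

In a trained system the weights would be learned jointly with, or on top of, the branch networks; here they are random, so `predicted_class` carries no meaning beyond showing the data flow.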

Files in This Item:
File: main.pdf
Size: 1.48 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.