Please use this identifier to cite or link to this item: https://doi.org/10.1145/3343031.3351090
Title: Multi-level fusion based class-aware attention model for weakly labeled audio tagging
Authors: Yin, Y; Shrivastava, H; Chiou, MJ; Shah, RR; Liu, Z; Zimmermann, R
Issue Date: 15-Oct-2019
Publisher: ACM
Citation: Yin, Y; Shrivastava, H; Chiou, MJ; Shah, RR; Liu, Z; Zimmermann, R (2019-10-15). Multi-level fusion based class-aware attention model for weakly labeled audio tagging. MM '19: The 27th ACM International Conference on Multimedia: 1304-1312. ScholarBank@NUS Repository. https://doi.org/10.1145/3343031.3351090
Abstract: Recognizing ongoing events from acoustic cues is a critical research problem for a variety of AI applications. Compared to visual inputs, acoustic cues tend to be less descriptive and less consistent in the time domain. A sound event can be quite short, which makes audio tagging difficult, especially when only weak (clip-level) labels are available. To address these challenges, we present a novel end-to-end multi-level attention model that first makes segment-level predictions with temporal modeling, followed by advanced aggregation along both the time and feature domains. Our model adopts class-aware attention-based temporal fusion to highlight segments relevant to each class and suppress irrelevant ones. Moreover, to improve the representation ability of acoustic inputs, a new multi-level feature fusion method is proposed to obtain more accurate segment-level predictions and to perform more effective multi-layer aggregation of clip-level predictions. We additionally introduce a weight-sharing strategy to reduce model complexity and overfitting. Comprehensive experiments were conducted on the AudioSet and DCASE17 datasets. The results show that our proposed method works remarkably well and obtains state-of-the-art audio tagging results on both datasets. Furthermore, we show that our multi-level fusion based model can be easily integrated with existing systems, where it yields additional performance gains.
Source Title: MM '19: The 27th ACM International Conference on Multimedia
URI: https://scholarbank.nus.edu.sg/handle/10635/200726
ISBN: 9781450368896
DOI: 10.1145/3343031.3351090
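As a reading aid for the class-aware attention-based temporal fusion described in the abstract, the following is a minimal PyTorch sketch of attention pooling that weights each segment per class before aggregating segment-level probabilities into a clip-level prediction. The module name, layer sizes, and overall structure are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ClassAwareAttentionPooling(nn.Module):
    """Hypothetical sketch: aggregate segment-level class probabilities into a
    clip-level prediction using per-class attention weights over time."""

    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        # Segment-level classifier: per-segment, per-class probabilities.
        self.classifier = nn.Linear(feature_dim, num_classes)
        # Class-aware attention: one attention score per (segment, class) pair.
        self.attention = nn.Linear(feature_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feature_dim) segment embeddings.
        seg_prob = torch.sigmoid(self.classifier(x))   # (B, T, C) segment predictions
        att = torch.softmax(self.attention(x), dim=1)   # (B, T, C), normalized over time
        # Clip-level prediction: attention-weighted average over segments,
        # so segments relevant to a class contribute more to that class's score.
        clip_prob = (att * seg_prob).sum(dim=1)         # (B, C)
        return clip_prob


if __name__ == "__main__":
    # 527 classes corresponds to AudioSet; batch of 4 clips with 10 segments each.
    model = ClassAwareAttentionPooling(feature_dim=128, num_classes=527)
    clips = torch.randn(4, 10, 128)
    print(model(clips).shape)  # torch.Size([4, 527])
```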
Appears in Collections: Staff Publications Elements
Files in This Item:
| File | Description | Size | Format | Access Settings | Version |
|---|---|---|---|---|---|
| main.pdf | | 4.07 MB | Adobe PDF | CLOSED | None |