Please use this identifier to cite or link to this item: https://doi.org/10.1145/3343031.3351090
Title: Multi-level fusion based class-aware attention model for weakly labeled audio tagging
Authors: Yin, Y; Shrivastava, H; Chiou, MJ; Shah, RR; Liu, Z; Zimmermann, R
Issue Date: 15-Oct-2019
Publisher: ACM
Citation: Yin, Y; Shrivastava, H; Chiou, MJ; Shah, RR; Liu, Z; Zimmermann, R (2019-10-15). Multi-level fusion based class-aware attention model for weakly labeled audio tagging. MM '19: The 27th ACM International Conference on Multimedia: 1304-1312. ScholarBank@NUS Repository. https://doi.org/10.1145/3343031.3351090
Abstract: Recognizing ongoing events from acoustic cues is a critical research problem for a variety of AI applications. Compared to visual inputs, acoustic cues tend to be less descriptive and less consistent in the time domain. A sound event can be quite short, which makes audio tagging difficult, especially when only weak (clip-level) labels are available. To address these challenges, we present a novel end-to-end multi-level attention model that first makes segment-level predictions with temporal modeling, followed by advanced aggregation along both the time and feature domains. Our model adopts class-aware attention-based temporal fusion to highlight segments relevant to each class and suppress irrelevant ones. Moreover, to improve the representation ability of acoustic inputs, a new multi-level feature fusion method is proposed to obtain more accurate segment-level predictions and to perform more effective multi-layer aggregation of clip-level predictions. We additionally introduce a weight-sharing strategy to reduce model complexity and overfitting. Comprehensive experiments were conducted on the AudioSet and DCASE17 datasets. The results show that our proposed method works remarkably well and obtains state-of-the-art audio tagging results on both datasets. Furthermore, we show that our multi-level fusion based model can be easily integrated with existing systems, where it yields additional performance gains.
Source Title: MM '19: The 27th ACM International Conference on Multimedia
URI: https://scholarbank.nus.edu.sg/handle/10635/200726
ISBN: 9781450368896
DOI: 10.1145/3343031.3351090
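As a reading aid for the class-aware attention-based temporal fusion described in the abstract, the following is a minimal PyTorch sketch of attention pooling that weights each segment per class before aggregating segment-level probabilities into a clip-level prediction. The module name, layer sizes, and overall structure are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ClassAwareAttentionPooling(nn.Module):
    """Hypothetical sketch: aggregate segment-level class probabilities into a
    clip-level prediction using per-class attention weights over time."""

    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        # Segment-level classifier: per-segment, per-class probabilities.
        self.classifier = nn.Linear(feature_dim, num_classes)
        # Class-aware attention: one attention score per (segment, class) pair.
        self.attention = nn.Linear(feature_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feature_dim) segment embeddings.
        seg_prob = torch.sigmoid(self.classifier(x))   # (B, T, C) segment predictions
        att = torch.softmax(self.attention(x), dim=1)   # (B, T, C), normalized over time
        # Clip-level prediction: attention-weighted average over segments,
        # so segments relevant to a class contribute more to that class's score.
        clip_prob = (att * seg_prob).sum(dim=1)         # (B, C)
        return clip_prob


if __name__ == "__main__":
    # 527 classes corresponds to AudioSet; batch of 4 clips with 10 segments each.
    model = ClassAwareAttentionPooling(feature_dim=128, num_classes=527)
    clips = torch.randn(4, 10, 128)
    print(model(clips).shape)  # torch.Size([4, 527])
```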
Appears in Collections: Staff Publications Elements
Files in This Item:
| File | Description | Size | Format | Access Settings | Version |
|---|---|---|---|---|---|
| main.pdf | | 4.07 MB | Adobe PDF | CLOSED | None |