Please use this identifier to cite or link to this item:
https://doi.org/10.1145/3581783.3612272
DC Field | Value
---|---
dc.title | Elucidate Gender Fairness in Singing Voice Transcription
dc.contributor.author | Gu, X
dc.contributor.author | Zeng, W
dc.contributor.author | Wang, Y
dc.date.accessioned | 2024-01-03T09:31:59Z
dc.date.available | 2024-01-03T09:31:59Z
dc.date.issued | 2023-10-26
dc.identifier.citation | Gu, X, Zeng, W, Wang, Y (2023-10-26). Elucidate Gender Fairness in Singing Voice Transcription. MM '23: The 31st ACM International Conference on Multimedia : 8760-8769. ScholarBank@NUS Repository. https://doi.org/10.1145/3581783.3612272
dc.identifier.isbn | 9798400701085
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/246634
dc.description.abstract | It is widely known that males and females typically possess different sound characteristics when singing, such as timbre and pitch, but it has never been explored whether these gender-based characteristics lead to a performance disparity in singing voice transcription (SVT), whose target includes pitch. Such a disparity could cause fairness issues and severely affect the user experience of downstream SVT applications. Motivated by this, we first demonstrate the female superiority of SVT systems, which is observed across different models and datasets. We find that different pitch distributions, rather than gender data imbalance, contribute to this disparity. To address this issue, we propose using an attribute predictor to predict gender labels and adversarially training the SVT system to enforce the gender-invariance of acoustic representations. Leveraging the prior knowledge that pitch distributions may contribute to the gender bias, we propose conditionally aligning acoustic representations between demographic groups by feeding note events to the attribute predictor. Empirical experiments on multiple benchmark SVT datasets show that our method significantly reduces gender bias (up to more than 50%) with negligible degradation of overall SVT performance, on both in-domain and out-of-domain singing data, thus offering a better fairness-utility trade-off.
dc.publisher | ACM
dc.source | Elements
dc.type | Conference Paper
dc.date.updated | 2024-01-03T09:24:54Z
dc.contributor.department | ELECTRICAL AND COMPUTER ENGINEERING
dc.description.doi | 10.1145/3581783.3612272
dc.description.sourcetitle | MM '23: The 31st ACM International Conference on Multimedia
dc.description.page | 8760-8769
dc.published.state | Published
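The abstract describes adversarially training the SVT system against an attribute (gender) predictor so that acoustic representations become gender-invariant. The following is only a toy sketch of that gradient-reversal-style objective on a scalar linear model; all data, names, and hyperparameter values here are hypothetical, and the paper's actual system (which also conditions the attribute predictor on note events) is not reproduced.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Toy scalar "encoder" w_e, task head w_t, and adversarial
# attribute predictor w_a; all initial values are hypothetical.
w_e, w_t, w_a = 0.5, 0.5, 0.5
lam = 0.1   # weight of the adversarial term (hypothetical)
lr = 0.01   # learning rate (hypothetical)

def step(x, y, g):
    """One update: the task head and the adversary each minimize
    their own loss; the encoder minimizes the task loss while
    *maximizing* the adversary's loss (gradient reversal)."""
    global w_e, w_t, w_a
    z = w_e * x                      # stand-in for the acoustic representation
    y_hat = w_t * z                  # transcription-like regression target
    task_loss = (y_hat - y) ** 2
    a = sigmoid(w_a * z)             # adversary's predicted P(gender = 1)
    adv_loss = -(g * math.log(a + 1e-9) + (1 - g) * math.log(1 - a + 1e-9))
    # Gradients with respect to the shared representation z.
    d_task_dz = 2 * (y_hat - y) * w_t
    d_adv_dz = (a - g) * w_a
    # Heads update normally on their own losses.
    w_t -= lr * 2 * (y_hat - y) * z
    w_a -= lr * (a - g) * z
    # Encoder: descend the task gradient, ascend the adversary's.
    w_e -= lr * (d_task_dz - lam * d_adv_dz) * x
    return task_loss, adv_loss

# Synthetic data: pitch-like feature x, target y = 2x,
# gender label correlated with sign(x).
data = [(x / 10, 2 * x / 10, 1 if x > 0 else 0)
        for x in range(-10, 11) if x != 0]

initial = sum(step(x, y, g)[0] for x, y, g in data) / len(data)
for _ in range(200):
    for x, y, g in data:
        step(x, y, g)
final = sum((w_t * w_e * x - y) ** 2 for x, y, _ in data) / len(data)
```

After training, the task loss still falls well below its initial value even though the encoder is simultaneously pushed to hide the gender label from the adversary, mirroring the fairness-utility trade-off the abstract reports.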
Appears in Collections: Staff Publications Elements
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
2023_ACM_MM2023_Fairness_Singing_camera_ready.pdf | Accepted version | 1.43 MB | Adobe PDF | CLOSED | None