Title: Multimodal Music Information Retrieval: From Content Analysis to Multimodal Fusion
Keywords: music information retrieval, content analysis, domain-specific, audio quality assessment, multimodal fusion
Issue Date: 6-Aug-2013
Citation: LI ZHONGHUA (2013-08-06). Multimodal Music Information Retrieval: From Content Analysis to Multimodal Fusion. ScholarBank@NUS Repository.
Abstract: With the explosive growth of online music data over the past decade, music information retrieval has become increasingly important in helping users find their desired music information. Under different application scenarios, users generally need to search for music in various ways with different information needs. Moreover, music is inherently multi-faceted and comprises heterogeneous types of data (e.g., metadata, audio content). For effective multimodal music retrieval, it is therefore essential to discover users' information needs and to appropriately combine these multiple facets. Most existing music search engines are intended for general search using textual metadata or example tracks. Thus, they fail to address the needs of many specific domains, where the required music dimensions or query methods may differ from those covered by general search engines. Content analysis of these music dimensions (e.g., ethnic styles, audio quality) is also not well addressed. In addition, existing fusion methods for multiple music dimensions and modalities tend to associate the fusion weights with queries alone and therefore cannot achieve the optimal fusion strategy. My research studies and improves multimodal music retrieval systems in several respects. First, I study multimodal music retrieval in a specific domain where queries are restricted to certain music dimensions (e.g., tempo). Novel query input methods are proposed to capture users' information needs. Effective audio content analysis is then performed to improve unimodal music retrieval performance. Audio quality, an important but overlooked music dimension for online music search, is also studied. Given that multiple music dimensions in different modalities are relevant to a given query, effective fusion methods for combining different modalities are also investigated. For the first time, document dependence is introduced into fusion weight derivation, and its efficacy is verified.
A general multimodal fusion framework, query-document-dependent fusion, is then proposed to extend existing work by deriving the optimal fusion strategy for each query-document pair. This enables each document to combine its modalities in the optimal way and unleashes the power of the different modalities in the retrieval process. In addition to using existing datasets, several new datasets are constructed for the related research. Comprehensive experiments and user studies validate the efficacy of both the proposed approaches and the resulting systems.
Appears in Collections:Ph.D Theses (Open)

Files in This Item:
LiZhonghua.pdf (2.59 MB, Adobe PDF)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.