Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/48677
Title: Multimodal Music Information Retrieval: From Content Analysis to Multimodal Fusion
Authors: LI ZHONGHUA
Keywords: music information retrieval, content analysis, domain-specific, audio quality assessment, multimodal fusion
Issue Date: 6-Aug-2013
Source: LI ZHONGHUA (2013-08-06). Multimodal Music Information Retrieval: From Content Analysis to Multimodal Fusion. ScholarBank@NUS Repository.
Abstract: With the explosive growth of online music data over the past decade, music information retrieval has become increasingly important in helping users find their desired music. Under different application scenarios, users need to search for music in various ways with different information needs. Moreover, music is inherently multi-faceted and comprises heterogeneous types of data (e.g., metadata, audio content). For effective multimodal music retrieval, therefore, it is essential to discover users' information needs and to combine the multiple facets appropriately. Most existing music search engines are intended for general search using textual metadata or example tracks. They therefore fail to address the needs of many specific domains, where the required music dimensions or query methods may differ from those covered by general search engines. Content analysis of these music dimensions (e.g., ethnic styles, audio quality) is also not well addressed. In addition, existing fusion methods for multiple music dimensions and modalities tend to associate fusion weights with queries alone and thus cannot achieve the optimal fusion strategy. My research studies and improves multimodal music retrieval systems in several respects. First, I study multimodal music retrieval in a specific domain where queries are restricted to certain music dimensions (e.g., tempo). Novel query input methods are proposed to capture users' information needs, and effective audio content analysis is performed to improve unimodal retrieval performance. Audio quality, an important but overlooked music dimension for online music search, is also studied. Given that multiple music dimensions in different modalities may be relevant to a given query, effective methods for fusing different modalities are investigated. For the first time, document dependence is introduced into fusion weight derivation, and its efficacy is verified. A general multimodal fusion framework, query-document-dependent fusion, is then proposed; it extends existing work by deriving the optimal fusion strategy for each query-document pair. This enables each document to combine its modalities in the optimal way and fully exploits the power of the different modalities during retrieval. Besides using existing datasets, several new datasets are constructed for the related research. Comprehensive experiments and user studies validate the efficacy of both the proposed approaches and systems.
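
The query-document-dependent fusion idea in the abstract can be pictured with a short sketch. This is a minimal illustration, not the thesis's actual method: the function name qddf_score, the softmax weighting over per-pair confidences, and all numbers are assumptions introduced here; the only grounded idea is that the fusion weights depend on the query-document pair rather than on the query alone.

    # Hypothetical sketch of query-document-dependent fusion (QDDF):
    # each query-document pair gets its own fusion weights, so a document
    # whose audio modality is unreliable (e.g., low audio quality) can
    # down-weight it for that pair instead of using one weight per query.
    import math

    def qddf_score(modality_scores, modality_confidences, temperature=1.0):
        """Fuse per-modality relevance scores for one query-document pair.

        modality_scores: dict of modality name -> relevance score,
                         e.g. {"metadata": 0.7, "audio": 0.4}.
        modality_confidences: dict with the same keys, giving how reliable
                              each modality is for THIS query-document pair.
        Returns a single fused relevance score.
        """
        # Softmax over confidences -> weights that sum to 1 for this pair.
        exps = {m: math.exp(c / temperature)
                for m, c in modality_confidences.items()}
        z = sum(exps.values())
        weights = {m: e / z for m, e in exps.items()}
        # Weighted combination of the unimodal scores.
        return sum(weights[m] * modality_scores[m] for m in modality_scores)

    # Example: for this pair the audio content is unreliable, so its
    # contribution is down-weighted relative to the metadata score.
    print(qddf_score({"metadata": 0.7, "audio": 0.4},
                     {"metadata": 2.0, "audio": 0.5}))
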
URI: http://scholarbank.nus.edu.sg/handle/10635/48677
Appears in Collections: Ph.D. Theses (Open)

Files in This Item:
File: LiZhonghua.pdf
Size: 2.59 MB
Format: Adobe PDF
Access Settings: OPEN

