Keywords: multimedia question answering, product annotations, search
Issue Date: 12-Aug-2011
Abstract: In recent years, we have witnessed the prevalence of community-based Question Answering (cQA) systems that provide precise answers to a wide variety of questions. However, answers from most QA systems, such as Yahoo! Answers (YA), are in the form of text. For some questions, visual answers such as images and videos would be more direct and intuitive. The aim of this thesis is to extend text-based QA to multimedia QA that answers a range of factoid and "how-to" questions. The proposed systems find additional multimedia answers from Web-based media resources such as YouTube, Google and Amazon to supplement the text answers. The thesis presents a novel solution to "how-to" QA by leveraging community-contributed text and video answers on the Web. In our video QA framework, given a text-based question, we first perform similar-question search on YA to increase the semantic coverage of the original question. Second, we extract key phrases from these questions as queries to search for video answer candidates. At the same time, the category of the question in YA is used to find related visual concepts based on off-line domain-specific word mining. Third, we combine text analysis, visual analysis, opinion analysis and video redundancy to find the most relevant video answers among the community video candidates. Experiments conducted with questions from the Yahoo! Answers archive demonstrate the feasibility and effectiveness of our approach. For the visual recognition component of the video QA framework, we also propose a new scheme for product annotation in videos. To cater for the huge variety and frequent introduction of new products, we introduce a simple yet effective method to harvest a large number of product visual examples from the Web. In addition, we introduce a novel correlative sparsification method to generate sparse visual signatures of products.
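The three-stage video QA pipeline above can be sketched as follows. This is a minimal illustrative toy, not the thesis's implementation: the function names, the naive stopword-based phrase extraction, and the fixed fusion weights are all assumptions standing in for the similar-question search, key-phrase extraction, and multi-signal ranking the abstract describes.

```python
# Hypothetical sketch of the three-stage video QA ranking. All names,
# weights, and heuristics here are illustrative assumptions.

def expand_question(question, similar_questions):
    """Stage 1: broaden semantic coverage with similar YA questions."""
    return [question] + list(similar_questions)

def extract_key_phrases(questions):
    """Stage 2: naive key-phrase extraction (content words) as video queries."""
    stopwords = {"how", "do", "i", "a", "an", "the", "to", "can", "you"}
    phrases = set()
    for q in questions:
        phrases.update(w for w in q.lower().rstrip("?").split()
                       if w not in stopwords)
    return sorted(phrases)

def rank_candidates(candidates, weights=(0.4, 0.3, 0.2, 0.1)):
    """Stage 3: fuse text, visual-concept, opinion, and redundancy scores."""
    wt, wv, wo, wr = weights
    scored = [(wt * c["text"] + wv * c["visual"]
               + wo * c["opinion"] + wr * c["redundancy"], c["id"])
              for c in candidates]
    return [vid for _, vid in sorted(scored, reverse=True)]
```

For example, `extract_key_phrases(["How do I replace a bike chain?"])` yields `["bike", "chain", "replace"]`, and `rank_candidates` orders candidate videos by the weighted sum of the four relevance signals.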
This method reduces the noise in the visual signatures, so that better annotation performance can be achieved. We also introduce a method that simultaneously leverages Amazon and the Google image search engine, which represent a specific knowledge resource and a general Web information collection, respectively. The whole process is automated and requires no manual human effort. These visual signatures are used to annotate video frames. A series of experiments conducted on more than 1,000 Web videos demonstrates the feasibility and effectiveness of our approach. Moreover, the proposed approach improves the performance of the system compared to the original visual recognition component. We also propose a relevant and diverse image search approach, which aims to return a small set of images that covers all aspects of a product without redundancy. This approach can be regarded as one possible solution to image-based factoid QA for products. A conditional clustering approach is applied, treating the Amazon examples as an informative prior. In this way, a set of exemplars can be found from the Google search results; they are then provided, together with the Amazon example images, as a set of relevant and diverse results for product search. This work can enrich the example images on Amazon with Google search results and also refine Google image search results by exploiting the example images on Amazon. Experiments conducted on a set of products demonstrate the feasibility and effectiveness of our approach. Many interesting future research directions can be explored to support more precise and user-friendly multimedia question answering. Future work includes more integrated multimedia search engines, which can return text, images, and videos as answers for a better QA experience, and content-based online video advertisement, which presents advertisements based on visual relevance.
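The idea of conditioning Google exemplar selection on Amazon example images can be illustrated with a toy greedy rule in a simple feature space: keep a Google result if it is close to some Amazon prior (relevance) but not too close to anything already selected (diversity). This is an assumption-laden stand-in for the thesis's conditional clustering, not the actual algorithm; the thresholds and the Euclidean feature space are invented for illustration.

```python
# Toy sketch: relevant-and-diverse exemplar selection conditioned on
# Amazon example images used as a prior. Thresholds and features are
# illustrative assumptions, not the thesis's method.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_exemplars(amazon, google, relevance_radius=2.0, diversity_gap=0.5):
    """Keep Google results near some Amazon prior (relevance) but not
    too close to anything already selected (diversity)."""
    selected = list(amazon)  # the prior examples always stay in the result set
    for g in google:
        relevant = any(euclidean(g, a) <= relevance_radius for a in amazon)
        diverse = all(euclidean(g, s) > diversity_gap for s in selected)
        if relevant and diverse:
            selected.append(g)
    return selected
```

With `amazon = [(0, 0)]` and `google = [(0.1, 0), (1.0, 0), (5, 5)]`, the near-duplicate `(0.1, 0)` is dropped for redundancy and the off-topic `(5, 5)` for irrelevance, leaving the prior plus `(1.0, 0)`.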
Appears in Collections:Ph.D Theses (Open)

Files in This Item:
LiGD.pdf (3.13 MB, Adobe PDF)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.