Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/17105
Title: Applying semantic analysis to finding similar questions in community question answering systems
Authors: NGUYEN LE NGUYEN
Keywords: Semantic Analysis, Question Answering, Question Matching
Issue Date: 4-Jan-2010
Citation: NGUYEN LE NGUYEN (2010-01-04). Applying semantic analysis to finding similar questions in community question answering systems. ScholarBank@NUS Repository.
Abstract: Research in Question Answering (QA) has been carried out since the 1960s. Early QA systems were essentially expert systems that found factoid answers in fixed document collections. More recently, with the emergence of the World Wide Web, automatically finding answers to users' questions by exploiting the large-scale knowledge available on the Internet has become a reality. Instead of searching a fixed document collection, a QA system can look for answers in web resources or in community forums where a similar question may have been asked before. However, building QA systems based on community forums (cQA) poses many challenges. These include: (a) how to recognize the main question being asked, and in particular how to measure the semantic similarity between questions, and (b) how to handle the grammatical errors common in forum language. Because people write more casually in forums, many forum sentences contain grammatical errors, and two questions can be semantically similar without sharing any common words. Extracting semantic information is therefore useful for finding similar questions in cQA systems.
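The lexical gap described above is easy to see with a plain bag-of-words comparison. The sketch below is illustrative only: it assumes term-frequency vectors compared by cosine similarity, and the tokenizer and example questions are hypothetical. The exact weighting used by the thesis's BOW baseline is not stated here.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Lower-case the text and count alphanumeric tokens (hypothetical tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    q1 = "How can I lower my blood pressure?"
    q2 = "Ways to reduce hypertension naturally"  # related meaning, no shared words
    q3 = "How can I lower my cholesterol?"        # different condition, many shared words
    print(cosine(bow_vector(q1), bow_vector(q2)))  # 0.0
    print(cosine(bow_vector(q1), bow_vector(q3)))  # about 0.77
```

A purely lexical score ranks the cholesterol question far above the hypertension paraphrase, which is the kind of error that semantic annotation is meant to reduce.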

In this thesis, we develop a semantic role labeling (SRL) system that leverages grammatical relations extracted from a syntactic parser and combines them with a machine learning method to annotate the semantic information in questions. We then use similarity scores computed by semantic matching to select similar questions. We carry out experiments on data sets collected from the Healthcare domain of Yahoo! Answers over a 10-month period from 15 February 2008 to 20 December 2008. The results show that with our semantic annotation approach, named GReSeA, our system outperforms the baseline Bag-of-Words (BOW) system by 2.63% in MAP (Mean Average Precision) and by 12.68% in Precision at top 1 retrieval results. Compared with using the popular SRL system ASSERT on the same task of finding similar questions in Yahoo! Answers, our system using GReSeA outperforms the ASSERT-based system by 4.3% in MAP and by 4.26% in Precision at top 1. Additionally, a system combining BOW and GReSeA improves Precision at top 1 by 2.13 percentage points (91.30% vs. 89.17%) over the state-of-the-art Syntactic Tree Matching system for finding similar questions in cQA.
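MAP and Precision at top 1 (P@1) are standard ranked-retrieval metrics. The following is a minimal sketch of how they are usually computed, assuming each query question has a ranked list of candidate question IDs and a set of IDs judged relevant. The function names and toy data are hypothetical, and the thesis's exact evaluation protocol (for example, how queries without any relevant candidate are treated) is not described here, so the choices below are assumptions.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of P@rank at each rank holding a relevant item, divided by |relevant|."""
    if not relevant:
        return 0.0  # assumption: a query with no relevant items scores 0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over all judged queries; a query absent from runs gets AP = 0."""
    if not qrels:
        raise ValueError("no judged queries")
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


def mean_precision_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 1) -> float:
    """P@k averaged over all judged queries (k=1 gives P@1)."""
    if not qrels:
        raise ValueError("no judged queries")
    scores = [precision_at_k(runs.get(q, []), rel, k) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: two query questions with ranked candidates.
    runs = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
    qrels = {"q1": {"a", "c"}, "q2": {"e"}}
    print(mean_average_precision(runs, qrels))  # (5/6 + 1/2) / 2 = 2/3
    print(mean_precision_at_k(runs, qrels, 1))  # (1 + 0) / 2 = 0.5
```

P@1 only checks whether the single top-ranked candidate is relevant, whereas MAP also rewards placing every relevant candidate near the top, which is why the two metrics can move by different amounts.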
URI: http://scholarbank.nus.edu.sg/handle/10635/17105
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: NguyenLN.pdf | Size: 1.6 MB | Format: Adobe PDF | Access Settings: OPEN | Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.