Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/16039
Title: Relation extraction for information extraction from free text
Authors: MASLENNIKOV MSTISLAV
Keywords: information extraction text anchor relation template
Issue Date: 24-Jul-2008
Source: MASLENNIKOV MSTISLAV (2008-07-24). Relation extraction for information extraction from free text. ScholarBank@NUS Repository.
Abstract: Information Extraction (IE) is the task of identifying information (e.g. entities, relations or events) in free text. Numerous context-, ontology-, rule- and classification-based methods have been explored over decades of research on this task. However, the open question of how to handle the flexibility of natural language effectively remains unresolved. In IE, this flexibility leads to sparseness of data instances, which in turn causes the problems of paraphrasing and misalignment of the context features of the extracted information. In this thesis, we hypothesize that these problems can be alleviated by combining relations between entities at the phrasal, dependency, semantic and inter-clausal discourse levels. To validate our hypothesis, we develop ARE (Anchors and Relations), a 2-level multi-resolution framework. The first level of ARE extracts candidate phrases (anchors), while the second level evaluates the relations among the anchors and composes candidate templates. The relations between the anchors are combined in several ways. First, we evaluate dependency relations between anchors. We classify the dependency relation paths between anchors into Simple, Average and Hard categories according to path length, and develop different techniques to handle each category. These category-specific strategies yield improvements of 3% and 4% on the MUC4 (Terrorism) and MUC6 (Management Succession) domains, respectively. The increased performance demonstrates that dependency relations are important for handling paraphrases at the syntactic level. Second, we incorporate discourse relation analysis into the multi-resolution framework to handle long-distance dependency relations and possible paraphrasing at the intra-clausal level. This leads to further improvements of 3%, 7%, 3% and 4% on the MUC4, MUC6 and ACE RDC 2003 (general and specific types) domains, respectively.
Third, we explore two supplementary strategies for combining relation paths between anchors. Since negative paths between anchors outnumber positive paths many times over, we apply a filtering strategy to eliminate negative paths. We also support the learning process of our dependency relation classifier by cascading features from the discourse classifier. These two strategies further improve IE performance on the MUC4, MUC6 and ACE RDC 2003 (general and specific types) corpora. Overall, our results affirm the hypothesis that extracting candidate phrases (anchors) and combining different relation types between anchors in a multi-resolution framework are important for tackling the key problems of paraphrasing and misalignment in Information Extraction.
URI: http://scholarbank.nus.edu.sg/handle/10635/16039
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: PhdThesis_MstislavMaslennikov_22July2008.pdf
Size: 999.32 kB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.