Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/35226
Title: Coreference resolution: maximum metric score training, domain adaptation, and zero pronoun resolution
Authors: ZHAO SHANHENG
Keywords: coreference resolution, zero pronoun resolution, domain adaptation
Issue Date: 28-Jul-2011
Citation: ZHAO SHANHENG (2011-07-28). Coreference resolution: maximum metric score training, domain adaptation, and zero pronoun resolution. ScholarBank@NUS Repository.
Abstract: Coreference resolution is one of the central tasks in natural language processing. Successful coreference resolution benefits many other natural language processing and information extraction tasks. This thesis explores three important research issues in coreference resolution. A large body of prior research on coreference resolution recasts the problem as a two-class classification problem. However, standard supervised machine learning algorithms that minimize classification errors on the training instances do not always maximize the F-measure of the chosen evaluation metric for coreference resolution. We propose a novel approach that uses instance weighting and beam search to maximize the evaluation metric score on the training corpus during training. Experimental results show that this approach achieves a significant improvement over the state of the art. We report results on standard benchmark corpora (two MUC corpora and three ACE corpora), evaluated using the link-based MUC metric and the mention-based B-CUBED metric. Most prior work on coreference resolution has focused on the newswire domain. Although a coreference resolution system trained on the newswire domain performs well on the same domain, there is a huge performance drop when it is applied to the biomedical domain. Annotating coreferential relations in a new domain is very time-consuming. This raises the question of how we can adapt a coreference resolution system trained on a resource-rich domain to a new domain with minimal data annotation. We present an approach integrating domain adaptation with active learning to adapt coreference resolution from the newswire domain to the biomedical domain, and explore the effect of domain adaptation, active learning, and target domain instance weighting for coreference resolution.
Experimental results show that domain adaptation with active learning and the weighting scheme achieves performance on MEDLINE abstracts similar to a system trained on full coreference annotation, but with far fewer training instances that need to be annotated. Lastly, we present a machine learning approach to the identification and resolution of Chinese anaphoric zero pronouns. We perform both identification and resolution automatically, with two sets of easily computable features. Experimental results show that our proposed learning approach achieves anaphoric zero pronoun resolution accuracy comparable to a previous state-of-the-art, heuristic rule-based approach. To our knowledge, our work is the first to perform both identification and resolution of Chinese anaphoric zero pronouns using a machine learning approach.
URI: http://scholarbank.nus.edu.sg/handle/10635/35226
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: ZhaoSH.pdf, Size: 697.28 kB, Format: Adobe PDF, Access Settings: OPEN, Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.