Title: Multi-criteria-based active learning for named entity recognition
Authors: SHEN DAN
Keywords: active learning, named entity recognition, multiple criteria, informativeness, representativeness, diversity
Issue Date: 5-May-2004
Citation: SHEN DAN (2004-05-05). Multi-criteria-based active learning for named entity recognition. ScholarBank@NUS Repository.
Abstract: In this thesis, we propose a multi-criteria-based active learning approach and apply it to the named entity recognition task. Active learning aims to minimize the human annotation effort needed to learn a model that matches the performance of supervised learning, by selecting the most useful examples for labeling. To maximize the contribution of the selected examples, we consider multiple criteria, namely informativeness, representativeness and diversity, and propose measures to quantify each of them in SVM-based named entity recognition. We then incorporate all the criteria using two active learning strategies, both of which result in lower labeling cost than the single-criterion-based method. The best results show that the labeling cost can be reduced by 95% in the newswire domain and 86% in the biomedical domain without degrading the performance of the named entity recognizer. To the best of our knowledge, this is not only the first work to incorporate multiple criteria in active learning but also the first work to study active learning for named entity recognition. Furthermore, since the above measures and active learning strategies are quite general, they can also be easily adapted to other natural language processing tasks.
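The idea of combining the three criteria can be illustrated with a minimal Python sketch. It is not the method from the thesis: the scoring formulas, the function name select_batch, and the parameters lam and max_sim are assumptions made for illustration only. Here informativeness is taken to be high when an example lies close to the SVM decision boundary, representativeness is the average cosine similarity to the other candidates, and diversity is enforced by skipping candidates that are too similar to ones already chosen.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_batch(margins, vectors, batch_size, lam=0.6, max_sim=0.9):
    """Pick up to batch_size candidate indices for labeling.

    margins: signed distances to the decision boundary (assumed to come
             from an already trained classifier).
    vectors: feature vectors of the unlabeled candidates.
    lam:     hypothetical weight between informativeness and representativeness.
    max_sim: hypothetical similarity ceiling used as the diversity check.
    """
    n = len(vectors)
    # Informativeness (assumed form): closer to the boundary scores higher.
    info = [1.0 / (1.0 + abs(m)) for m in margins]
    # Representativeness (assumed form): mean similarity to other candidates.
    density = [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i)
        / max(n - 1, 1)
        for i in range(n)
    ]
    score = [lam * info[i] + (1 - lam) * density[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: score[i], reverse=True)

    chosen = []
    for i in ranked:
        if len(chosen) == batch_size:
            break
        # Diversity: skip a candidate too similar to any already chosen one.
        if all(cosine(vectors[i], vectors[j]) <= max_sim for j in chosen):
            chosen.append(i)
    return chosen
```

For example, with two identical candidates on the boundary and one dissimilar candidate farther away, select_batch([0.0, 0.0, 1.0], [[1, 0], [1, 0], [0, 1]], 2) keeps only one of the duplicates and fills the second slot with the dissimilar candidate.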
Appears in Collections: Master's Theses (Open)

Files in This Item:
File | Size | Format
SHEND.pdf | 872.85 kB | Adobe PDF





Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.