Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/15159
Title: Chinese word segmentation with a maximum entropy approach
Authors: LOW JIN KIAT
Keywords: Multilingual processing, corpus-based modeling of language, machine learning, Chinese Word Segmentation, Maximum Entropy, Noise Elimination
Issue Date: 8-Mar-2006
Source: LOW JIN KIAT (2006-03-08). Chinese word segmentation with a maximum entropy approach. ScholarBank@NUS Repository.
Abstract: In this thesis, we present a maximum entropy approach to Chinese word segmentation. Besides using features derived from gold-standard word-segmented training data, we also used an external dictionary and additional training corpora of different segmentation standards to further improve segmentation accuracy. The selection of useful additional training data is modeled as example selection from noisy data. Using these techniques, our word segmenter achieved state-of-the-art accuracy. We participated in the Second International Chinese Word Segmentation Bakeoff organized by SIGHAN, and evaluated our word segmenter on all four test corpora in the open track. Among 52 entries in the open track, our word segmenter achieved the highest F-measure on 3 of the 4 test corpora, and the second highest F-measure on the fourth test corpus.
URI: http://scholarbank.nus.edu.sg/handle/10635/15159
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: msc.pdf
Size: 302.26 kB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.