Title: Chinese word segmentation with a maximum entropy approach
Keywords: Multilingual processing, corpus-based modeling of language, machine learning, Chinese word segmentation, maximum entropy, noise elimination
Issue Date: 8-Mar-2006
Citation: LOW JIN KIAT (2006-03-08). Chinese word segmentation with a maximum entropy approach. ScholarBank@NUS Repository.
Abstract: In this thesis, we present a maximum entropy approach to Chinese word segmentation. Besides using features derived from gold-standard word-segmented training data, we also used an external dictionary and additional training corpora of different segmentation standards to further improve segmentation accuracy. The selection of useful additional training data is modeled as example selection from noisy data. Using these techniques, our word segmenter achieved state-of-the-art accuracy. We participated in the Second International Chinese Word Segmentation Bakeoff organized by SIGHAN, and evaluated our word segmenter on all four test corpora in the open track. Among 52 entries in the open track, our word segmenter achieved the highest F-measure on 3 of the 4 test corpora, and the second highest F-measure on the fourth test corpus.
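To illustrate the character-based tagging setup typically used by maximum entropy word segmenters of this kind, the following is a minimal sketch (not the thesis's actual implementation): each character receives a position-in-word tag (B = begin, M = middle, E = end, S = single-character word), and contextual features are extracted from a small surrounding window. The feature templates and the `<PAD>` symbol here are illustrative assumptions.

```python
def char_features(sentence, i):
    """Illustrative features for the character at index i (window of +/- 2).
    Feature names (C0, C-1C0, ...) are assumed templates, not the thesis's."""
    feats = {"C0=" + sentence[i]}
    for offset in (-2, -1, 1, 2):
        j = i + offset
        c = sentence[j] if 0 <= j < len(sentence) else "<PAD>"
        feats.add(f"C{offset}={c}")
    # Character-bigram features spanning the current position.
    if i > 0:
        feats.add("C-1C0=" + sentence[i - 1] + sentence[i])
    if i + 1 < len(sentence):
        feats.add("C0C1=" + sentence[i] + sentence[i + 1])
    return feats


def bmes_tags(words):
    """Convert a gold word-segmented sentence into per-character BMES tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags
```

A maximum entropy (multinomial logistic regression) classifier trained on such (features, tag) pairs can then assign BMES tags to unseen text, from which word boundaries are recovered.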
Appears in Collections:Master's Theses (Open)

Files in This Item:
msc.pdf (302.26 kB, Adobe PDF)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.