DC Field: Value
dc.title: Chinese word segmentation with a maximum entropy approach
dc.contributor.author: LOW JIN KIAT
dc.identifier.citation: LOW JIN KIAT (2006-03-08). Chinese word segmentation with a maximum entropy approach. ScholarBank@NUS Repository.
dc.description.abstract: In this thesis, we present a maximum entropy approach to Chinese word segmentation. Besides using features derived from gold-standard word-segmented training data, we also used an external dictionary and additional training corpora of different segmentation standards to further improve segmentation accuracy. The selection of useful additional training data is modeled as example selection from noisy data. Using these techniques, our word segmenter achieved state-of-the-art accuracy. We participated in the Second International Chinese Word Segmentation Bakeoff organized by SIGHAN, and evaluated our word segmenter on all four test corpora in the open track. Among 52 entries in the open track, our word segmenter achieved the highest F-measure on 3 of the 4 test corpora, and the second highest F-measure on the fourth test corpus.
dc.subject: Multi lingual processing, corpus based modeling of language, machine learning, Chinese Word Segmentation, Maximum Entropy, Noise Elimination
dc.contributor.department: COMPUTER SCIENCE
dc.contributor.supervisor: NG HWEE TOU
dc.description.degreeconferred: MASTER OF SCIENCE
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: msc.pdf
Size: 302.26 kB
Format: Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.