Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/15159
DC Field | Value
dc.title | Chinese word segmentation with a maximum entropy approach
dc.contributor.author | LOW JIN KIAT
dc.date.accessioned | 2010-04-08T10:50:39Z
dc.date.available | 2010-04-08T10:50:39Z
dc.date.issued | 2006-03-08
dc.identifier.citation | LOW JIN KIAT (2006-03-08). Chinese word segmentation with a maximum entropy approach. ScholarBank@NUS Repository.
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/15159
dc.description.abstract | In this thesis, we present a maximum entropy approach to Chinese word segmentation. Besides using features derived from gold-standard word-segmented training data, we also use an external dictionary and additional training corpora that follow different segmentation standards to further improve segmentation accuracy. The selection of useful additional training data is modeled as example selection from noisy data. Using these techniques, our word segmenter achieves state-of-the-art accuracy. We participated in the Second International Chinese Word Segmentation Bakeoff organized by SIGHAN and evaluated our word segmenter on all four test corpora in the open track. Among 52 entries in the open track, our word segmenter achieved the highest F-measure on 3 of the 4 test corpora and the second-highest F-measure on the remaining test corpus.
dc.language.iso | en
dc.subject | Multilingual processing, corpus-based modeling of language, machine learning, Chinese Word Segmentation, Maximum Entropy, Noise Elimination
dc.type | Thesis
dc.contributor.department | COMPUTER SCIENCE
dc.contributor.supervisor | NG HWEE TOU
dc.description.degree | Master's
dc.description.degreeconferred | MASTER OF SCIENCE
dc.identifier.isiut | NOT_IN_WOS
Appears in Collections: Master's Theses (Open)
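The abstract does not spell out how segmentation is posed to the maximum entropy model. A common framing, assumed here purely for illustration, is per-character tagging: each character gets one of four tags (B = begins a multi-character word, M = middle, E = end, S = single-character word), and a maximum entropy classifier predicts these tags from context features. The sketch below shows only that framing: converting gold-standard segmented text to tags, decoding tags back into words, and extracting character-window features. It does not train a classifier, and the tag set, the `<PAD>` token, and the feature templates in `char_features` are hypothetical choices, not the thesis's documented feature set.

```python
"""Character-tagging view of Chinese word segmentation (illustrative sketch).

Assumption: segmentation is cast as labelling each character with B/M/E/S;
a maximum entropy classifier (not shown) would predict these tags from
features such as those produced by char_features().
"""
from typing import Dict, List


def words_to_tags(words: List[str]) -> List[str]:
    """Convert a gold-standard segmented sentence into per-character tags."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            tags.append("B")  # first character of the word
            tags.extend(["M"] * (len(word) - 2))  # interior characters
            tags.append("E")  # last character of the word
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("S", "E"):  # a word is closed after an S or E tag
            words.append(current)
            current = ""
    if current:  # flush an unterminated word left by an ill-formed tag sequence
        words.append(current)
    return words


def char_features(chars: str, i: int) -> Dict[str, str]:
    """Hypothetical context features for the character at position i."""

    def c(k: int) -> str:
        # Character at relative offset k, or a padding token outside the sentence.
        j = i + k
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    feats = {f"C{k}": c(k) for k in range(-2, 3)}  # unigrams C-2 .. C2
    for k in range(-2, 2):  # adjacent bigrams C-2C-1 .. C1C2
        feats[f"C{k}C{k + 1}"] = c(k) + c(k + 1)
    feats["C-1C1"] = c(-1) + c(1)  # characters on either side of C0
    return feats


if __name__ == "__main__":
    words = ["我们", "在", "学习"]
    chars = "".join(words)
    tags = words_to_tags(words)
    print(tags)  # ['B', 'E', 'S', 'B', 'E']
    print(tags_to_words(chars, tags))  # ['我们', '在', '学习']
    print(char_features(chars, 0)["C0C1"])  # 我们
```

In this framing, each (features, tag) pair extracted from the training corpus becomes one training event for the classifier, and decoding a test sentence amounts to predicting a tag per character and calling `tags_to_words`.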

Files in This Item:
File | Description | Size | Format | Access Settings | Version
msc.pdf | | 302.26 kB | Adobe PDF | Open | None

Page view(s): 362 (checked on Jun 14, 2019)
Download(s): 239 (checked on Jun 14, 2019)



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.