Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/130419
DC Field | Value | |
---|---|---|
dc.title | Splitting-Merging Model of Chinese Word Tokenization and Segmentation | |
dc.contributor.author | Yao, Y. | |
dc.contributor.author | Lua, K.T. | |
dc.date.accessioned | 2016-11-16T11:05:46Z | |
dc.date.available | 2016-11-16T11:05:46Z | |
dc.date.issued | 1998 | |
dc.identifier.citation | Yao, Y., Lua, K.T. (1998). Splitting-Merging Model of Chinese Word Tokenization and Segmentation. Natural Language Engineering 4 (4) : 309-324. ScholarBank@NUS Repository. | |
dc.identifier.issn | 13513249 | |
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/130419 | |
dc.description.abstract | Word tokenization and segmentation in natural language processing of languages such as Chinese, which use no blank space to delimit words, are considered. Three major problems arise: (1) tokenizing direction and efficiency; (2) insufficient tokenization dictionaries and non-words; and (3) ambiguity of tokenization and segmentation. Most existing tokenization and segmentation methods do not address these problems together. A novel dictionary-based method, the splitting-merging model for Chinese word tokenization and segmentation, is presented. It uses the mutual information of Chinese characters to find the boundaries and the non-boundaries of Chinese words, and finally leads to word segmentation by resolving ambiguities and detecting new words. | |
dc.source | Scopus | |
dc.type | Article | |
dc.contributor.department | COMPUTER SCIENCE | |
dc.description.sourcetitle | Natural Language Engineering | |
dc.description.volume | 4 | |
dc.description.issue | 4 | |
dc.description.page | 309-324 | |
dc.description.coden | NLENF | |
dc.identifier.isiut | NOT_IN_WOS | |
Appears in Collections: | Staff Publications |
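The abstract's core idea — scoring adjacent Chinese characters by their mutual information and splitting where the score is low (a likely word boundary) while merging where it is high — can be sketched as follows. This is an illustrative sketch only: the toy corpus, the threshold value, and the function names are assumptions for demonstration, not the paper's actual implementation.

```python
import math
from collections import Counter

# Toy corpus of known words (hypothetical data, for illustration only).
corpus = ["中国", "人民", "中国", "银行", "人民", "中国"]
text = "".join(corpus)

# Unigram and bigram counts over the raw character stream.
chars = Counter(text)
bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
n_chars = sum(chars.values())
n_bigrams = sum(bigrams.values())

def mutual_information(a, b):
    """Pointwise mutual information (in bits) of adjacent characters a, b."""
    p_ab = bigrams[a + b] / n_bigrams
    if p_ab == 0:
        return float("-inf")  # never co-occur: strong boundary evidence
    p_a = chars[a] / n_chars
    p_b = chars[b] / n_chars
    return math.log2(p_ab / (p_a * p_b))

def segment(s, threshold=1.5):
    """Split s wherever adjacent-character MI falls below the threshold."""
    words, start = [], 0
    for i in range(len(s) - 1):
        if mutual_information(s[i], s[i + 1]) < threshold:
            words.append(s[start:i + 1])  # low MI: split (word boundary)
            start = i + 1
    words.append(s[start:])  # high MI inside a span keeps it merged
    return words

print(segment("中国人民"))  # → ['中国', '人民']
```

On this toy corpus, the within-word pair 中国 scores a higher MI than the across-word pair 国人, so the threshold separates boundaries from non-boundaries; the paper additionally uses a dictionary and ambiguity resolution on top of such scores.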
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.