Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/130419
DC Field: Value
dc.title: Splitting-Merging Model of Chinese Word Tokenization and Segmentation
dc.contributor.author: Yao, Y.
dc.contributor.author: Lua, K.T.
dc.date.accessioned: 2016-11-16T11:05:46Z
dc.date.available: 2016-11-16T11:05:46Z
dc.date.issued: 1998
dc.identifier.citation: Yao, Y., Lua, K.T. (1998). Splitting-Merging Model of Chinese Word Tokenization and Segmentation. Natural Language Engineering 4 (4): 309-324. ScholarBank@NUS Repository.
dc.identifier.issn: 1351-3249
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/130419
dc.description.abstract: Word tokenization and segmentation in natural language processing of languages such as Chinese, which use no blank space to delimit words, are considered. Three major problems arise: (1) tokenizing direction and efficiency, (2) insufficient tokenization dictionaries and non-words, and (3) ambiguity of tokenization and segmentation. Most existing tokenization and segmentation methods do not address these problems together. A novel dictionary-based method, the splitting-merging model, for Chinese word tokenization and segmentation is presented. It uses the mutual information of Chinese characters to find the boundaries and non-boundaries of Chinese words, and finally produces a word segmentation by resolving ambiguities and detecting new words.
dc.source: Scopus
dc.type: Article
dc.contributor.department: COMPUTER SCIENCE
dc.description.sourcetitle: Natural Language Engineering
dc.description.volume: 4
dc.description.issue: 4
dc.description.page: 309-324
dc.description.coden: NLENF
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections:Staff Publications
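The abstract above describes using the mutual information of adjacent Chinese characters to locate word boundaries and non-boundaries. As a rough illustration of that general idea only (not the paper's actual splitting-merging model, formulas, or thresholds, which are dictionary-based and more involved), the following Python sketch estimates pointwise mutual information from a toy corpus and inserts a boundary wherever adjacent characters associate weakly. The corpus, the zero threshold, and all function names here are hypothetical.

import math
from collections import Counter

def train_counts(corpus):
    """Count character unigrams and adjacent character bigrams."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    return unigrams, bigrams

def pmi(pair, unigrams, bigrams, total):
    """Pointwise mutual information log2(P(xy) / (P(x)P(y))) of two adjacent characters."""
    p_xy = bigrams[pair] / max(total - 1, 1)
    if p_xy == 0:
        return float("-inf")  # pair never seen adjacent: strong evidence for a boundary
    p_x = unigrams[pair[0]] / total
    p_y = unigrams[pair[1]] / total
    return math.log2(p_xy / (p_x * p_y))

def segment(sentence, unigrams, bigrams, total, threshold=0.0):
    """Merge adjacent characters with high PMI; split where PMI falls below the threshold."""
    words, current = [], sentence[0]
    for i in range(len(sentence) - 1):
        if pmi(sentence[i:i + 2], unigrams, bigrams, total) >= threshold:
            current += sentence[i + 1]   # characters bind strongly: keep in the same word
        else:
            words.append(current)        # weak association: word boundary here
            current = sentence[i + 1]
    words.append(current)
    return words

# Toy usage: with this little data, most observed pairs have positive PMI and merge,
# while an unseen pair forces a split.
corpus = ["我们喜欢学习中文", "我们学习自然语言处理"]
unigrams, bigrams = train_counts(corpus)
total = sum(unigrams.values())
print(segment("我们喜欢中文", unigrams, bigrams, total))  # e.g. ['我们喜欢', '中文']

On a realistic corpus the threshold would have to be tuned, and the paper combines this statistical signal with a dictionary to resolve ambiguities and detect new words; this sketch shows only the boundary test itself.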

