Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/99558
Title: Modelling of Chinese texts using bi-character and tri-character Chinese words for data compression
Authors: Ong, Ghim Hwee 
Huang, Shell Ying
Chong, Wing Teck
Issue Date: 1994
Citation: Ong, Ghim Hwee; Huang, Shell Ying; Chong, Wing Teck (1994). Modelling of Chinese texts using bi-character and tri-character Chinese words for data compression. National Conference Publication - Institution of Engineers, Australia 2 (94/9): 1175-1180. ScholarBank@NUS Repository.
Abstract: Text compression can be separated into two parts: modelling and coding. Coding algorithms are already well developed. However, because of the particular characteristics of Chinese text files, compression based on byte-oriented models may not yield the best compression ratio. In this paper, a new model of Chinese texts that uses bi-character and tri-character words in addition to single characters is presented and evaluated against the byte-based and character-based models. Compression is carried out with the adaptive Huffman coding algorithm on five medium-sized Chinese texts on different subjects. The new model is shown to achieve the lowest entropy and the best compression ratio.
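The abstract contrasts three ways of choosing the symbols a statistical coder sees: bytes, single characters, and single characters mixed with bi- and tri-character words. The sketch below illustrates that comparison under stated assumptions and is not the authors' method. It assumes GB2312 encoding for the byte-based model. It uses a hypothetical toy lexicon with greedy longest-match segmentation, since the paper's word list and segmentation rule are not given here. It also measures zeroth-order empirical entropy instead of running adaptive Huffman coding. Because it ignores the cost of storing the lexicon or the model, it only shows why larger symbols can lower the number of bits needed per character.

```python
import math
from collections import Counter


def empirical_bits(symbols):
    """Total bits (n * H) under a zeroth-order model fitted to the symbols themselves.

    This is a lower bound for a static code on this symbol stream; it is NOT
    the output size of the adaptive Huffman coder used in the paper.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return sum(-c * math.log2(c / n) for c in counts.values())


def byte_symbols(text, encoding="gb2312"):
    """Byte-based model: every byte of the encoded text is one symbol (GB2312 is an assumption)."""
    return list(text.encode(encoding))


def char_symbols(text):
    """Character-based model: every Chinese character is one symbol."""
    return list(text)


def word_symbols(text, lexicon):
    """Word-based model (hypothetical): greedy longest match, trying tri-character
    then bi-character words from the lexicon and falling back to single characters."""
    symbols, i = [], 0
    while i < len(text):
        for length in (3, 2):
            candidate = text[i:i + length]
            if len(candidate) == length and candidate in lexicon:
                symbols.append(candidate)
                i += length
                break
        else:  # no multi-character word matched at position i
            symbols.append(text[i])
            i += 1
    return symbols


if __name__ == "__main__":
    sample = "中华人民共和国" * 4          # toy text, not one of the paper's test files
    lexicon = {"中华", "人民", "共和国"}   # hypothetical bi-/tri-character word list
    for name, syms in [
        ("byte", byte_symbols(sample)),
        ("char", char_symbols(sample)),
        ("word", word_symbols(sample, lexicon)),
    ]:
        print(f"{name}: {empirical_bits(syms) / len(sample):.3f} bits per character")
```

On this toy input the word-based stream needs the fewest bits per character, followed by the character-based stream and then the byte-based stream. Results on real texts depend on the corpus, the lexicon, and the model overhead that this sketch leaves out.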
Source Title: National Conference Publication - Institution of Engineers, Australia
URI: http://scholarbank.nus.edu.sg/handle/10635/99558
ISSN: 0313-6922
Appears in Collections: Staff Publications




Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.