|Title:||Modelling of Chinese texts using bi-character and tri-character Chinese words for data compression|
|Authors:||Ong, Ghim Hwee; Huang, Shell Ying; Chong, Wing Teck|
|Citation:||Ong, Ghim Hwee; Huang, Shell Ying; Chong, Wing Teck (1994). Modelling of Chinese texts using bi-character and tri-character Chinese words for data compression. National Conference Publication - Institution of Engineers, Australia 2 (94/9): 1175-1180. ScholarBank@NUS Repository.|
|Abstract:||Text compression can be separated into two parts: modelling and coding. Coding algorithms are already well developed. However, because of the distinctive characteristics of Chinese text files, compression based on byte-oriented models may not yield the best compression ratio. In this paper, a new model of Chinese texts that uses bi-character and tri-character words in addition to single characters is presented and evaluated against byte-based and character-based models. Compression is carried out with the adaptive Huffman coding algorithm on five medium-sized Chinese texts covering different subjects. The new model is shown to give the lowest entropy and the best compression ratio.|
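The comparison of byte-based, character-based and word-based models described in the abstract can be illustrated with a small sketch of order-0 entropy under each model. This is not the authors' implementation: the GBK encoding for the byte model, the greedy longest-match segmentation, and the toy word dictionary are all assumptions made for illustration, and the sketch measures model entropy only (it does not implement the adaptive Huffman coder or count the cost of transmitting the dictionary). Because the models produce different numbers of symbols, entropies are compared as estimated total bits spread over the number of Chinese characters.

```python
import math
from collections import Counter


def entropy(symbols):
    """Order-0 (zeroth-order) entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def byte_model(text, encoding="gbk"):
    """Byte-oriented model. Assumption: text is stored in GBK,
    so each Chinese character becomes two byte symbols."""
    return list(text.encode(encoding))


def char_model(text):
    """Character-based model: each Chinese character is one symbol."""
    return list(text)


def word_model(text, words, max_len=3):
    """Word-based model (assumed greedy longest match): try a
    tri-character word, then a bi-character word, and fall back
    to a single character when no dictionary word matches."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(max_len, 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in words:
                tokens.append(piece)
                i += len(piece)  # piece may be shorter than n at the end of text
                break
    return tokens


def bits_per_char(tokens, text):
    """Estimated total bits (entropy x token count) divided by the
    number of characters, so models with different symbol counts
    can be compared on the same scale."""
    return entropy(tokens) * len(tokens) / len(text)
```

For example, with the hypothetical text `"数据压缩" * 4` and dictionary `{"数据", "压缩"}`, the character model gives 2.0 bits per character while the word model gives 0.5, because pairing characters into words halves the number of symbols and the number of distinct symbols.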
|Source Title:||National Conference Publication - Institution of Engineers, Australia|
|Appears in Collections:||Staff Publications|
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.