Title: Automatic alignment of Japanese-Chinese bilingual texts
Authors: Tan, Chew Lim; Nagao, Makoto
Issue Date: Jan-1995
Source: Tan, Chew Lim; Nagao, Makoto (1995-01). Automatic alignment of Japanese-Chinese bilingual texts. IEICE Transactions on Information and Systems E78-D (1): 68-76. ScholarBank@NUS Repository.
Abstract: Automatic alignment of bilingual texts is useful for example-based machine translation because it facilitates the creation of example translation pairs for the machine. Two main approaches to automatic alignment have been reported in the literature: the lexical approach and the statistical approach. The former looks for relationships between the lexical contents of the bilingual texts in order to find alignment pairs, while the latter uses the statistical correlation between the sentence lengths of the bilingual texts as the basis for matching. This paper describes a combination of the two approaches for aligning Japanese-Chinese bilingual texts, in which the kanji contents and the sentence lengths of the texts work together in the alignment process. Because of the differences in sentence structure between Japanese and Chinese, matching at the sentence level may frequently result in several sentences being matched en masse. In view of this, the current work also attempts to create shorter alignment pairs by permitting sentences to be matched with clauses or phrases of the other text where possible. While such matching is more difficult and error-prone, reliance on kanji contents has proven very useful in minimizing the errors. The current research has thus found solutions to problems that are unique to the present work.
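
The abstract names the two ingredients (kanji overlap and sentence length) but not how the paper scores or searches over candidate pairs. As an illustration only, here is a minimal Python sketch of one way to combine them, assuming a dynamic-programming aligner of the general kind used in length-based statistical sentence alignment, with a simple log length-ratio penalty in place of a probabilistic length model. The functions `kanji_set`, `pair_cost` and `align`, the weights, `SKIP_COST`, `MERGE_PENALTY` and the allowed match shapes (1-1, 1-0, 0-1, 2-1, 1-2) are all hypothetical choices, not the authors' method. Sub-sentence matching is approximated by letting one unit match two adjacent units on the other side, which assumes the input has already been split into sentences or clauses.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical tuning constants (not taken from the paper).
SKIP_COST = 3.0      # penalty for leaving a unit unmatched (1-0 or 0-1)
MERGE_PENALTY = 0.5  # extra penalty for 2-1 and 1-2 matches
MOVES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def kanji_set(text: str) -> set:
    """Return the CJK unified ideographs (U+4E00..U+9FFF) found in text."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}


def pair_cost(ja: str, zh: str, ratio: float = 1.0,
              length_weight: float = 1.0, kanji_weight: float = 2.0) -> float:
    """Cost of aligning a Japanese span with a Chinese span (lower is better).

    Length term: |log(observed length ratio / expected ratio)|.
    Kanji term: 1 minus the Jaccard overlap of the two kanji sets.
    Characters are compared by exact code point only.
    """
    length_cost = abs(math.log((len(ja) + 1) / (ratio * (len(zh) + 1))))
    ja_k, zh_k = kanji_set(ja), kanji_set(zh)
    union = ja_k | zh_k
    overlap = len(ja_k & zh_k) / len(union) if union else 0.0
    return length_weight * length_cost + kanji_weight * (1.0 - overlap)


def align(ja_units: List[str], zh_units: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Minimum-cost monotonic alignment as (ja indices, zh indices) pairs."""
    n, m = len(ja_units), len(zh_units)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # In row-major order every cell that can step into (i, j) is processed
    # before (i, j), so cost[i][j] is final when it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = SKIP_COST
                else:
                    step = pair_cost("".join(ja_units[i:ni]), "".join(zh_units[j:nj]))
                    if di + dj > 2:
                        step += MERGE_PENALTY
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace back from (n, m) to recover the chosen matches.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    ja = ["今日は天気が良い。", "私は学生です。"]
    zh = ["今天天气很好。", "我是学生。"]
    print(align(ja, zh))  # [([0], [0]), ([1], [1])]
```

Because characters are compared by code point only, variant forms such as Japanese 気 and Chinese 气 do not count as shared; a practical version would likely need a table mapping Japanese kanji to their Chinese counterparts before computing the overlap, which the sketch leaves out.
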
Source Title: IEICE Transactions on Information and Systems
ISSN: 0916-8532
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.