|Title:||Alignment and matching of bilingual English-Chinese news texts|
|Source:||Xu, D., Tan, C.L. (2000). Alignment and matching of bilingual English-Chinese news texts. Machine Translation 14 (1): 1-33. ScholarBank@NUS Repository. https://doi.org/10.1023/A:1008092103873|
|Abstract:||This paper presents a project to align and match bilingual English-Chinese news files downloaded from the China News Service's website. The work involves the alignment of bilingual texts at the sentence and clause levels. In addition, the work requires matching of files, as the English and Chinese news files downloaded from the web do not come in the same sequential order. These news files have their own characteristics and, furthermore, the issue of file matching has its unique difficulties apart from the known problems of alignment work previously reported in the literature. To align the news files we combine the criteria of 'anchors' (i.e. unambiguous corresponding text elements) and sentence length. We employ Dynamic Programming first to align at the paragraph level, then to align at the sentence-clause level. The precision and recall of the alignment are satisfactory for free translation texts. To match English and Chinese files, we make use of the anchors alone. In file matching we encounter a 'collision' problem due to contending matching candidates, and propose a recursive splitting algorithm to resolve the problem. We allow human intervention to improve the precision of matching, and succeed in achieving 100% precision with a fairly small amount of manual effort. Finally, to determine the various parameters used in aligning and matching, we utilize a Genetic Algorithm software package to obtain their optimized values.|
|Source Title:||Machine Translation|
|Appears in Collections:||Staff Publications|
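
The abstract describes aligning text by Dynamic Programming over a cost that combines anchors and sentence length. The following is a minimal sketch of that general idea only, not the authors' implementation: the function names, the use of digit strings as anchors, the English-to-Chinese length ratio of 3.0, and all penalty weights are assumptions chosen for illustration (the paper tunes its own parameters with a Genetic Algorithm).

```python
import re

# Allowed alignment "beads": (source sentences consumed, target sentences consumed).
MOVES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def anchors(text):
    """Hypothetical anchor extractor: digit strings are treated as unambiguous."""
    return set(re.findall(r"\d+", text))


def bead_cost(src, tgt, ratio=3.0, anchor_weight=1.0, skip_penalty=2.0):
    """Cost of aligning the sentence group `src` with the group `tgt`.

    `ratio` is an assumed number of English characters per Chinese character.
    """
    if not src or not tgt:
        # One side is empty: a sentence with no counterpart.
        return skip_penalty
    s, t = "".join(src), "".join(tgt)
    expected = len(t) * ratio
    length_cost = abs(len(s) - expected) / max(len(s), expected)
    merge_penalty = 0.1 * (len(src) + len(tgt) - 2)
    shared = len(anchors(s) & anchors(t))
    # Shared anchors lower the cost; length mismatch and merging raise it.
    return length_cost + merge_penalty - anchor_weight * shared


def align(src, tgt, **params):
    """Return the minimum-cost list of (source indices, target indices) beads."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(src[i:ni], tgt[j:nj], **params)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace the best path back from the end of both texts.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align(english_sentences, chinese_sentences)` returns index groups such as `([0], [0])` for a one-to-one pairing or `([1, 2], [1])` when two source sentences map to one target sentence. The same routine could be run first on paragraphs and then on the sentences within each aligned paragraph pair, mirroring the two-stage order the abstract mentions.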
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.