Title: Better evaluation metrics lead to better machine translation

Source: Liu, C., Dahlmeier, D., & Ng, H. T. (2011). Better evaluation metrics lead to better machine translation. EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: 375-384. ScholarBank@NUS Repository.

Abstract: Many machine translation evaluation metrics have been proposed since the seminal BLEU metric, and many of them have been found to consistently outperform BLEU, as demonstrated by their better correlation with human judgment. It has long been hoped that by tuning machine translation systems against these new-generation metrics, advances in automatic machine translation evaluation would lead directly to advances in automatic machine translation. However, to date there has been no unambiguous report that these new metrics can improve a state-of-the-art machine translation system over its BLEU-tuned baseline. In this paper, we demonstrate that tuning Joshua, a hierarchical phrase-based statistical machine translation system, with the TESLA metrics results in significantly better human-judged translation quality than the BLEU-tuned baseline. TESLA-M in particular is simple and performs well in practice on large datasets. We release our entire implementation under an open source license. It is our hope that this work will encourage the machine translation community to finally move away from BLEU as the unquestioned default and to consider the new-generation metrics when tuning their systems. © 2011 Association for Computational Linguistics.

Source Title: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Appears in Collections: Staff Publications
Files in This Item: There are no files associated with this item.