|Title:||Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus|
|Keywords:||Cross-lingual semantic similarity; Statistical machine translation; Web as a corpus|
|Citation:||Nakov, S., Nakov, P., Paskaleva, E. (2009). Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus. International Conference Recent Advances in Natural Language Processing, RANLP: 292-298. ScholarBank@NUS Repository.|
|Abstract:||False friends are pairs of words in two languages that are perceived as similar, but have different meanings, e.g., Gift in German means poison in English. In this paper, we present several unsupervised algorithms for acquiring such pairs from a sentence-aligned bi-text. First, we try different ways of exploiting simple statistics about monolingual word occurrences and cross-lingual word co-occurrences in the bi-text. Second, using methods from statistical machine translation, we induce word alignments in an unsupervised way, from which we estimate lexical translation probabilities, which we use to measure cross-lingual semantic similarity. Third, we experiment with a semantic similarity measure that uses the Web as a corpus to extract local contexts from text snippets returned by a search engine, and a bilingual glossary of known word translation pairs, used as "bridges". Finally, all measures are combined and applied to the task of identifying likely false friends. The evaluation for Russian and Bulgarian shows a significant improvement over previously-proposed algorithms.|
|Source Title:||International Conference Recent Advances in Natural Language Processing, RANLP|
|Appears in Collections:||Staff Publications|
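
Illustrative sketch (not part of the original record): the abstract's first family of measures relies on monolingual occurrence counts and cross-lingual co-occurrence counts in a sentence-aligned bi-text. The Python snippet below shows one way such a statistic could look, assuming a Dice-style coefficient over aligned sentence pairs. The function name `cooccurrence_similarity`, the Dice formula, and the toy German-English data are all assumptions made for illustration; the record does not state the formulas the paper actually uses.

```python
def cooccurrence_similarity(bitext, word_a, word_b):
    """Dice-style score over the sentence pairs of a sentence-aligned bi-text.

    bitext: iterable of (sentence_a, sentence_b) string pairs.
    Returns a value in [0, 1]. A high value means the two words tend to
    appear in aligned sentences, hinting that they are translations.
    NOTE: the Dice formula is an assumption, not the paper's measure.
    """
    count_a = count_b = count_both = 0
    for sent_a, sent_b in bitext:
        # Naive whitespace tokenization, lowercased for matching.
        in_a = word_a in sent_a.lower().split()
        in_b = word_b in sent_b.lower().split()
        count_a += in_a          # sentence pairs containing word_a
        count_b += in_b          # sentence pairs containing word_b
        count_both += in_a and in_b  # pairs containing both words
    if count_a + count_b == 0:
        return 0.0
    return 2.0 * count_both / (count_a + count_b)


# Invented toy bi-text, built around the abstract's Gift/poison example.
toy_bitext = [
    ("das gift ist gefährlich", "the poison is dangerous"),
    ("er trank das gift", "he drank the poison"),
    ("sie kaufte ein geschenk", "she bought a gift"),
    ("das geschenk ist schön", "the gift is nice"),
]

# Orthographically identical, but never co-occurring: a false-friend candidate.
false_friend_score = cooccurrence_similarity(toy_bitext, "gift", "gift")
# Different spelling, but always co-occurring: a likely translation pair.
translation_score = cooccurrence_similarity(toy_bitext, "gift", "poison")
```

Under this sketch, a pair that looks alike but scores low would be flagged as a likely false friend. The abstract indicates that the paper combines statistics of this kind with alignment-based translation probabilities and a Web-based similarity measure, so a single count-based score is only one ingredient.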