Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/41955

Title:	Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus
Authors:	Nakov, S.; Nakov, P.; Paskaleva, E.
Keywords:	Cognates; Cross-lingual semantic similarity; False friends; Statistical machine translation; Web as a corpus
Issue Date:	2009
Citation:	Nakov, S., Nakov, P., Paskaleva, E. (2009). Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus. International Conference Recent Advances in Natural Language Processing, RANLP: 292-298. ScholarBank@NUS Repository.
Abstract:	False friends are pairs of words in two languages that are perceived as similar, but have different meanings, e.g., Gift in German means poison in English. In this paper, we present several unsupervised algorithms for acquiring such pairs from a sentence-aligned bi-text. First, we try different ways of exploiting simple statistics about monolingual word occurrences and cross-lingual word co-occurrences in the bi-text. Second, using methods from statistical machine translation, we induce word alignments in an unsupervised way, from which we estimate lexical translation probabilities, which we use to measure cross-lingual semantic similarity. Third, we experiment with a semantic similarity measure that uses the Web as a corpus to extract local contexts from text snippets returned by a search engine, and a bilingual glossary of known word translation pairs, used as "bridges". Finally, all measures are combined and applied to the task of identifying likely false friends. The evaluation for Russian and Bulgarian shows a significant improvement over previously proposed algorithms.
Source Title:	International Conference Recent Advances in Natural Language Processing, RANLP
URI:	http://scholarbank.nus.edu.sg/handle/10635/41955
ISSN:	1313-8502
Appears in Collections:	Staff Publications
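
As a rough illustration of the first idea in the abstract (monolingual occurrence and cross-lingual co-occurrence statistics over a sentence-aligned bi-text), the sketch below scores a word pair with the Dice coefficient. The choice of Dice, the function name `cooccurrence_scores`, and the toy German-English sentences are all assumptions made for illustration; the record does not state which statistics the paper actually uses.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(bitext):
    """Build a Dice-coefficient scorer from a sentence-aligned bi-text.

    `bitext` is a list of (source_tokens, target_tokens) pairs.
    Hypothetical helper: the Dice coefficient is an assumed statistic,
    not necessarily the one used in the paper.
    """
    src_count = Counter()   # sentences containing each source word
    tgt_count = Counter()   # sentences containing each target word
    pair_count = Counter()  # aligned sentence pairs containing both words

    for src_tokens, tgt_tokens in bitext:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))

    def dice(src_word, tgt_word):
        total = src_count[src_word] + tgt_count[tgt_word]
        if total == 0:
            return 0.0  # neither word occurs in the bi-text
        return 2.0 * pair_count[(src_word, tgt_word)] / total

    return dice


# Toy data, invented for this example only.
toy_bitext = [
    (["das", "gift", "ist", "tödlich"], ["the", "poison", "is", "deadly"]),
    (["er", "trank", "das", "gift"], ["he", "drank", "the", "poison"]),
    (["ein", "geschenk", "für", "dich"], ["a", "gift", "for", "you"]),
]

dice = cooccurrence_scores(toy_bitext)
score_false_friend = dice("gift", "gift")    # look-alike words, never co-occur
score_translation = dice("gift", "poison")   # true translation, always co-occur
```

In this toy setting the look-alike pair never shares an aligned sentence, while the true translation pair always does, so a low co-occurrence score for orthographically similar words hints at a false friend. A real bi-text would need far more data and some frequency thresholding before such scores become reliable.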

