Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/41955

DC Field	Value
dc.title	Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus
dc.contributor.author	Nakov, S.
dc.contributor.author	Nakov, P.
dc.contributor.author	Paskaleva, E.
dc.date.accessioned	2013-07-04T08:39:50Z
dc.date.available	2013-07-04T08:39:50Z
dc.date.issued	2009
dc.identifier.citation	Nakov, S., Nakov, P., Paskaleva, E. (2009). Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus. International Conference Recent Advances in Natural Language Processing, RANLP: 292-298. ScholarBank@NUS Repository.
dc.identifier.issn	13138502
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/41955
dc.description.abstract	False friends are pairs of words in two languages that are perceived as similar but have different meanings, e.g., Gift in German means poison in English. In this paper, we present several unsupervised algorithms for acquiring such pairs from a sentence-aligned bi-text. First, we try different ways of exploiting simple statistics about monolingual word occurrences and cross-lingual word co-occurrences in the bi-text. Second, using methods from statistical machine translation, we induce word alignments in an unsupervised way, from which we estimate lexical translation probabilities, which we use to measure cross-lingual semantic similarity. Third, we experiment with a semantic similarity measure that uses the Web as a corpus to extract local contexts from text snippets returned by a search engine, and a bilingual glossary of known word translation pairs, used as "bridges". Finally, all measures are combined and applied to the task of identifying likely false friends. The evaluation for Russian and Bulgarian shows a significant improvement over previously proposed algorithms.
dc.source	Scopus
dc.subject	Cognates
dc.subject	Cross-lingual semantic similarity
dc.subject	False friends
dc.subject	Statistical machine translation
dc.subject	Web as a corpus
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.sourcetitle	International Conference Recent Advances in Natural Language Processing, RANLP
dc.description.page	292-298
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.
