Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/41955
DC Field: Value
dc.title: Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus
dc.contributor.author: Nakov, S.
dc.contributor.author: Nakov, P.
dc.contributor.author: Paskaleva, E.
dc.date.accessioned: 2013-07-04T08:39:50Z
dc.date.available: 2013-07-04T08:39:50Z
dc.date.issued: 2009
dc.identifier.citation: Nakov, S., Nakov, P., Paskaleva, E. (2009). Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus. International Conference Recent Advances in Natural Language Processing, RANLP: 292-298. ScholarBank@NUS Repository.
dc.identifier.issn: 13138502
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/41955
dc.description.abstract: False friends are pairs of words in two languages that are perceived as similar, but have different meanings, e.g., Gift in German means poison in English. In this paper, we present several unsupervised algorithms for acquiring such pairs from a sentence-aligned bi-text. First, we try different ways of exploiting simple statistics about monolingual word occurrences and cross-lingual word co-occurrences in the bi-text. Second, using methods from statistical machine translation, we induce word alignments in an unsupervised way, from which we estimate lexical translation probabilities, which we use to measure cross-lingual semantic similarity. Third, we experiment with a semantic similarity measure that uses the Web as a corpus to extract local contexts from text snippets returned by a search engine, and a bilingual glossary of known word translation pairs, used as "bridges". Finally, all measures are combined and applied to the task of identifying likely false friends. The evaluation for Russian and Bulgarian shows a significant improvement over previously proposed algorithms.
dc.source: Scopus
dc.subject: Cognates
dc.subject: Cross-lingual semantic similarity
dc.subject: False friends
dc.subject: Statistical machine translation
dc.subject: Web as a corpus
dc.type: Conference Paper
dc.contributor.department: COMPUTER SCIENCE
dc.description.sourcetitle: International Conference Recent Advances in Natural Language Processing, RANLP
dc.description.page: 292-298
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.