Please use this identifier to cite or link to this item: https://doi.org/10.1145/2009916.2010030
Title: Enriching document representation via translation for improved monolingual information retrieval
Authors: Na, S.-H. 
Ng, H.T. 
Keywords: Algorithms
Experimentation
Performance
Theory
Issue Date: 2011
Citation: Na, S.-H., Ng, H.T. (2011). Enriching document representation via translation for improved monolingual information retrieval. SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval: 853-862. ScholarBank@NUS Repository. https://doi.org/10.1145/2009916.2010030
Abstract: Word ambiguity and vocabulary mismatch are critical problems in information retrieval. To deal with these problems, this paper proposes the use of translated words to enrich document representation, going beyond the words in the original source language to represent a document. In our approach, each original document is automatically translated into an auxiliary language, and the resulting translated document serves as a semantically enhanced representation for supplementing the original bag of words. The core of our translation representation is the expected term frequency of a word in a translated document, which is calculated by averaging the term frequencies over all possible translations, rather than focusing on the 1-best translation only. To achieve better efficiency of translation, we do not rely on full-fledged machine translation, but instead use monotonic translation by removing the time-consuming reordering component. Experiments carried out on standard TREC test collections show that our proposed translation representation leads to statistically significant improvements over using only the original language of the document collection.
Source Title: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
URI: http://scholarbank.nus.edu.sg/handle/10635/41298
ISBN: 9781450309349
DOI: 10.1145/2009916.2010030
Appears in Collections: Staff Publications
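
Illustration: the abstract describes the expected term frequency of a word in a translated document as an average of term frequencies over all possible translations, rather than the count from the 1-best translation alone. The sketch below shows that idea on a toy example. It is not the authors' implementation: the function name `expected_term_frequencies`, the explicit list of candidate translations, and the use of translation probabilities as averaging weights are all assumptions made for illustration. With equal probabilities the result reduces to a plain average.

```python
from collections import Counter


def expected_term_frequencies(candidates):
    """Return the expected term frequency of each word.

    candidates: list of (tokens, probability) pairs, one pair per
    candidate translation of the same document (hypothetical input format).
    """
    total = sum(prob for _, prob in candidates)
    if total <= 0:
        raise ValueError("translation probabilities must sum to a positive value")

    expected = Counter()
    for tokens, prob in candidates:
        weight = prob / total  # normalise so the weights sum to 1
        for word, tf in Counter(tokens).items():
            expected[word] += weight * tf  # weighted term frequency
    return dict(expected)


# Toy data (invented): three candidate translations of one short document.
candidates = [
    (["bank", "river", "bank"], 0.5),
    (["shore", "river", "bank"], 0.25),
    (["bank", "stream", "bank"], 0.25),
]
etf = expected_term_frequencies(candidates)
```

In this toy case the 1-best translation alone would record only "bank" and "river", whereas the expected counts also give fractional weight to "shore" and "stream", which is the kind of enrichment the abstract attributes to considering all translations.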

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.