Please use this identifier to cite or link to this item:
|Title:||Enriching document representation via translation for improved monolingual information retrieval|
|Authors:||Na, S.-H. |
|Source:||Na, S.-H.,Ng, H.T. (2011). Enriching document representation via translation for improved monolingual information retrieval. SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval : 853-862. ScholarBank@NUS Repository. https://doi.org/10.1145/2009916.2010030|
|Abstract:||Word ambiguity and vocabulary mismatch are critical problems in information retrieval. To deal with these problems, this paper proposes the use of translated words to enrich document representation, going beyond the words in the original source language to represent a document. In our approach, each original document is automatically translated into an auxiliary language, and the resulting translated document serves as a semantically enhanced representation for supplementing the original bag of words. The core of our translation representation is the expected term frequency of a word in a translated document, which is calculated by averaging the term frequencies over all possible translations, rather than focusing on the 1-best translation only. To achieve better efficiency of translation, we do not rely on full-fledged machine translation, but instead use monotonic translation by removing the time-consuming reordering component. Experiments carried out on standard TREC test collections show that our proposed translation representation leads to statistically significant improvements over using only the original language of the document collection.|
|Source Title:||SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval|
|Appears in Collections:||Staff Publications|
Show full item record
Files in This Item:
There are no files associated with this item.
checked on Dec 13, 2017
checked on Dec 9, 2017
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.