Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.patcog.2007.10.017
Title: Retrieval of machine-printed Latin documents through Word Shape Coding
Authors: Lu, S.; Tan, C.L.
Keywords: Document image analysis; Language identification; Multilingual document retrieval; Word shape coding
Issue Date: 2008
Source: Lu, S., Tan, C.L. (2008). Retrieval of machine-printed Latin documents through Word Shape Coding. Pattern Recognition 41 (5): 1816-1826. ScholarBank@NUS Repository. https://doi.org/10.1016/j.patcog.2007.10.017
Abstract: This paper reports a document retrieval technique that retrieves machine-printed Latin-based document images through word shape coding. Adopting the idea of image annotation, a word shape coding scheme is proposed, which converts each word image into a word shape code by using a few shape features. The text contents of imaged documents are thus captured by a document vector constructed with the converted word shape code and word frequency information. Similarities between different document images are then gauged based on the constructed document vectors. We divide the retrieval process into two stages. Based on the observation that documents of the same language share a large number of high-frequency language-specific stop words, the first stage retrieves documents with the same underlying language as that of the query document. The second stage then re-ranks the documents retrieved in the first stage based on the topic similarity. Experiments show that document images of different languages and topics can be retrieved properly by using the proposed word shape coding scheme. © 2007 Elsevier Ltd. All rights reserved.
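The two-stage retrieval described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the shape features, the similarity measure, or how stop words are identified, so everything below is an assumption rather than the authors' method: word images are taken as already converted to word shape codes (the codes "s1", "t1", etc. are hypothetical placeholders), document vectors are plain code-frequency counts, similarity is cosine similarity, and the query's k most frequent codes stand in for language-specific stop words. The parameters `k` and `lang_threshold` are hypothetical.

```python
from collections import Counter
import math


def document_vector(codes):
    """Build a document vector {word shape code: frequency} from a list of codes."""
    return Counter(codes)


def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors (assumed measure)."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def restrict(vec, keep):
    """Keep only the codes in `keep` (used for the language stage)."""
    return Counter({c: n for c, n in vec.items() if c in keep})


def drop(vec, remove):
    """Remove the codes in `remove` (used for the topic stage)."""
    return Counter({c: n for c, n in vec.items() if c not in remove})


def retrieve(query_codes, corpus, k=3, lang_threshold=0.5):
    q = document_vector(query_codes)
    # Hypothetical stop-word proxy: the query's k most frequent codes.
    stop = {c for c, _ in q.most_common(k)}

    # Stage 1: keep documents whose stop-word profile resembles the query's,
    # i.e. documents assumed to share the query's language.
    same_lang = []
    for name, codes in corpus.items():
        d = document_vector(codes)
        if cosine(restrict(q, stop), restrict(d, stop)) >= lang_threshold:
            same_lang.append((name, d))

    # Stage 2: re-rank the surviving documents by topic similarity,
    # computed on the remaining (non-stop-word) codes.
    q_topic = drop(q, stop)
    ranked = sorted(same_lang,
                    key=lambda item: cosine(q_topic, drop(item[1], stop)),
                    reverse=True)
    return [name for name, _ in ranked]


# Toy data: "s*" codes mimic frequent stop words, "t*"/"u*" mimic topic words,
# "f*" mimic stop words of a different language.
query = ["s1"] * 4 + ["s2"] * 3 + ["s3"] * 2 + ["t1", "t2"]
corpus = {
    "doc_en_same_topic": ["s1"] * 4 + ["s2"] * 3 + ["s3"] * 2 + ["t1", "t2", "t1"],
    "doc_en_other_topic": ["s1"] * 4 + ["s2"] * 3 + ["s3"] * 2 + ["u1", "u2"],
    "doc_fr": ["f1"] * 4 + ["f2"] * 3 + ["t1", "t2"],
}
print(retrieve(query, corpus))  # ['doc_en_same_topic', 'doc_en_other_topic']
```

In this toy run, "doc_fr" is filtered out in the first stage even though it shares topic codes with the query, because it shares none of the query's high-frequency codes; the second stage then ranks the same-topic document above the other-topic one.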
Source Title: Pattern Recognition
URI: http://scholarbank.nus.edu.sg/handle/10635/39745
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2007.10.017
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Scopus citations: 131 (checked on Dec 5, 2017)
Web of Science citations: 7 (checked on Nov 1, 2017)
Page views: 59 (checked on Dec 9, 2017)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.