Please use this identifier to cite or link to this item: https://doi.org/10.1109/TPAMI.2002.1008389
Title: Imaged document text retrieval without OCR
Authors: Tan, C.L. 
Huang, W. 
Yu, Z.
Xu, Y.
Keywords: Document image analysis
Document vector
Text retrieval
Text similarity
Issue Date: 2002
Source: Tan, C.L., Huang, W., Yu, Z., Xu, Y. (2002). Imaged document text retrieval without OCR. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (6): 838-844. ScholarBank@NUS Repository. https://doi.org/10.1109/TPAMI.2002.1008389
Abstract: We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects. Image features, namely, the Vertical Traverse Density (VTD) and Horizontal Traverse Density (HTD), are extracted. An n-gram based document vector is constructed for each document based on these features. Text similarity between documents is then measured by calculating the dot product of the document vectors. Testing with seven corpora of imaged textual documents in English and Chinese, as well as images from the UW1 database, confirms the validity of the proposed method.
Source Title: IEEE Transactions on Pattern Analysis and Machine Intelligence
URI: http://scholarbank.nus.edu.sg/handle/10635/39184
ISSN: 0162-8828
DOI: 10.1109/TPAMI.2002.1008389
Appears in Collections: Staff Publications
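
The last two steps described in the abstract (building an n-gram document vector and comparing documents by dot product) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each segmented character object has already been reduced to a discrete code derived from its VTD/HTD features; that quantization step, the function names `ngram_vector` and `similarity`, the default `n=3`, and the L2 normalization are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_vector(codes, n=3):
    """Build a sparse, L2-normalized n-gram count vector.

    `codes` is a sequence of hashable per-character codes (hypothetical:
    assumed to come from quantized VTD/HTD features, not computed here).
    """
    grams = Counter(tuple(codes[i:i + n]) for i in range(len(codes) - n + 1))
    norm = sqrt(sum(c * c for c in grams.values()))
    if norm == 0:
        # Sequence shorter than n: no n-grams, so return an empty vector.
        return {}
    return {g: c / norm for g, c in grams.items()}


def similarity(vec_a, vec_b):
    """Dot product of two sparse document vectors stored as dicts."""
    if len(vec_a) > len(vec_b):
        vec_a, vec_b = vec_b, vec_a  # iterate over the smaller vector
    return sum(w * vec_b.get(g, 0.0) for g, w in vec_a.items())


# Toy character-code sequences standing in for three imaged documents.
doc_a = ngram_vector([1, 2, 3, 4, 5])
doc_b = ngram_vector([9, 8, 7, 6])
doc_c = ngram_vector([1, 2, 3, 9, 9])

print(similarity(doc_a, doc_a))  # identical documents
print(similarity(doc_a, doc_b))  # no shared n-grams
print(similarity(doc_a, doc_c))  # one shared n-gram
```

With normalized vectors the dot product behaves like a cosine similarity, so scores fall between 0 (no shared n-grams) and 1 (identical n-gram distributions) regardless of document length.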

Files in This Item:
There are no files associated with this item.

Scopus citations: 69 (checked on Dec 12, 2017)
Page views: 72 (checked on Dec 15, 2017)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.