Please use this identifier to cite or link to this item: https://doi.org/10.1016/S0031-3203(02)00127-9
Title: Document retrieval from compressed images
Authors: Lu, Y. 
Tan, C.L. 
Keywords: Compressed image
Document image retrieval
Document similarity
Object matching
Weighted Hausdorff distance
Issue Date: 2002
Source: Lu, Y., Tan, C.L. (2002). Document retrieval from compressed images. Pattern Recognition 36 (4) : 987-996. ScholarBank@NUS Repository. https://doi.org/10.1016/S0031-3203(02)00127-9
Abstract: With the emergence of digital libraries, more and more documents are stored and transmitted through the Internet in the format of compressed images. It is therefore important to develop a system capable of retrieving documents from these compressed document images. Targeting CCITT Group 4, a popular standard widely used for compressing document images, we present in this paper an approach to retrieve documents from CCITT Group 4 compressed document images. The black and white changing elements are extracted directly from the compressed document images to act as the feature pixels, and the connected components are detected simultaneously. The word boxes are then bounded by merging the connected components. A weighted Hausdorff distance is proposed so that an unsupervised classifier can assign all of the word objects, from both the query document and the document from the database, to corresponding classes, while possible stop words are excluded. Document vectors are built from the occurrence frequency of the word object classes, and the pair-wise similarity of two document images is represented by the scalar product of the document vectors. Nine groups of articles pertaining to different domains are used to test the validity of the presented approach. Preliminary experimental results with document images captured from students' theses show that the proposed approach achieves promising performance. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
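Illustration (not from the paper): the last step described in the abstract, building document vectors from word object class frequencies and comparing them by scalar product, might be sketched as follows. The function names, the integer class labels, and the unit-length normalization are assumptions made for this sketch; the abstract does not specify them, and the class labels are assumed to come from an earlier classification step that is not shown.

```python
from collections import Counter
from math import sqrt


def document_vector(class_labels, num_classes):
    """Count how often each word object class occurs in one document.

    class_labels: iterable of integer class ids in range(num_classes),
    assumed to be produced by an earlier (not shown) classification step.
    """
    counts = Counter(class_labels)
    return [counts.get(c, 0) for c in range(num_classes)]


def similarity(u, v):
    """Scalar product of two document vectors.

    Dividing by the vector lengths is an assumption of this sketch,
    added so that long and short documents are comparable.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical class labels for a query document and a database document.
query_vec = document_vector([0, 0, 1, 2], num_classes=4)
doc_vec = document_vector([0, 1, 1, 3], num_classes=4)
score = similarity(query_vec, doc_vec)
```

A higher score indicates that the two documents share more word object classes at similar frequencies; ranking database documents by this score would give a simple retrieval order.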
Source Title: Pattern Recognition
URI: http://scholarbank.nus.edu.sg/handle/10635/39008
ISSN: 0031-3203
DOI: 10.1016/S0031-3203(02)00127-9
Appears in Collections: Staff Publications
