|Title:||Document retrieval from compressed images|
|Authors:||Lu, Y.; Tan, C.L.|
|Keywords:||Document image retrieval; Weighted Hausdorff distance|
|Issue Date:||2002|
|Citation:||Lu, Y., Tan, C.L. (2002). Document retrieval from compressed images. Pattern Recognition 36 (4) : 987-996. ScholarBank@NUS Repository. https://doi.org/10.1016/S0031-3203(02)00127-9|
|Abstract:||With the emergence of digital libraries, more and more documents are stored and transmitted through the Internet in the format of compressed images. It is of significant meaning to develop a system which is capable of retrieving documents from these compressed document images. Aiming at the popular compression standard, CCITT Group 4, which is widely used for compressing document images, we present an approach to retrieve the documents from CCITT Group 4 compressed document images in this paper. The black and white changing elements are extracted directly from the compressed document images to act as the feature pixels, and the connected components are detected simultaneously. Then the word boxes are bounded based on the merging of the connected components. Weighted Hausdorff distance is proposed to assign all of the word objects from both the query document and the document from database to corresponding classes by an unsupervised classifier, whereas the possible stop words are excluded. Document vectors are built by the occurrence frequency of the word object classes, and the pair-wise similarity of two document images is represented by the scalar product of the document vectors. Nine groups of articles pertaining to different domains are used to test the validity of the presented approach. Preliminary experimental results with the document images captured from students' theses show that the proposed approach has achieved a promising performance. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.|
|Source Title:||Pattern Recognition|
|URI:||http://scholarbank.nus.edu.sg/handle/10635/39008|
|ISSN:||00313203|
|DOI:||10.1016/S0031-3203(02)00127-9|
|Appears in Collections:||Staff Publications|
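The last stage described in the abstract (document vectors built from class occurrence frequencies, compared by scalar product) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the integer class labels, the function names `document_vector`, `scalar_product` and `normalized_similarity`, and the length normalization are all assumptions made for illustration. The abstract states only that similarity is represented by the scalar product of the document vectors.

```python
from collections import Counter
import math


def document_vector(word_classes, stop_classes=frozenset()):
    """Count occurrences of each word-object class, skipping stop-word classes.

    `word_classes` is a hypothetical list of class labels, one per word object,
    as an unsupervised classifier might assign them.
    """
    return Counter(c for c in word_classes if c not in stop_classes)


def scalar_product(u, v):
    """Scalar (dot) product of two sparse count vectors."""
    return sum(count * v[cls] for cls, count in u.items() if cls in v)


def normalized_similarity(u, v):
    """Scalar product divided by the vector lengths (an assumed normalization)."""
    norm = math.sqrt(scalar_product(u, u) * scalar_product(v, v))
    return scalar_product(u, v) / norm if norm else 0.0


# Hypothetical class labels for a query document and a stored document;
# class 1 is treated as a stop-word class and excluded.
query = document_vector([0, 3, 3, 7, 1], stop_classes={1})
stored = document_vector([3, 7, 7, 2, 1, 1], stop_classes={1})

print(scalar_product(query, stored))
print(round(normalized_similarity(query, stored), 4))
```

The sparse `Counter` representation keeps only the classes that actually occur, so the scalar product touches just the classes shared by both documents.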
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.