Please use this identifier to cite or link to this item: https://doi.org/10.1016/S0031-3203(02)00127-9
DC Field: Value
dc.title: Document retrieval from compressed images
dc.contributor.author: Lu, Y.
dc.contributor.author: Tan, C.L.
dc.date.accessioned: 2013-07-04T07:31:52Z
dc.date.available: 2013-07-04T07:31:52Z
dc.date.issued: 2002
dc.identifier.citation: Lu, Y., Tan, C.L. (2002). Document retrieval from compressed images. Pattern Recognition 36 (4): 987-996. ScholarBank@NUS Repository. https://doi.org/10.1016/S0031-3203(02)00127-9
dc.identifier.issn: 00313203
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/39008
dc.description.abstract: With the emergence of digital libraries, more and more documents are stored and transmitted over the Internet as compressed images. It is therefore important to develop a system capable of retrieving documents from these compressed document images. Targeting the popular compression standard CCITT Group 4, which is widely used for compressing document images, we present an approach to retrieving documents from CCITT Group 4 compressed document images. The black and white changing elements are extracted directly from the compressed document images to act as feature pixels, and the connected components are detected simultaneously. The word boxes are then bounded by merging the connected components. A weighted Hausdorff distance is proposed so that an unsupervised classifier can assign all of the word objects from both the query document and the database document to corresponding classes, while possible stop words are excluded. Document vectors are built from the occurrence frequencies of the word object classes, and the pair-wise similarity of two document images is represented by the scalar product of their document vectors. Nine groups of articles pertaining to different domains are used to test the validity of the presented approach. Preliminary experimental results with document images captured from students' theses show that the proposed approach achieves promising performance. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
dc.description.uri: http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1016/S0031-3203(02)00127-9
dc.source: Scopus
dc.subject: Compressed image
dc.subject: Document image retrieval
dc.subject: Document similarity
dc.subject: Object matching
dc.subject: Weighted Hausdorff distance
dc.type: Article
dc.contributor.department: COMPUTER SCIENCE
dc.description.doi: 10.1016/S0031-3203(02)00127-9
dc.description.sourcetitle: Pattern Recognition
dc.description.volume: 36
dc.description.issue: 4
dc.description.page: 987-996
dc.description.coden: PTNRA
dc.identifier.isiut: 000180577600012
Appears in Collections: Staff Publications
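
The final step described in the abstract (document vectors built from the occurrence frequencies of word object classes, compared by a scalar product) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the integer class labels, the function names `document_vector` and `similarity`, and the unit-length normalization are all assumptions that the abstract does not confirm.

```python
from collections import Counter
from math import sqrt


def document_vector(word_classes, num_classes):
    """Build a document vector from the class label of each word object.

    word_classes: iterable of integer class labels (hypothetical input,
    assumed to come from an earlier word-object classification step).
    The vector is scaled to unit length, which is an assumption made here
    so that an identical pair of documents scores 1.0.
    """
    counts = Counter(word_classes)
    vec = [float(counts.get(c, 0)) for c in range(num_classes)]
    norm = sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def similarity(vec_a, vec_b):
    """Pair-wise similarity as the scalar product of two document vectors."""
    return sum(a * b for a, b in zip(vec_a, vec_b))


# Toy class labels for a query document and one database document.
query = document_vector([0, 0, 1, 2, 2, 2], num_classes=4)
candidate = document_vector([0, 1, 1, 2, 2, 3], num_classes=4)
print(similarity(query, candidate))
```

With unit-length vectors the score lies between 0 (no shared word classes) and 1 (identical class frequency profiles), so database documents can be ranked against a query by sorting on this value.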

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.