Please use this identifier to cite or link to this item: https://doi.org/10.1109/ICDAR.2009.54
Title: Keyword spotting in document images through word shape coding
Authors: Bai, S.; Li, L.; Tan, C.L.
Issue Date: 2009
Source: Bai, S., Li, L., Tan, C.L. (2009). Keyword spotting in document images through word shape coding. Proceedings of the International Conference on Document Analysis and Recognition, ICDAR: 331-335. ScholarBank@NUS Repository. https://doi.org/10.1109/ICDAR.2009.54
Abstract: With large databases of document images available, a method for users to find keywords in documents would be useful. One approach is to perform Optical Character Recognition (OCR) on each document and then index the resulting text. However, if the quality of the document is poor or time is critical, complete OCR of all images is infeasible. This paper builds upon previous work on Word Shape Coding to propose an alternative technique and a combination of feature descriptors for keyword spotting without the use of OCR. Different sequence alignment similarity measures can be used for partial or whole word matching. The proposed technique is tolerant of serifs, font styles, and certain degrees of touching, broken, or overlapping characters. It improves on previous work not only with better precision and a lower collision rate but, more importantly, with the ability to perform partial matching. Experimental results show that it is about 15 times faster than OCR. It is a promising technique for improving document image retrieval. © 2009 IEEE.
Source Title: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
URI: http://scholarbank.nus.edu.sg/handle/10635/41717
ISBN: 9780769537252
ISSN: 15205363
DOI: 10.1109/ICDAR.2009.54
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.