Title: Chinese word searching in imaged documents
Authors: Lu, Y. 
Tan, C.L. 
Keywords: Character matching
Character segmentation
Chinese document image
Weighted Hausdorff distance
Word searching
Issue Date: 2004
Citation: Lu, Y., Tan, C.L. (2004). Chinese word searching in imaged documents. International Journal of Pattern Recognition and Artificial Intelligence 18(2): 229-246. ScholarBank@NUS Repository.
Abstract: An approach to searching for user-specified words in imaged Chinese documents, without the requirements of layout analysis and OCR processing of the entire documents, is proposed in this paper. A small number of Chinese characters that cannot be successfully bounded using connected component analysis due to larger gaps between elements within the characters are blacklisted. A suitable character that is not included in the blacklist is chosen from the user-specified word as the initial character to search for a matching candidate in the document. Once a matched candidate is found, the adjacent characters in the horizontal and vertical directions are examined for matching with other corresponding characters in the user-specified word, subject to the constraints of alignment (either horizontal or vertical direction) and size similarity. A weighted Hausdorff distance is proposed for the character matching. Experimental results show that the present method can effectively search the user-specified Chinese words from the document images with the format of either horizontal or vertical text lines, or both appearing on the same image.
Source Title: International Journal of Pattern Recognition and Artificial Intelligence
ISSN: 02180014
DOI: 10.1142/S0218001404003137
Appears in Collections: Staff Publications
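
Illustrative note: the abstract names a weighted Hausdorff distance for character matching but this record does not give its definition. The sketch below is therefore only an assumption about the general idea, not the authors' formula: it treats each character image as a set of foreground pixel coordinates and takes a weighted mean of nearest-neighbour distances in each direction, then the larger of the two. The function names, the weighting scheme, and the point-set representation are all hypothetical.

```python
import math


def directed_weighted_distance(a, b, weights=None):
    """Weighted mean, over points p in `a`, of the distance from p to its
    nearest point in `b`. Hypothetical weighting, not taken from the paper."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    if weights is None:
        weights = [1.0] * len(a)  # uniform weights by default
    if len(weights) != len(a):
        raise ValueError("need exactly one weight per point in `a`")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    acc = 0.0
    for (px, py), w in zip(a, weights):
        # Distance from p to the closest point of the other set.
        nearest = min(math.hypot(px - qx, py - qy) for qx, qy in b)
        acc += w * nearest
    return acc / total_weight


def weighted_hausdorff(a, b, weights_a=None, weights_b=None):
    """Symmetric distance: the larger of the two directed distances."""
    return max(
        directed_weighted_distance(a, b, weights_a),
        directed_weighted_distance(b, a, weights_b),
    )


# Usage: pixel coordinates of a query character and a candidate component.
query = [(0, 0), (3, 4)]
candidate = [(0, 0)]
score = weighted_hausdorff(query, candidate, weights_a=[1.0, 3.0])
```

In a search of the kind the abstract describes, a candidate would presumably be accepted when such a score falls below a chosen threshold; the threshold and any preprocessing (size normalisation, for example) are not specified in this record.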
