Please use this identifier to cite or link to this item: https://doi.org/10.1142/S0218001404003137
Title: Chinese word searching in imaged documents
Authors: Lu, Y. 
Tan, C.L. 
Keywords: Character matching
Character segmentation
Chinese document image
Weighted Hausdorff distance
Word searching
Issue Date: 2004
Citation: Lu, Y., Tan, C.L. (2004). Chinese word searching in imaged documents. International Journal of Pattern Recognition and Artificial Intelligence 18 (2) : 229-246. ScholarBank@NUS Repository. https://doi.org/10.1142/S0218001404003137
Abstract: This paper proposes an approach to searching for user-specified words in imaged Chinese documents without requiring layout analysis or OCR processing of the entire document. A small number of Chinese characters cannot be successfully bounded by connected component analysis because of larger gaps between the elements within them; these characters are blacklisted. A suitable character that is not on the blacklist is chosen from the user-specified word as the initial character, and the document is searched for a matching candidate. Once a matching candidate is found, the adjacent characters in the horizontal and vertical directions are matched against the other corresponding characters of the user-specified word, subject to constraints of alignment (in either the horizontal or the vertical direction) and size similarity. A weighted Hausdorff distance is proposed for character matching. Experimental results show that the method can effectively find user-specified Chinese words in document images that contain horizontal text lines, vertical text lines, or both on the same image.
Source Title: International Journal of Pattern Recognition and Artificial Intelligence
URI: http://scholarbank.nus.edu.sg/handle/10635/39744
ISSN: 02180014
DOI: 10.1142/S0218001404003137
Appears in Collections: Staff Publications
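
Illustrative note: the abstract names a weighted Hausdorff distance for character matching but does not give its definition. The Python sketch below is therefore only an assumption about what such a measure could look like, not the authors' formulation. It takes two binary character images, computes for each foreground pixel the distance to the nearest foreground pixel of the other image, averages those distances with per-pixel weights, and returns the larger of the two directed values. The function names, the optional weight maps, and the choice of a weighted mean are all hypothetical.

```python
import numpy as np


def directed_weighted_distance(a_pts, b_pts, weights):
    """Weighted mean, over points in a_pts, of the distance to the nearest point in b_pts."""
    # Pairwise Euclidean distances, shape (len(a_pts), len(b_pts)).
    diff = a_pts[:, None, :] - b_pts[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    nearest = dists.min(axis=1)
    return float((weights * nearest).sum() / weights.sum())


def weighted_hausdorff(img_a, img_b, weight_a=None, weight_b=None):
    """Symmetric weighted Hausdorff-style distance between two binary images.

    weight_a and weight_b are optional arrays with the same shape as the
    images (an assumed design; uniform weights are used when omitted).
    """
    img_a = np.asarray(img_a)
    img_b = np.asarray(img_b)
    mask_a = img_a > 0
    mask_b = img_b > 0
    if not mask_a.any() or not mask_b.any():
        raise ValueError("both images need at least one foreground pixel")

    # Foreground coordinates in row-major order, matching boolean indexing below.
    a_pts = np.argwhere(mask_a).astype(float)
    b_pts = np.argwhere(mask_b).astype(float)

    if weight_a is None:
        wa = np.ones(len(a_pts))
    else:
        wa = np.asarray(weight_a, dtype=float)[mask_a]
    if weight_b is None:
        wb = np.ones(len(b_pts))
    else:
        wb = np.asarray(weight_b, dtype=float)[mask_b]

    forward = directed_weighted_distance(a_pts, b_pts, wa)
    backward = directed_weighted_distance(b_pts, a_pts, wb)
    return max(forward, backward)
```

In a word search of the kind the abstract describes, a candidate component would presumably be accepted as a match when this distance falls below a threshold; the threshold and the weighting scheme would have to be taken from the paper itself.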
