Please use this identifier to cite or link to this item:
https://doi.org/10.1142/S0218001404003137
Title: Chinese word searching in imaged documents
Authors: Lu, Y.; Tan, C.L.
Keywords: Character matching; Character segmentation; Chinese document image; Weighted Hausdorff distance; Word searching
Issue Date: 2004
Citation: Lu, Y., Tan, C.L. (2004). Chinese word searching in imaged documents. International Journal of Pattern Recognition and Artificial Intelligence 18 (2) : 229-246. ScholarBank@NUS Repository. https://doi.org/10.1142/S0218001404003137
Abstract: An approach to searching for user-specified words in imaged Chinese documents, without the requirements of layout analysis and OCR processing of the entire documents, is proposed in this paper. A small number of Chinese characters that cannot be successfully bounded using connected component analysis due to larger gaps between elements within the characters are blacklisted. A suitable character that is not included in the blacklist is chosen from the user-specified word as the initial character to search for a matching candidate in the document. Once a matched candidate is found, the adjacent characters in the horizontal and vertical directions are examined for matching with other corresponding characters in the user-specified word, subject to the constraints of alignment (either horizontal or vertical direction) and size similarity. A weighted Hausdorff distance is proposed for the character matching. Experimental results show that the present method can effectively search the user-specified Chinese words from the document images with the format of either horizontal or vertical text lines, or both appearing on the same image.
Source Title: International Journal of Pattern Recognition and Artificial Intelligence
URI: http://scholarbank.nus.edu.sg/handle/10635/39744
ISSN: 02180014
DOI: 10.1142/S0218001404003137
Appears in Collections: Staff Publications
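The abstract names a weighted Hausdorff distance for character matching but does not give its definition or weighting scheme. The following is a minimal illustrative sketch only, assuming each character is represented as a set of foreground pixel coordinates and that the distance is a weighted average of nearest-neighbour distances, taken in both directions. The function names, the `weight` callback, and the averaging form are all hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def directed_weighted_hausdorff(
    a: Sequence[Point],
    b: Sequence[Point],
    weight: Callable[[Point], float],
) -> float:
    """Weighted mean of the distance from each point in `a` to its nearest point in `b`.

    ASSUMPTION: this averaging form and the `weight` callback are illustrative;
    the paper's actual weighting is not stated in the abstract.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    weight_sum = 0.0
    for p in a:
        # Distance from p to the closest point of the other set.
        nearest = min(math.dist(p, q) for q in b)
        w = weight(p)
        total += w * nearest
        weight_sum += w
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return total / weight_sum


def weighted_hausdorff(
    a: Sequence[Point],
    b: Sequence[Point],
    weight: Callable[[Point], float] = lambda p: 1.0,  # hypothetical default: uniform
) -> float:
    """Symmetric distance: the larger of the two directed weighted distances."""
    return max(
        directed_weighted_hausdorff(a, b, weight),
        directed_weighted_hausdorff(b, a, weight),
    )
```

In a search of the kind the abstract describes, a candidate character image would presumably be accepted when this distance to the query character falls below some threshold; that threshold, like the weights, would have to come from the paper itself.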
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.