Please use this identifier to cite or link to this item:
https://doi.org/10.1007/s10044-005-0009-3
Title: | A hybrid post-processing system for offline handwritten Chinese script recognition |
Authors: | Li, Y.-X.; Tan, C.L.; Ding, X. |
Keywords: | Candidate confidence; Candidate set size; Chinese character recognition; Contextual post-processing; Perplexity; Statistical language model |
Issue Date: | 2005 |
Citation: | Li, Y.-X., Tan, C.L., Ding, X. (2005). A hybrid post-processing system for offline handwritten Chinese script recognition. Pattern Analysis and Applications 8 (3): 272-286. ScholarBank@NUS Repository. https://doi.org/10.1007/s10044-005-0009-3 |
Abstract: | In the recognition of offline handwritten Chinese scripts, contextual post-processing plays a vital role in improving accuracy. In this paper, we systematically analyze the key factors that affect the performance of contextual post-processing: statistical language models (LMs), candidate confidence, candidate set size, and search strategy. We then present a hybrid post-processing system that integrates the various kinds of available information. Next, we investigate seven LMs, four methods of estimating candidate confidence, and different candidate set sizes, and illustrate in detail their influence on the performance of contextual post-processing. Experimental results show that the performance of the LMs is affected by the size of the training corpora, the smoothing method, and model pruning, and that lower perplexity correlates with higher accuracy. A comparison of the estimation methods shows that candidate confidence is vital to contextual post-processing. We also show that capturing the correct characters within a limited number of candidates is extremely important for obtaining good post-processing performance. By adopting the hybrid post-processing system, we can obtain high accuracy while also keeping post-processing speed and memory usage in check. The average recognition accuracy on three Chinese scripts (about 66,000 characters in total) reaches 97.65%, which corresponds to an 87% error correction rate relative to the 81.58% average accuracy before post-processing. Finally, we give some proposals for choosing a proper post-processing method for real script recognition tasks. |
Source Title: | Pattern Analysis and Applications |
URI: | http://scholarbank.nus.edu.sg/handle/10635/39248 |
ISSN: | 1433-7541 |
DOI: | 10.1007/s10044-005-0009-3 |
Appears in Collections: | Staff Publications |
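Note on the reported figures: the abstract does not spell out how the 87% error correction rate is derived. The sketch below assumes it means the relative reduction in error rate between the accuracy before post-processing (81.58%) and after it (97.65%). The function name `error_correction_rate` is hypothetical and used only for illustration; it is not taken from the paper.

```python
def error_correction_rate(acc_before: float, acc_after: float) -> float:
    """Fraction of the original errors removed by post-processing.

    Both arguments are accuracies in percent. This definition (relative
    error reduction) is an assumption; the abstract does not define the term.
    """
    err_before = 100.0 - acc_before  # error rate before post-processing
    err_after = 100.0 - acc_after    # error rate after post-processing
    if err_before <= 0:
        raise ValueError("no errors to correct before post-processing")
    return (err_before - err_after) / err_before


# Figures quoted in the abstract.
rate = error_correction_rate(81.58, 97.65)
print(f"{rate:.0%}")
```

Under this reading, the error rate falls from 18.42% to 2.35%, so about 16.07 of every 18.42 errors are corrected, which is consistent with the 87% quoted in the abstract.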