Language identification in degraded and distorted document images | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/41050

Title:	Language identification in degraded and distorted document images
Authors:	Lu, S.; Tan, C.L.; Huang, W.
Issue Date:	2006
Citation:	Lu, S., Tan, C.L., Huang, W. (2006). Language identification in degraded and distorted document images. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3872 LNCS: 232-242. ScholarBank@NUS Repository.
Abstract:	This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Unlike previously reported methods, which transform word images through a character shape coding process, our method captures word shapes directly using local extremum points and horizontal intersection numbers, both of which are tolerant of noise, character segmentation errors, and slight skew distortions. For each language studied, a word shape template and a word frequency template are first constructed based on the proposed word shape coding scheme. Identification is then accomplished based on the Bray-Curtis or Hamming distance between the word shape codes of query images and the constructed word shape and frequency templates. Experiments show that the average identification rate across eight Latin-based languages exceeds 99%. © Springer-Verlag Berlin Heidelberg 2006.
Source Title:	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
URI:	http://scholarbank.nus.edu.sg/handle/10635/41050
ISBN:	3540321403
ISSN:	03029743
Appears in Collections:	Staff Publications
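
The abstract describes identification as a nearest-template match under the Bray-Curtis or Hamming distance. The following Python sketch illustrates only that matching step and is not the authors' implementation. It assumes word shape codes have already been extracted as strings; the code values, the template layout (a mapping from language to code frequencies), and the function names `frequency_profile` and `identify` are hypothetical.

```python
from collections import Counter


def bray_curtis(p, q):
    """Bray-Curtis distance between two non-negative frequency maps."""
    keys = set(p) | set(q)
    num = sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
    den = sum(p.get(k, 0.0) + q.get(k, 0.0) for k in keys)
    # Two empty profiles are treated as identical (assumption).
    return num / den if den else 0.0


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length codes")
    return sum(x != y for x, y in zip(a, b))


def frequency_profile(codes):
    """Relative frequency of each word shape code in a document."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def identify(query_codes, templates):
    """Return the language whose frequency template is closest to the query.

    `templates` maps a language name to a {code: frequency} dict
    (hypothetical layout, chosen for illustration).
    """
    profile = frequency_profile(query_codes)
    return min(templates, key=lambda lang: bray_curtis(profile, templates[lang]))


# Toy templates with made-up codes and frequencies.
templates = {
    "english": {"A1": 0.6, "B2": 0.4},
    "german": {"C3": 0.7, "A1": 0.3},
}
print(identify(["A1", "A1", "B2"], templates))
```

The Bray-Curtis distance ranges from 0 (identical profiles) to 1 (no shared codes), which makes it convenient for comparing frequency templates, while the Hamming distance applies to individual fixed-length codes.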
