Identification of Latin-based languages through character stroke categorization | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.1109/ICDAR.2007.4378731

Title:	Identification of Latin-based languages through character stroke categorization
Authors:	Lu, S.; Li, L.; Chew, L.T.
Issue Date:	2007
Citation:	Lu, S., Li, L., Chew, L.T. (2007). Identification of Latin-based languages through character stroke categorization. Proceedings of the International Conference on Document Analysis and Recognition, ICDAR 1: 352-356. ScholarBank@NUS Repository. https://doi.org/10.1109/ICDAR.2007.4378731
Abstract:	This paper presents a language identification technique that detects the Latin-based language of an imaged document without OCR. The technique relies on word shape coding: each word image is converted into a word shape code, and each document image is thereby transformed into an electronic document vector. For each Latin-based language under study, a language template is first constructed through a corpus-based learning process. The underlying language of a query document is then determined from the similarity between the query document vector and the constructed language templates. Compared with previously reported methods, the proposed technique is fast, accurate, and tolerant of text segmentation errors caused by noise and various types of document degradation. Experimental results are promising. © 2007 IEEE.
Source Title:	Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
URI:	http://scholarbank.nus.edu.sg/handle/10635/41054
ISBN:	0769528228
ISSN:	15205363
DOI:	10.1109/ICDAR.2007.4378731
Appears in Collections:	Staff Publications
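
Illustrative sketch (not part of the repository record): the pipeline outlined in the abstract (word shape code, document vector, per-language template, similarity match) can be approximated on plain text as follows. This is a minimal sketch under loud assumptions: the paper works on word images, whereas this example works on character strings; the three-class coding (ascender, descender, x-height), the cosine similarity measure, and every function name here are hypothetical stand-ins, not the authors' actual scheme.

```python
from collections import Counter
from math import sqrt

# Assumed coarse stroke classes for lowercase Latin letters (hypothetical,
# chosen for illustration; the paper's categories are derived from images).
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_code(word):
    """Map a word to a string of shape classes: A, D, or x per letter."""
    code = []
    for ch in word:
        if not ch.isalpha():
            continue  # skip punctuation and digits
        if ch.isupper() or ch in ASCENDERS:
            code.append("A")  # rises above the x-height
        elif ch in DESCENDERS:
            code.append("D")  # drops below the baseline
        else:
            code.append("x")  # stays within the x-height band
    return "".join(code)


def document_vector(text):
    """Count how often each word shape code occurs in the text."""
    codes = (shape_code(w) for w in text.split())
    return Counter(c for c in codes if c)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_templates(corpora):
    """Build one template vector per language from a training text."""
    return {lang: document_vector(text) for lang, text in corpora.items()}


def identify(text, templates):
    """Return the language whose template is most similar to the query."""
    query = document_vector(text)
    return max(templates, key=lambda lang: cosine(query, templates[lang]))
```

In use, one would call build_templates with a dictionary mapping each language name to a sizeable training text, then pass a query text to identify. Because the codes depend only on coarse letter shape, a scheme of this kind needs no character recognition, which is the property the abstract emphasizes.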

