Title: Extraction of textual information from image for information retrieval
Authors: LI LINLIN
Keywords: Document Image Processing, Pattern Recognition
Issue Date: 25-May-2009
Citation: LI LINLIN (2009-05-25). Extraction of textual information from image for information retrieval. ScholarBank@NUS Repository.
Abstract: Traditional document image analysis relies on Optical Character Recognition (OCR) to obtain textual information from scanned documents. However, with the development of digitization technology, the current OCR technique is no longer sufficient for this purpose. With the increasing availability of high-performance scanners, many projects have been initiated to digitize paper-based materials in bulk and to build large multilingual document image databases. Two inherent shortcomings, namely language dependency and slow speed, are the main obstacles that prevent current OCR from fully accessing the textual information of such databases. We address both problems, for clean and for degraded scanned document images respectively. In particular, a word shape coding method is proposed that is 20 times faster than OCR. This method has been successfully employed in language identification and document filtering for clean scanned document image archives. Furthermore, a holistic word spotting method, invariant to the geometric transformations of translation, scale, and rotation, is proposed to enable fast retrieval from degraded scanned document images. This method is optimized for the U.S. patent database, which contains many degraded document images with severe skew. The rapid development of camera technology has also challenged the current OCR technique. The advancement of cameras gives people an alternative to traditional scanning for text image acquisition. However, because the image plane of a camera is not parallel to the document plane, camera-based images suffer from perspective distortion, which causes OCR and other textual information extraction techniques to fail when applied to them directly. In this thesis, this problem is addressed for camera-based document images and real-scene images respectively. For camera-based document images, another word shape coding scheme, a variant of our holistic word spotting method, is proposed for language identification and fast retrieval. This method is affine invariant, and thus robust to the moderate perspective deformation typical of this image type. For real-scene images, which may exhibit more severe perspective deformation, we propose a character recognition method based on a global descriptor called the Cross Ratio Spectrum. With this descriptor, the perspective deformation of a character is reduced to a stretching deformation, which can then be handled by Dynamic Time Warping. Besides characters, the method is also applicable to multi-component planar symbols.
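The two ingredients named in the abstract, the cross ratio (a projective invariant) and Dynamic Time Warping (which absorbs stretching of one sequence relative to another), can be illustrated with a minimal sketch. This is not the thesis implementation: `cross_ratio` is the textbook cross ratio of four collinear points, and `dtw_distance` is the classic DTW recurrence; both function names and the 1-D coordinate representation are assumptions for illustration.

```python
def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given by 1-D coordinates.

    The value is invariant under projective transformations, which is
    the property a cross-ratio-based descriptor relies on: however the
    character is perspectively distorted, these ratios are preserved.
    """
    return ((a - c) * (b - d)) / ((a - d) * (b - c))


def dtw_distance(s, t):
    """Classic dynamic time warping distance between two sequences.

    A stretching (non-uniform resampling) of one sequence relative to
    the other is absorbed by the warping path, so two spectra that
    differ only by stretching yield a small distance.
    """
    n, m = len(s), len(t)
    INF = float("inf")
    # D[i][j] = minimal accumulated cost aligning s[:i] with t[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

For example, applying a projective map x → (2x + 1)/(x + 3) to four points leaves their cross ratio unchanged, and a sequence compared against a stretched copy of itself (e.g. `[1, 2, 3]` vs `[1, 2, 2, 3]`) has DTW distance zero.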
Appears in Collections: Ph.D. Theses (Open)

Files in This Item:
File: Thesis.pdf
Size: 5.81 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.