Title: Multioriented video scene text detection through bayesian classification and boundary growing
Authors: Shivakumara, P. 
Sreedhar, R.P.
Phan, T.Q. 
Lu, S.
Tan, C.L. 
Keywords: Bayesian classifier
boundary growing
Laplacian-Sobel product (LSP)
maximum gradient difference
multioriented video scene text detection
text candidate detection
Issue Date: 2012
Citation: Shivakumara, P., Sreedhar, R.P., Phan, T.Q., Lu, S., Tan, C.L. (2012). Multioriented video scene text detection through bayesian classification and boundary growing. IEEE Transactions on Circuits and Systems for Video Technology 22 (8) : 1227-1235. ScholarBank@NUS Repository.
Abstract: Multioriented text detection in video frames is not as easy as detection of captions, graphics, or overlaid text, which usually appear in the horizontal direction and have high contrast compared to their background. Multioriented text generally refers to scene text, which makes text detection more challenging and interesting due to the unfavorable characteristics of scene text. Therefore, conventional text detection methods may not give good results for multioriented scene text detection. Hence, in this paper, we present a new enhancement method that uses the product of Laplacian and Sobel operations to enhance text pixels in videos. To classify true text pixels, we propose a Bayesian classifier that does not assume an a priori probability for the input frame but estimates it from three probable matrices. Three different ways of clustering are performed on the output of the enhancement method to obtain the three probable matrices. Text candidates are obtained by intersecting the output of the Bayesian classifier with the Canny edge map of the input frame. A boundary growing method based on the concept of nearest neighbors is introduced to traverse multioriented scene text lines using the text candidates. The robustness of the method has been tested on a variety of datasets, including our own data (nonhorizontal and horizontal text) and two publicly available datasets, namely, the video frames of Hua and the complex scene text data of the ICDAR 2003 competition (camera images). Experimental results show that the performance of the proposed method is encouraging compared with existing methods in terms of recall, precision, F-measure, and computational time. © 2012 IEEE.
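The enhancement step described in the abstract, the Laplacian-Sobel product (LSP), can be illustrated with a minimal sketch: a pixel responds strongly to both the Laplacian (sharp intensity transitions on both sides of a stroke) and the Sobel operator (strong directional gradient), so their pixel-wise product amplifies likely text pixels. The specific 3x3 kernels, the use of absolute responses, and the lack of normalization below are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative LSP sketch (assumed kernels, not the paper's exact setup).
LAPLACIAN = [[0,  1, 0],
             [1, -4, 1],
             [0,  1, 0]]
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def convolve3x3(image, kernel):
    """Valid-mode 3x3 convolution on a 2-D list of gray levels."""
    h, w = len(image), len(image[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * image[y - 1 + ky][x - 1 + kx]
            out[y - 1][x - 1] = acc
    return out

def lsp(image):
    """Pixel-wise product of absolute Laplacian and Sobel-x responses."""
    lap = convolve3x3(image, LAPLACIAN)
    sob = convolve3x3(image, SOBEL_X)
    return [[abs(l) * abs(s) for l, s in zip(lap_row, sob_row)]
            for lap_row, sob_row in zip(lap, sob)]

# Toy 4x4 frame with a sharp vertical edge (crude stand-in for a text stroke).
frame = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [0, 0, 255, 255],
         [0, 0, 255, 255]]
enhanced = lsp(frame)
```

In the full method, this enhanced map would then be clustered three ways to build the probable matrices for the Bayesian classifier, and the classified pixels intersected with a Canny edge map to yield text candidates; those later stages are not sketched here.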
Source Title: IEEE Transactions on Circuits and Systems for Video Technology
ISSN: 1051-8215
DOI: 10.1109/TCSVT.2012.2198129
Appears in Collections:Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.