Please use this identifier to cite or link to this item: https://doi.org/10.1109/TMM.2012.2191944
Title: A generic framework for video annotation via semi-supervised learning
Authors: Zhang, T.
Xu, C.
Zhu, G. 
Liu, S.
Lu, H.
Keywords: Broadcast video
concave-convex procedure (CCCP)
event detection
graph
Internet
multiple instance learning
semi-supervised learning
web-casting text
Issue Date: 2012
Source: Zhang, T., Xu, C., Zhu, G., Liu, S., Lu, H. (2012). A generic framework for video annotation via semi-supervised learning. IEEE Transactions on Multimedia 14 (4 PART 2) : 1206-1219. ScholarBank@NUS Repository. https://doi.org/10.1109/TMM.2012.2191944
Abstract: Learning-based video annotation is essential for video analysis and understanding, and many approaches have been proposed to avoid the intensive labor costs of purely manual annotation. However, a generic framework is still lacking because of several difficulties, such as dependence on domain knowledge, insufficient training data, lack of precise localization, and ineffectiveness on large-scale video datasets. In this paper, we propose a novel approach based on semi-supervised learning that uses information from the Internet to annotate interesting events in videos. Concretely, a Fast Graph-based Semi-Supervised Multiple Instance Learning (FGSSMIL) algorithm, which aims to tackle these difficulties simultaneously in a generic framework for various video domains (e.g., sports, news, and movies), is proposed to jointly explore small-scale expert-labeled videos and large-scale unlabeled videos to train the models. The expert-labeled videos are obtained from the analysis and alignment of well-structured video-related text (e.g., movie scripts, web-casting text, closed captions). The unlabeled data are obtained by querying related events from video search engines (e.g., YouTube, Google) in order to give more distributional information for event modeling. Two critical issues of FGSSMIL are: 1) how to calculate the weight assignment for graph construction, where the weight of an edge specifies the similarity between two data points; to tackle this problem, we propose a novel Multiple Instance Learning Induced Similarity (MILIS) measure by learning instance-sensitive classifiers; and 2) how to solve the algorithm efficiently for large-scale datasets through an optimization approach; to address this issue, the Concave-Convex Procedure (CCCP) and a nonnegative multiplicative updating rule are adopted. We perform extensive experiments in three popular video domains: movies, sports, and news. The results, compared with the state of the art, are promising and demonstrate the effectiveness and efficiency of our proposed approach. © 2012 IEEE.
Source Title: IEEE Transactions on Multimedia
URI: http://scholarbank.nus.edu.sg/handle/10635/54207
ISSN: 1520-9210
DOI: 10.1109/TMM.2012.2191944
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Scopus citations: 28 (checked on Dec 7, 2017)
Web of Science citations: 17 (checked on Nov 23, 2017)
Page view(s): 30 (checked on Dec 11, 2017)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.