Please use this identifier to cite or link to this item: https://doi.org/10.1145/2502081.2502155
Title: Spatio-temporal Fisher vector coding for surveillance event detection
Authors: Chen, Q.
Cai, Y.
Brown, L.
Datta, A.
Fan, Q.
Feris, R.
Yan, S. 
Hauptmann, A.
Pankanti, S.
Keywords: Feature coding
System
Video event detection
Issue Date: 2013
Source: Chen, Q., Cai, Y., Brown, L., Datta, A., Fan, Q., Feris, R., Yan, S., Hauptmann, A., Pankanti, S. (2013). Spatio-temporal Fisher vector coding for surveillance event detection. MM 2013 - Proceedings of the 2013 ACM Multimedia Conference: 589-592. ScholarBank@NUS Repository. https://doi.org/10.1145/2502081.2502155
Abstract: We present a generic event detection system evaluated in the Surveillance Event Detection (SED) task of TRECVID 2012. We investigate a statistical approach with spatiotemporal features, applied to the seven event classes defined by the SED task. The approach is based on local spatiotemporal descriptors, called MoSIFT, which are computed from pairs of video frames. A Gaussian Mixture Model (GMM) is learned to model the distribution of the low-level features. Then, for each sliding window, Fisher vector (FV) encoding [12] is used to generate the sample representation, and a linear SVM is trained for each event. The main novelty of our system is the introduction of Fisher vector encoding into video event detection. Fisher vector encoding has demonstrated great success in image classification. The key idea is to model the low-level visual features with a Gaussian Mixture Model and to generate an intermediate vector representation for the bag of features. FV encoding uses higher-order statistics in place of the histograms of the standard bag-of-words (BoW) model. FV has several good properties: (a) it can naturally separate the video-specific information from the noisy local features and (b) a linear model can be used with this representation. We build an efficient implementation of FV encoding that attains a 10-fold speed-up over real time. We also apply non-trivial object localization techniques, e.g. multi-scale detection and non-maximum suppression, to video event detection. This approach outperformed all other teams' submissions in TRECVID SED 2012 on four of the seven event types. Copyright © 2013 ACM.
Source Title: MM 2013 - Proceedings of the 2013 ACM Multimedia Conference
URI: http://scholarbank.nus.edu.sg/handle/10635/84210
ISBN: 9781450324045
DOI: 10.1145/2502081.2502155
Appears in Collections: Staff Publications
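
The Fisher vector encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fisher_vector`, the toy descriptors, and the toy GMM parameters are hypothetical, the GMM is assumed to be diagonal-covariance and already fitted, and the final power and L2 normalization follows the commonly used "improved" Fisher vector formulation, which the abstract does not mention. MoSIFT extraction, GMM fitting, and the per-event linear SVM are not shown.

```python
import math


def fisher_vector(descriptors, weights, means, variances):
    """Encode local descriptors as a Fisher vector of length 2 * K * D.

    Assumes a diagonal-covariance GMM with K components over D dimensions.
    Only gradients w.r.t. the means and variances are used.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]
    g_var = [[0.0] * D for _ in range(K)]

    for x in descriptors:
        # Log of w_k * N(x | mu_k, diag(var_k)) for every component k.
        log_p = []
        for k in range(K):
            ll = math.log(weights[k])
            for d in range(D):
                diff = x[d] - means[k][d]
                ll -= 0.5 * (math.log(2 * math.pi * variances[k][d])
                             + diff * diff / variances[k][d])
            log_p.append(ll)

        # Posterior (soft assignment) via a numerically stable softmax.
        top = max(log_p)
        unnorm = [math.exp(lp - top) for lp in log_p]
        total = sum(unnorm)

        # Accumulate first-order and second-order statistics.
        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                g_mu[k][d] += gamma * z
                g_var[k][d] += gamma * (z * z - 1.0)

    fv = []
    for k in range(K):
        fv.extend(v / (T * math.sqrt(weights[k])) for v in g_mu[k])
    for k in range(K):
        fv.extend(v / (T * math.sqrt(2.0 * weights[k])) for v in g_var[k])

    # Assumption: signed square-root (power) and L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


# Toy example: 3 two-dimensional descriptors, a 2-component GMM.
toy_descriptors = [[0.1, 0.2], [2.9, 3.1], [0.0, -0.1]]
toy_weights = [0.5, 0.5]
toy_means = [[0.0, 0.0], [3.0, 3.0]]
toy_variances = [[1.0, 1.0], [1.0, 1.0]]

fv = fisher_vector(toy_descriptors, toy_weights, toy_means, toy_variances)
print(len(fv))
```

In a sliding-window setting, one such vector would be computed from the descriptors that fall inside each window and then scored by a linear classifier; because the representation is a fixed-length vector, a linear model is sufficient, which is the property the abstract highlights.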


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.