Integrating spatio-temporal context with multiview representation for object recognition in visual surveillance | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.1109/TCSVT.2010.2087570

Title:	Integrating spatio-temporal context with multiview representation for object recognition in visual surveillance
Authors:	Liu, X.; Lin, L.; Yan, S.; Jin, H.; Tao, W.
Keywords:	Active feature; deformable template; object recognition; spatio-temporal context
Issue Date:	Apr-2011
Citation:	Liu, X., Lin, L., Yan, S., Jin, H., Tao, W. (2011-04). Integrating spatio-temporal context with multiview representation for object recognition in visual surveillance. IEEE Transactions on Circuits and Systems for Video Technology 21 (4) : 393-407. ScholarBank@NUS Repository. https://doi.org/10.1109/TCSVT.2010.2087570
Abstract:	We present in this paper an integrated solution to rapidly recognizing dynamic objects in surveillance videos by exploring various contextual information. This solution consists of three components. The first one is a multi-view object representation. It contains a set of deformable object templates, each of which comprises an ensemble of active features for an object category in a specific view/pose. The template can be efficiently learned via a small set of roughly aligned positive samples without negative samples. The second component is a unified spatio-temporal context model, which integrates two types of contextual information in a Bayesian way. One is the spatial context, including main surface property (constraints on object type and density) and camera geometric parameters (constraints on object size at a specific location). The other is the temporal context, containing the pixel-level and instance-level consistency models, used to generate the foreground probability map and local object trajectory prediction. We also combine the above spatial and temporal contextual information to estimate the object pose in scene and use it as a strong prior for inference. The third component is a robust sampling-based inference procedure. Taking the spatio-temporal contextual knowledge as the prior model and deformable template matching as the likelihood model, we formulate the problem of object category recognition as a maximum-a-posteriori problem. The probabilistic inference can be achieved by a simple Markov chain Monte Carlo sampler, owing to the informative spatio-temporal context model which is able to greatly reduce the computational complexity and the category ambiguities. The system performance and benefit gain from the spatio-temporal contextual information are quantitatively evaluated on several challenging datasets and the comparison results clearly demonstrate that our proposed algorithm outperforms other state-of-the-art algorithms. © 2006 IEEE.
Source Title:	IEEE Transactions on Circuits and Systems for Video Technology
URI:	http://scholarbank.nus.edu.sg/handle/10635/56351
ISSN:	1051-8215
DOI:	10.1109/TCSVT.2010.2087570
Appears in Collections:	Staff Publications
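
The inference step summarized in the abstract (contextual knowledge as the prior, template matching as the likelihood, a maximum-a-posteriori estimate found by Markov chain Monte Carlo sampling) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy categories and positions, the scoring functions, and the names log_prior, log_likelihood, propose and map_estimate are hypothetical stand-ins chosen only to show the shape of a Metropolis-Hastings search for the highest-posterior hypothesis.

```python
import math
import random

# Hypothetical toy hypothesis space: (category, x_position) pairs.
CATEGORIES = ["pedestrian", "car", "bicycle"]
POSITIONS = list(range(10))


def log_prior(category, x):
    """Stand-in for a context prior: cars are likelier on the 'road' (x >= 5)."""
    on_road = x >= 5
    if category == "car":
        return math.log(0.8 if on_road else 0.2)
    return math.log(0.3 if on_road else 0.7)


def log_likelihood(category, x):
    """Stand-in for a template-matching score: best for a car at x = 7."""
    match = 0.0 if category == "car" else -2.0
    return match - 0.5 * (x - 7) ** 2


def log_posterior(state):
    """Unnormalized log posterior = log prior + log likelihood."""
    return log_prior(*state) + log_likelihood(*state)


def propose(state, rng):
    """Symmetric proposal: resample the category or shift x by one step."""
    category, x = state
    if rng.random() < 0.5:
        category = rng.choice(CATEGORIES)
    else:
        x = min(max(x + rng.choice([-1, 1]), 0), len(POSITIONS) - 1)
    return (category, x)


def map_estimate(n_steps=2000, seed=0):
    """Run Metropolis-Hastings and return the best state visited."""
    rng = random.Random(seed)
    state = (rng.choice(CATEGORIES), rng.choice(POSITIONS))
    best = state
    for _ in range(n_steps):
        candidate = propose(state, rng)
        log_ratio = log_posterior(candidate) - log_posterior(state)
        # Accept uphill moves always, downhill moves with probability exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            state = candidate
        if log_posterior(state) > log_posterior(best):
            best = state
    return best


if __name__ == "__main__":
    print(map_estimate())
```

In this toy setting an informative prior concentrates the posterior on a few hypotheses, so the chain needs few steps to reach the mode, which mirrors the abstract's remark that the context model reduces computational cost and category ambiguity.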
