Please use this identifier to cite or link to this item: https://doi.org/10.1109/CVPRW.2009.5206721
DC Field | Value
dc.title: Hierarchical spatio-temporal context modeling for action recognition
dc.contributor.author: Sun, J.
dc.contributor.author: Wu, X.
dc.contributor.author: Yan, S.
dc.contributor.author: Cheong, L.-F.
dc.contributor.author: Chua, T.-S.
dc.contributor.author: Li, J.
dc.date.accessioned: 2013-07-23T09:30:38Z
dc.date.available: 2013-07-23T09:30:38Z
dc.date.issued: 2009
dc.identifier.citation: Sun, J., Wu, X., Yan, S., Cheong, L.-F., Chua, T.-S., Li, J. (2009). Hierarchical spatio-temporal context modeling for action recognition. 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009: 2004-2011. ScholarBank@NUS Repository. https://doi.org/10.1109/CVPRW.2009.5206721
dc.identifier.isbn: 9781424439935
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/43310
dc.description.abstract: The problem of recognizing actions in realistic videos is challenging yet absorbing owing to its great potential in many practical applications. Most previous research is limited by the use of simplified action databases captured under controlled environments, or focuses on excessively localized features without sufficiently encapsulating the spatio-temporal context. In this paper, we propose to model the spatio-temporal context information in a hierarchical way, where three levels of context are exploited in ascending order of abstraction: 1) point-level context (SIFT average descriptor), 2) intra-trajectory context (trajectory transition descriptor), and 3) inter-trajectory context (trajectory proximity descriptor). To obtain efficient and compact representations for the latter two levels, we encode the spatio-temporal context information into the transition matrix of a Markov process, and then extract its stationary distribution as the final context descriptor. Building on multichannel nonlinear SVMs, we validate the proposed hierarchical framework on the realistic action (HOHA) and event (LSCOM) recognition databases, achieving 27% and 66% relative performance improvements over the state-of-the-art results, respectively. We further propose to employ the Multiple Kernel Learning (MKL) technique to prune the kernels for faster algorithm evaluation. ©2009 IEEE.
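The abstract's intra- and inter-trajectory descriptors are obtained as the stationary distribution of a Markov transition matrix. As an illustration only (the 3-state matrix below is a made-up example, not data from the paper), such a stationary distribution can be computed by power iteration:

```python
def stationary_distribution(P, tol=1e-12, max_iter=10000):
    """Power-iterate pi <- pi P on a row-stochastic matrix P
    (each row sums to 1) until the distribution stops changing."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 3-state transition matrix (not from the paper).
P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]
pi = stationary_distribution(P)
```

The resulting vector `pi` satisfies pi = pi P and sums to 1, giving a fixed-length summary of the transition structure, which is the property the paper exploits to turn a transition matrix into a compact context descriptor.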
dc.description.uri: http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1109/CVPRW.2009.5206721
dc.source: Scopus
dc.type: Conference Paper
dc.contributor.department: INTERACTIVE & DIGITAL MEDIA INSTITUTE
dc.contributor.department: ELECTRICAL & COMPUTER ENGINEERING
dc.contributor.department: COMPUTATIONAL SCIENCE
dc.description.doi: 10.1109/CVPRW.2009.5206721
dc.description.sourcetitle: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009
dc.description.page: 2004-2011
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.