Please use this identifier to cite or link to this item: https://doi.org/10.1109/CVPRW.2009.5206721
DC Field | Value
dc.title: Hierarchical spatio-temporal context modeling for action recognition
dc.contributor.author: Sun, J.
dc.contributor.author: Wu, X.
dc.contributor.author: Yan, S.
dc.contributor.author: Cheong, L.-F.
dc.contributor.author: Chua, T.-S.
dc.contributor.author: Li, J.
dc.date.accessioned: 2013-07-23T09:30:38Z
dc.date.available: 2013-07-23T09:30:38Z
dc.date.issued: 2009
dc.identifier.citation: Sun, J., Wu, X., Yan, S., Cheong, L.-F., Chua, T.-S., Li, J. (2009). Hierarchical spatio-temporal context modeling for action recognition. 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009: 2004-2011. ScholarBank@NUS Repository. https://doi.org/10.1109/CVPRW.2009.5206721
dc.identifier.isbn: 9781424439935
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/43310
dc.description.abstract: The problem of recognizing actions in realistic videos is challenging yet absorbing owing to its great potential in many practical applications. Most previous research is limited by the use of simplified action databases captured under controlled environments, or focuses on excessively localized features without sufficiently encapsulating the spatio-temporal context. In this paper, we propose to model the spatio-temporal context information in a hierarchical way, where three levels of context are exploited in ascending order of abstraction: 1) point-level context (SIFT average descriptor), 2) intra-trajectory context (trajectory transition descriptor), and 3) inter-trajectory context (trajectory proximity descriptor). To obtain efficient and compact representations for the latter two levels, we encode the spatio-temporal context information into the transition matrix of a Markov process, and then extract its stationary distribution as the final context descriptor. Building on multichannel nonlinear SVMs, we validate the proposed hierarchical framework on the realistic action (HOHA) and event (LSCOM) recognition databases, achieving 27% and 66% relative performance improvements over the state-of-the-art results, respectively. We further propose to employ the Multiple Kernel Learning (MKL) technique to prune the kernels for faster algorithm evaluation. ©2009 IEEE.
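The abstract's intra- and inter-trajectory descriptors are obtained as the stationary distribution of a Markov transition matrix. As an illustration only (the 3-state matrix below is a made-up example, not data from the paper), such a stationary distribution can be computed by power iteration:

```python
def stationary_distribution(P, tol=1e-12, max_iter=10000):
    """Power-iterate pi <- pi P on a row-stochastic matrix P
    (each row sums to 1) until the distribution stops changing."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 3-state transition matrix (not from the paper).
P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]
pi = stationary_distribution(P)
```

The resulting vector `pi` satisfies pi = pi P and sums to 1, giving a fixed-length summary of the transition structure, which is the property the paper exploits to turn a transition matrix into a compact context descriptor.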
dc.description.uri: http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1109/CVPRW.2009.5206721
dc.source: Scopus
dc.type: Conference Paper
dc.contributor.department: INTERACTIVE & DIGITAL MEDIA INSTITUTE
dc.contributor.department: ELECTRICAL & COMPUTER ENGINEERING
dc.contributor.department: COMPUTATIONAL SCIENCE
dc.description.doi: 10.1109/CVPRW.2009.5206721
dc.description.sourcetitle: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009
dc.description.page: 2004-2011
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.