Exploring probabilistic localized video representation for human action recognition

Please use this identifier to cite or link to this item: https://doi.org/10.1007/s11042-011-0748-7

DC Field	Value
dc.title	Exploring probabilistic localized video representation for human action recognition
dc.contributor.author	Song, Y.
dc.contributor.author	Tang, S.
dc.contributor.author	Zheng, Y.-T.
dc.contributor.author	Chua, T.-S.
dc.contributor.author	Zhang, Y.
dc.contributor.author	Lin, S.
dc.date.accessioned	2014-07-04T03:09:37Z
dc.date.available	2014-07-04T03:09:37Z
dc.date.issued	2012
dc.identifier.citation	Song, Y., Tang, S., Zheng, Y.-T., Chua, T.-S., Zhang, Y., Lin, S. (2012). Exploring probabilistic localized video representation for human action recognition. Multimedia Tools and Applications 58 (3) : 663-685. ScholarBank@NUS Repository. https://doi.org/10.1007/s11042-011-0748-7
dc.identifier.issn	15737721
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/77858
dc.description.abstract	In recent years, bag-of-words (BoW) video representations have achieved promising results in human action recognition in videos. By vector quantizing local spatio-temporal (ST) features, the BoW video representation brings simplicity and efficiency, but also limitations. First, the discretization of the feature space in BoW inevitably results in ambiguity and information loss in the video representation. Second, there exists no universal codebook for the BoW representation; the codebook needs to be rebuilt whenever the video corpus changes. To tackle these issues, this paper explores a localized, continuous and probabilistic video representation. Specifically, the proposed representation encodes the visual and motion information of an ensemble of local ST features of a video into a distribution estimated by a generative probabilistic model. Furthermore, the probabilistic video representation naturally gives rise to an information-theoretic distance metric between videos. This makes the representation readily applicable to most discriminative classifiers, such as nearest-neighbor schemes and kernel-based classifiers. Experiments on two datasets, KTH and UCF Sports, show that the proposed approach could deliver promising results. © 2011 Springer Science+Business Media, LLC.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1007/s11042-011-0748-7
dc.source	Scopus
dc.subject	Human action recognition
dc.subject	Information-theoretic video matching
dc.subject	Probabilistic video representation
dc.type	Article
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1007/s11042-011-0748-7
dc.description.sourcetitle	Multimedia Tools and Applications
dc.description.volume	58
dc.description.issue	3
dc.description.page	663-685
dc.description.coden	MTAPF
dc.identifier.isiut	000303507900010
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.