Please use this identifier to cite or link to this item: https://doi.org/10.1109/TCYB.2013.2276433
Title: Multilevel depth and image fusion for human activity detection
Authors: Ni, B.
Pei, Y.
Moulin, P.
Yan, S. 
Keywords: Action recognition and localization
Depth sensor
Spatial and temporal context
Issue Date: Oct-2013
Citation: Ni, B., Pei, Y., Moulin, P., Yan, S. (2013). Multilevel depth and image fusion for human activity detection. IEEE Transactions on Cybernetics 43(5): 1382-1394. ScholarBank@NUS Repository. https://doi.org/10.1109/TCYB.2013.2276433
Abstract: Recognizing complex human activities usually requires detecting individual visual features and modeling the interactions between them. Current methods rely only on visual features extracted from 2-D images, and therefore often yield unreliable salient visual feature detection and inaccurate modeling of the interaction context between individual features. In this paper, we show that these problems can be addressed by combining data from a conventional camera and a depth sensor (e.g., Microsoft Kinect). We propose a novel complex activity recognition and localization framework that effectively fuses information from both grayscale and depth image channels at multiple levels of the video processing pipeline. At the individual visual feature detection level, depth-based filters are applied to the detected human/object rectangles to remove false detections. At the next level, interaction modeling, 3-D spatial and temporal contexts among human subjects or objects are extracted by integrating information from both grayscale and depth images. Depth information is also used to distinguish different types of indoor scenes. Finally, a latent structural model is developed to integrate the information from multiple levels of video processing for activity detection. Extensive experiments on two activity recognition benchmarks (one with depth information) and a challenging grayscale + depth human activity database containing complex human-human, human-object, and human-surroundings interactions demonstrate the effectiveness of the proposed multilevel grayscale + depth fusion scheme. Higher recognition and localization accuracies are obtained relative to previous methods. © 2013 IEEE.
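To illustrate the first fusion level described in the abstract (depth-based filtering of detected human/object rectangles), the sketch below shows one plausible such filter. It is not the authors' implementation: the function name, the thresholds, and the depth-coherence heuristic are assumptions chosen for illustration, and the depth map is assumed to be a per-pixel array in meters, with 0 marking pixels where a Kinect-style sensor returned no reading.

    import numpy as np

    def filter_detections_by_depth(boxes, depth_map,
                                   min_depth_m=0.5, max_depth_m=8.0,
                                   max_depth_std_m=0.6,
                                   min_valid_frac=0.3):
        """Keep only detection rectangles with plausible depth statistics.

        boxes     : iterable of (x0, y0, x1, y1) pixel rectangles from a
                    2-D human/object detector, clipped to the image bounds.
        depth_map : H x W array of per-pixel depth in meters; 0 marks
                    pixels with no sensor reading.
        Returns the subset of boxes passing the depth-consistency checks.
        """
        kept = []
        for (x0, y0, x1, y1) in boxes:
            patch = depth_map[y0:y1, x0:x1]
            valid = patch[patch > 0]              # discard missing readings
            if patch.size == 0 or valid.size < min_valid_frac * patch.size:
                continue                          # too little depth evidence
            median_depth = np.median(valid)
            if not (min_depth_m <= median_depth <= max_depth_m):
                continue                          # implausibly near or far
            if np.std(valid) > max_depth_std_m:
                continue                          # not depth-coherent: likely
                                                  # clutter, not a single object
            kept.append((x0, y0, x1, y1))
        return kept

Rectangles that survive this kind of filter would then feed the next level of the paper's pipeline, where 3-D spatial and temporal contexts among the detected humans and objects are extracted.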
Source Title: IEEE Transactions on Cybernetics
URI: http://scholarbank.nus.edu.sg/handle/10635/56712
ISSN: 2168-2267
DOI: 10.1109/TCYB.2013.2276433
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.
