Title: Human Visual Perception, study and applications to understanding Images and Videos
Keywords: vision, perception, eye-tracking, image/video analysis, semantics
Issue Date: 21-Dec-2011
Citation: HARISH KATTI (2011-12-21). Human Visual Perception, study and applications to understanding Images and Videos. ScholarBank@NUS Repository.
Abstract: Assessing whether a photograph is interesting, or spotting people in conversation or important objects in images and videos, are visual tasks that we humans perform effortlessly and robustly. In this thesis I first explore and quantify how humans distinguish interesting photos from Flickr within a rapid time span (<100 ms) and the visual properties used to make this decision. The role of global colour information in these decisions is brought to light, along with the minimum exposure time required. Camera-related Exchangeable image file format (EXIF) parameters are then used to build a global, scene-wide information model that identifies interesting images across meaningful categories such as indoor scenes and outdoor urban and natural landscapes. My subsequent work focuses on how eye-movements relate to the eventual meaning derived from social and affective (emotion-evoking) scenes. Such scenes pose significant challenges due to the abstract nature of the visual cues (faces, interactions, affective objects) that influence eye-movements. Behavioural experiments involving eye-tracking are used to establish the consistency of preferential eye-fixations (attentional bias) allocated across different objects in such scenes. This data has been released as the publicly available NUSEF eye-fixation dataset. Novel statistical measures are proposed to infer attentional bias across concepts and to analyse relationships between visual elements in an image; the analysis uncovers consistent differences in attentional bias across subtle cases, such as expressive versus neutral faces and strong versus weak relationships between visual elements in a scene. A new online clustering algorithm, "binning", has also been developed to infer regions of interest from eye-movements in static and dynamic scenes.
Applications of the attentional bias model and the binning algorithm to the challenging computer vision problems of foreground segmentation and key object detection in images are demonstrated. A human-in-the-loop interactive application involving dynamic placement of subtitle text in videos is also explored. The thesis further brings forth the influence of human visual perception on recall, precision and the notion of interest in several image and video analysis problems.
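The thesis does not specify the "binning" algorithm in this abstract, but the general idea of inferring regions of interest (ROIs) from eye-fixations via online clustering can be sketched as follows. This is a minimal, hypothetical illustration (function name, radius parameter, and update rule are assumptions, not the thesis's actual method): each incoming fixation joins the nearest existing cluster if it falls within a spatial radius, otherwise it starts a new cluster, with centroids updated incrementally in a single pass.

```python
import math

def cluster_fixations(fixations, radius=50.0):
    """Group (x, y) fixation points into ROIs in one online pass.

    Hypothetical sketch of online fixation clustering; not the
    thesis's "binning" algorithm, whose details are not given here.
    """
    clusters = []  # each cluster: {"cx": centroid x, "cy": centroid y, "n": count}
    for x, y in fixations:
        # Find the nearest existing cluster within the radius.
        best, best_d = None, radius
        for c in clusters:
            d = math.hypot(x - c["cx"], y - c["cy"])
            if d < best_d:
                best, best_d = c, d
        if best is None:
            # No cluster close enough: open a new one at this fixation.
            clusters.append({"cx": x, "cy": y, "n": 1})
        else:
            # Incrementally update the running centroid of the cluster.
            best["n"] += 1
            best["cx"] += (x - best["cx"]) / best["n"]
            best["cy"] += (y - best["cy"]) / best["n"]
    return clusters
```

For example, four fixations forming two spatially separated groups yield two ROI clusters, each centred on its group.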
Appears in Collections: Ph.D. Theses (Open)
