Title: Beyond Visual Words: Exploring Higher-Level Image Representation For Object Categorization
Keywords: visual synset, image representation, object categorization
Issue Date: 16-Oct-2009
Citation: ZHENG YANTAO (2009-10-16). Beyond Visual Words: Exploring Higher-Level Image Representation For Object Categorization. ScholarBank@NUS Repository.
Abstract: Category-level object recognition is an important but challenging research task. The diverse and open-ended nature of object appearance means that objects, whether from the same category or not, exhibit boundless variation in visual look and shape. This visual diversity creates a large gap between the visual appearance of images and their semantic content. This thesis tackles the problem of visual diversity for better object categorization from two aspects: visual representation and learning scheme.

One contribution of the thesis is a higher-level visual representation, the visual synset. The visual synset is built on top of the traditional bag-of-words representation. It incorporates the co-occurrence and spatial-scatter information of visual words, making the representation more discriminative across image categories. Moreover, the visual synset leverages the "probabilistic semantics" of visual words, i.e. their class probability distributions, to group words with similar distributions into one visual content unit. In this way, the visual synset can partially bridge the visual differences between images of the same class, leading to a more coherent image distribution in the feature space.

The second contribution of the thesis is a generative learning model that goes beyond image appearance. Taking a Bayesian perspective, we interpret visual diversity as a probabilistic generative phenomenon in which visual appearance arises from countably infinitely many common appearance patterns.
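The grouping of visual words by their class probability distributions, described above, can be sketched as follows. This is a minimal illustrative sketch only: the function name, the k-means-style clustering, and the Euclidean distance are assumptions made here for illustration, not the thesis's actual grouping criterion.

```python
import numpy as np

def build_visual_synsets(word_class_dists, n_synsets, seed=0):
    """Group visual words whose class probability distributions are similar.

    word_class_dists: (n_words, n_classes) array; row i is P(class | word i).
    Returns an array mapping each visual word to a synset id.

    Hypothetical sketch: uses simple k-means on the distributions as a
    stand-in for the distribution-based grouping described in the thesis.
    """
    rng = np.random.default_rng(seed)
    n_words = word_class_dists.shape[0]
    # Initialise synset centres from randomly chosen word distributions.
    centres = word_class_dists[rng.choice(n_words, n_synsets, replace=False)]
    for _ in range(50):
        # Assign each word to the nearest centre (Euclidean distance here;
        # a divergence-based measure would also fit the probabilistic view).
        dists = np.linalg.norm(word_class_dists[:, None] - centres[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centre as the mean distribution of its members.
        for k in range(n_synsets):
            members = word_class_dists[labels == k]
            if len(members):
                centres[k] = members.mean(axis=0)
    return labels
```

Words mapped to the same synset are then treated as one visual content unit, so two images of the same class that happen to use different (but distributionally similar) visual words become closer in the resulting feature space.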
To build a valid learning model for this generative interpretation, three issues must be addressed: (1) there exist countably infinitely many appearance patterns, as objects have limitless variation in appearance; (2) appearance patterns are shared not only within but also across object categories, since objects of different categories can be visually similar; and (3) intuitively, objects within a category should share a closer set of appearance patterns than objects of different categories. To address these issues, we propose a generative probabilistic model, the nested hierarchical Dirichlet process (HDP) mixture. The stick-breaking construction in the nested HDP mixture provides countably infinitely many appearance patterns that can grow, shrink and change freely. The hierarchical structure of the model not only enables appearance patterns to be shared across object categories, but also allows the images within a category to arise from a closer set of appearance patterns than images of different categories.

Experiments on the Caltech-101 and NUS-WIDE-Object datasets demonstrate that the proposed visual representation (visual synset) and learning scheme (nested HDP mixture) deliver promising performance and outperform existing models by significant margins.
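The stick-breaking construction mentioned above is the standard way to realise the "countably infinitely many appearance patterns" of a Dirichlet process. A minimal sketch (truncated at a finite number of components purely for illustration; the function name and truncation level are assumptions of this sketch, not part of the thesis):

```python
import numpy as np

def stick_breaking_weights(alpha, n_components, seed=0):
    """Sample truncated mixture weights from a Dirichlet process via
    stick breaking: beta_k ~ Beta(1, alpha) and
    pi_k = beta_k * prod_{j<k} (1 - beta_j).

    In the full construction the component index k runs to infinity,
    which is what lets the set of appearance patterns grow or shrink
    freely; here we truncate at n_components for demonstration.
    """
    rng = np.random.default_rng(seed)
    # Each beta_k is the fraction broken off the remaining stick.
    betas = rng.beta(1.0, alpha, size=n_components)
    # Length of stick remaining before each break: prod_{j<k} (1 - beta_j).
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining
```

Smaller values of the concentration parameter `alpha` put most of the mass on a few components, while larger values spread it over many, which is how the model lets the number of effective appearance patterns adapt to the data.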
Appears in Collections:Ph.D Theses (Open)

Files in This Item:
File: ZhengYT.pdf (9.4 MB, Adobe PDF)



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.