Semi-supervised clustering via learnt codeword distances | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.5244/C.22.90

Title:	Semi-supervised clustering via learnt codeword distances
Authors:	Batra D., Sukthankar R., Chen T.
Issue Date:	2008
Publisher:	British Machine Vision Association, BMVA
Citation:	Batra D., Sukthankar R., Chen T. (2008). Semi-supervised clustering via learnt codeword distances. BMVC 2008 - Proceedings of the British Machine Vision Conference 2008. ScholarBank@NUS Repository. https://doi.org/10.5244/C.22.90
Abstract:	This paper focuses on semi-supervised clustering, where the goal is to cluster a set of data-points given a set of similar/dissimilar examples. These examples provide instance-level equivalence/in-equivalence constraints (e.g., similar pairs belong to the same cluster while dissimilar pairs belong to different clusters), but in order to aid final clustering we must propagate them to feature-space level constraints (i.e., how similar are two regions in the feature space?). An increasingly popular approach to accomplish this is by learning distance metrics over the feature space that are guided by the instance-level constraints. Inspired by the success of recent bag-of-words models, we utilize codewords (or visual-words) as building blocks. Our proposed technique learns non-parametric distance metrics over codewords from these equivalence (and optionally, in-equivalence) constraints, which we are then able to propagate back to compute a dissimilarity measure between any two points in the feature space. There are two significant advances over previous work. First, unlike past efforts on global distance metric learning which try to transform the entire feature space so that similar pairs are close, we transform modes in data distribution or pockets of the feature space. This transformation is non-parametric and thus allows arbitrary non-linear deformations of the feature space. Second, while most Mahalanobis metrics are learnt using Semi-Definite Programming (SDP), our proposed solution is developed as a Linear Program (LP) and in practice, is extremely fast. Finally, we provide quantitative analysis on image datasets (MSRC, Corel) where ground-truth segmentation is available, and show that our learnt metrics can significantly improve clustering accuracy.
Source Title:	BMVC 2008 - Proceedings of the British Machine Vision Conference 2008
URI:	http://scholarbank.nus.edu.sg/handle/10635/146248
DOI:	10.5244/C.22.90
Appears in Collections:	Staff Publications
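
The propagation step described in the abstract (turning distances learnt between codewords into a dissimilarity between any two points) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the hard nearest-codeword assignment, the function names `nearest_codeword` and `learnt_dissimilarity`, and the toy values in `codewords` and `codeword_dist`. The linear program that would learn `codeword_dist` from the constraints is not shown, since the abstract does not give its formulation.

```python
import math

def nearest_codeword(point, codewords):
    """Return the index of the codeword closest to `point` (Euclidean distance)."""
    return min(range(len(codewords)), key=lambda k: math.dist(point, codewords[k]))

def learnt_dissimilarity(x, y, codewords, codeword_dist):
    """Dissimilarity between two points, read from a codeword-level distance table.

    Assumption: each point is represented by its single nearest codeword.
    """
    i = nearest_codeword(x, codewords)
    j = nearest_codeword(y, codewords)
    return codeword_dist[i][j]

# Hypothetical codebook of three codewords in a 2-D feature space.
codewords = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]

# Hypothetical learnt distances: codewords 0 and 1 were pulled together
# (as if linked by "similar" constraints), codeword 2 stays far from both.
codeword_dist = [
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

a = (0.1, -0.1)  # falls nearest to codeword 0
b = (0.9, 0.2)   # falls nearest to codeword 1
c = (4.8, 5.1)   # falls nearest to codeword 2

print(learnt_dissimilarity(a, b, codewords, codeword_dist))  # 0.1
print(learnt_dissimilarity(a, c, codewords, codeword_dist))  # 1.0
```

Because the table is indexed by codeword pairs rather than parameterised by a single global transform, each entry can be set independently, which is one way to read the abstract's claim that the learnt metric is non-parametric and acts on pockets of the feature space.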

