GEOGRAPHIC INFORMATION USE IN WEAKLY-SUPERVISED DEEP LEARNING FOR LANDMARK RECOGNITION | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.1109/ICME.2017.8019376

Title:	GEOGRAPHIC INFORMATION USE IN WEAKLY-SUPERVISED DEEP LEARNING FOR LANDMARK RECOGNITION
Authors:	Yin, Yifang; Liu, Zhenguang; Zimmermann, Roger
Keywords:	Science & Technology; Technology; Computer Science, Software Engineering; Computer Science, Theory & Methods; Engineering, Electrical & Electronic; Computer Science; Engineering
Issue Date:	1-Jan-2017
Publisher:	IEEE
Citation:	Yin, Yifang, Liu, Zhenguang, Zimmermann, Roger (2017-01-01). GEOGRAPHIC INFORMATION USE IN WEAKLY-SUPERVISED DEEP LEARNING FOR LANDMARK RECOGNITION. IEEE International Conference on Multimedia and Expo (ICME) : 1015-1020. ScholarBank@NUS Repository. https://doi.org/10.1109/ICME.2017.8019376
Abstract:	Successful deep convolutional neural networks for visual object recognition typically rely on a massive number of training images that are annotated with class labels or object bounding boxes at great human effort. Here we explore the use of geographic metadata, which are automatically retrieved from sensors such as GPS and compass, in weakly-supervised learning techniques for landmark recognition. The visibility of a landmark in a frame can be calculated from the camera's field-of-view and the landmark's geometric information, such as location and height. Subsequently, a training dataset is generated as the union of the frames in which at least one target landmark is present. To reduce the impact of the intrinsic noise in the geo-metadata, we present a frame selection method that removes mislabeled frames with a two-step approach consisting of (1) Gaussian Mixture Model clustering based on camera location followed by (2) outlier removal based on visual consistency. We compare the classification results obtained from the ground truth labels with those obtained from the noisy labels derived from the raw geo-metadata. Experiments show that training based on the raw geo-metadata achieves a Mean Average Precision (MAP) of 0.797. Moreover, applying our proposed representative frame selection method further improves the MAP by 6.4%, which indicates the promising use of geo-metadata in weakly-supervised learning techniques.
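Illustration (not from the paper): the abstract states that landmark visibility can be computed from the camera's field-of-view and the landmark's location. The sketch below shows one simplified, 2D way such a check might look. It assumes a GPS position and compass heading per frame, and it ignores landmark height and occlusion, which the paper's model takes into account. The function names and the default `fov_deg` and `max_range_m` values are hypothetical and chosen only for this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used by the haversine formula


def bearing_deg(cam_lat, cam_lon, lm_lat, lm_lon):
    """Initial great-circle bearing from camera to landmark, in degrees clockwise from north."""
    phi1, phi2 = math.radians(cam_lat), math.radians(lm_lat)
    dlon = math.radians(lm_lon - cam_lon)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_field_of_view(cam_lat, cam_lon, heading_deg, lm_lat, lm_lon,
                     fov_deg=60.0, max_range_m=1000.0):
    """Return True if the landmark lies inside the camera's horizontal viewing sector.

    Hypothetical 2D check: the landmark must be within max_range_m of the camera,
    and its bearing must be within half the viewing angle of the compass heading.
    """
    if haversine_m(cam_lat, cam_lon, lm_lat, lm_lon) > max_range_m:
        return False
    bearing = bearing_deg(cam_lat, cam_lon, lm_lat, lm_lon)
    # Smallest absolute angle between bearing and heading, wrapped into [0, 180].
    diff = abs((bearing - heading_deg + 180.0) % 360.0 - 180.0)
    return diff <= fov_deg / 2.0
```

Under these assumptions, a frame would receive a noisy positive label for a landmark whenever `in_field_of_view` returns True. Sensor errors in position and heading make such labels unreliable, which is the motivation the abstract gives for the subsequent frame selection step.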
Source Title:	IEEE International Conference on Multimedia and Expo (ICME)
URI:	https://scholarbank.nus.edu.sg/handle/10635/200729
ISBN:	9781509060672
ISSN:	1945-7871; 1945-788X
DOI:	10.1109/ICME.2017.8019376
Appears in Collections:	Staff Publications; Elements

Files in This Item:

File	Description	Size	Format	Access Settings	Version
paper_306.pdf		523.32 kB	Adobe PDF	OPEN	Post-print

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.