Please use this identifier to cite or link to this item: https://doi.org/10.1145/2964284.2964308
dc.title: Play and Rewind: Optimizing Binary Representations of Videos by Self-Supervised Temporal Hashing
dc.contributor.author: Hanwang Zhang
dc.contributor.author: Meng Wang
dc.contributor.author: Richang Hong
dc.contributor.author: Tat-Seng Chua
dc.date.accessioned: 2020-04-28T02:30:00Z
dc.date.available: 2020-04-28T02:30:00Z
dc.date.issued: 2016-10-15
dc.identifier.citation: Hanwang Zhang, Meng Wang, Richang Hong, Tat-Seng Chua (2016-10-15). Play and Rewind: Optimizing Binary Representations of Videos by Self-Supervised Temporal Hashing. ACM Multimedia Conference 2016: 781-790. ScholarBank@NUS Repository. https://doi.org/10.1145/2964284.2964308
dc.identifier.isbn: 9781450336031
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/167291
dc.description.abstract: We focus on hashing videos into short binary codes for efficient Content-based Video Retrieval (CBVR), a fundamental technique that supports access to the ever-growing abundance of videos on the Web. Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework called Self-Supervised Temporal Hashing (SSTH) that is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. Specifically, the hash function of SSTH is an encoder RNN equipped with the proposed Binary LSTM (BLSTM) that generates binary codes for videos. The hash function is learned in a self-supervised fashion, where a decoder RNN is proposed to reconstruct the original video frames in both forward and reverse orders. For binary code optimization, we develop a backpropagation rule that tackles the non-differentiability of BLSTM. This rule allows efficient deep network training without suffering from the binarization loss. Through extensive CBVR experiments on two real-world consumer video datasets from YouTube and Flickr, we show that SSTH consistently outperforms state-of-the-art video hashing methods, e.g., in terms of mAP@20, SSTH using only 128 bits can still outperform others using 256 bits by at least 9% to 15% on both datasets. © 2016 ACM.
dc.publisher: Association for Computing Machinery, Inc
dc.subject: Binary LSTM
dc.subject: Sequence learning
dc.subject: Temporal hashing
dc.subject: Video retrieval
dc.type: Conference Paper
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi: 10.1145/2964284.2964308
dc.description.sourcetitle: ACM Multimedia Conference 2016
dc.description.page: 781-790
dc.grant.id: R-252-300-002-490
dc.grant.fundingagency: Infocomm Media Development Authority
dc.grant.fundingagency: National Research Foundation
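The abstract above describes a backpropagation rule that tackles the non-differentiability of the binarization step in BLSTM. As a minimal illustrative sketch (not the paper's actual rule), the core idea can be shown with a sign() forward pass and a straight-through-style gradient; the function names and the |h| ≤ 1 gating below are assumptions for illustration only.

```python
import numpy as np

def binarize(h):
    """Forward pass: map real-valued hidden states to {-1, +1} binary codes."""
    return np.where(h >= 0.0, 1.0, -1.0)

def binarize_backward(h, grad_out):
    """Backward pass sketch: sign() has zero gradient almost everywhere,
    so a straight-through-style estimator passes the upstream gradient
    through, gated to the region |h| <= 1 where the identity
    approximation of sign() is reasonable."""
    return grad_out * (np.abs(h) <= 1.0)

# Example: a 4-bit code from a hypothetical encoder hidden state.
h = np.array([0.3, -1.7, 0.0, -0.2])
code = binarize(h)                             # [ 1., -1.,  1., -1.]
grad = binarize_backward(h, np.ones_like(h))   # [ 1.,  0.,  1.,  1.]
```

This kind of surrogate gradient lets the encoder be trained end-to-end despite the discrete output, which is the property the abstract attributes to SSTH's optimization.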
Appears in Collections: Staff Publications; Elements

Files in This Item:
Play and Rewind.pdf (1.64 MB, Adobe PDF, Open Access)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.