Please use this identifier to cite or link to this item: https://doi.org/10.1109/CVPR52729.2023.02204
dc.title: Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning
dc.contributor.author: Ji, W
dc.contributor.author: Liang, R
dc.contributor.author: Zheng, Z
dc.contributor.author: Zhang, W
dc.contributor.author: Zhang, S
dc.contributor.author: Li, J
dc.contributor.author: Li, M
dc.contributor.author: Chua, TS
dc.date.accessioned: 2023-11-15T06:05:51Z
dc.date.available: 2023-11-15T06:05:51Z
dc.date.issued: 2023-01-01
dc.identifier.citation: Ji, W, Liang, R, Zheng, Z, Zhang, W, Zhang, S, Li, J, Li, M, Chua, TS (2023-01-01). Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning. 2023-June: 23013-23022. ScholarBank@NUS Repository. https://doi.org/10.1109/CVPR52729.2023.02204
dc.identifier.isbn: 9798350301298
dc.identifier.issn: 1063-6919
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/245959
dc.description.abstract: Recent research on video moment retrieval has mostly focused on improving accuracy, efficiency, and robustness, all of which rely heavily on abundant high-quality annotations. Although precise frame-level annotations are time-consuming and expensive to obtain, little attention has been paid to the labeling process itself. In this work, we explore a new interactive approach that simulates human-in-the-loop annotation for the video moment retrieval task. The key challenge is to select 'ambiguous' frames and videos for binary annotation so as to facilitate network training. Specifically, we propose a new hierarchical uncertainty-based model that explicitly estimates the uncertainty of each frame within the entire video sequence with respect to the query description and selects the frame with the highest uncertainty. Only the selected frame is annotated by a human expert, which greatly reduces the labeling workload. We show that the small number of labels obtained from the expert is sufficient to learn a competitive video moment retrieval model in such a harsh environment. Moreover, we treat the uncertainty scores of the frames in a video as a whole to estimate the difficulty of each video, which further eases the burden of video selection. Overall, our active learning strategy for video moment retrieval operates not only at the frame level but also at the sequence level (a minimal illustrative sketch of this two-level selection follows the metadata fields below). Experiments on two public datasets validate the effectiveness of the proposed method. Our code is released at https://github.com/renjie-liang/HUAL.
dc.source: Elements
dc.type: Conference Paper
dc.date.updated: 2023-11-11T05:02:18Z
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi: 10.1109/CVPR52729.2023.02204
dc.description.volume: 2023-June
dc.description.page: 23013-23022
dc.published.state: Published
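
The abstract describes a two-level selection strategy: per-frame uncertainty decides which single frame a human expert annotates with a binary label, and a video's frame uncertainties, taken as a whole, estimate that video's difficulty for sequence-level selection. The following is a minimal Python sketch of that selection logic under stated assumptions, not the authors' released implementation (see the linked repository for that): the binary-entropy uncertainty measure, the mean aggregation over frames, and all function names and example probabilities are hypothetical.

    import numpy as np

    def frame_uncertainty(frame_probs: np.ndarray) -> np.ndarray:
        """Per-frame uncertainty. Assumption: the retrieval model emits, for
        each frame, a probability that the frame lies inside the queried
        moment; binary entropy peaks where the model is most ambivalent
        (p close to 0.5)."""
        p = np.clip(frame_probs, 1e-6, 1 - 1e-6)  # avoid log(0)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    def select_frame(frame_probs: np.ndarray) -> int:
        """Frame level: index of the most ambiguous frame, to be shown to
        the expert for a single binary (inside/outside moment) label."""
        return int(np.argmax(frame_uncertainty(frame_probs)))

    def rank_videos(all_probs: list[np.ndarray]) -> list[int]:
        """Sequence level: treat each video's frame uncertainties as a
        whole (here via their mean) and rank videos hardest-first."""
        difficulty = [float(frame_uncertainty(p).mean()) for p in all_probs]
        return sorted(range(len(all_probs)), key=lambda i: -difficulty[i])

    # Toy example: per-frame inside-moment probabilities for three videos.
    videos = [np.array([0.10, 0.48, 0.90]),   # one very ambiguous frame
              np.array([0.05, 0.10, 0.95]),   # confident almost everywhere
              np.array([0.40, 0.50, 0.60])]   # ambiguous throughout
    order = rank_videos(videos)               # [2, 0, 1]: hardest video first
    frame = select_frame(videos[order[0]])    # frame 1 (p = 0.50) is queried
    print(order, frame)

In an actual human-in-the-loop cycle one would annotate the selected frame, update the model with the new binary label, and repeat the selection; how frame uncertainties are aggregated into a video-level difficulty score (mean, sum, max) is a design choice this sketch fixes arbitrarily to the mean.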
Appears in Collections: Staff Publications, Elements

Files in This Item:
CVPR23-Ji.pdf | Description: Accepted version | Size: 3.64 MB | Format: Adobe PDF | Access: OPEN
