Please use this identifier to cite or link to this item: https://doi.org/10.1109/CVPR52729.2023.02204
dc.title: Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning
dc.contributor.author: Ji, W
dc.contributor.author: Liang, R
dc.contributor.author: Zheng, Z
dc.contributor.author: Zhang, W
dc.contributor.author: Zhang, S
dc.contributor.author: Li, J
dc.contributor.author: Li, M
dc.contributor.author: Chua, TS
dc.date.accessioned: 2023-11-15T06:05:51Z
dc.date.available: 2023-11-15T06:05:51Z
dc.date.issued: 2023-01-01
dc.identifier.citation: Ji, W, Liang, R, Zheng, Z, Zhang, W, Zhang, S, Li, J, Li, M, Chua, TS (2023-01-01). Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning. 2023-June: 23013-23022. ScholarBank@NUS Repository. https://doi.org/10.1109/CVPR52729.2023.02204
dc.identifier.isbn: 9798350301298
dc.identifier.issn: 1063-6919
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/245959
dc.description.abstract: Recent research on video moment retrieval has mostly focused on improving accuracy, efficiency, and robustness, all of which rely heavily on abundant high-quality annotations. Although precise frame-level annotations are time-consuming and expensive to obtain, little attention has been paid to the labeling process itself. In this work, we explore a new interactive approach that simulates human-in-the-loop annotation for the video moment retrieval task. The key challenge is to select 'ambiguous' frames and videos for binary annotation so as to facilitate network training. Specifically, we propose a new hierarchical uncertainty-based model that explicitly estimates the uncertainty of each frame within the entire video sequence with respect to the query description and selects the frame with the highest uncertainty. Only the selected frame is annotated by a human expert, which greatly reduces the labeling workload. We show that the small number of labels obtained from the expert is sufficient to learn a competitive video moment retrieval model in such a harsh environment. Moreover, we treat the uncertainty scores of the frames in a video as a whole to estimate the difficulty of each video, which further eases the burden of video selection. Overall, our active learning strategy for video moment retrieval operates not only at the frame level but also at the sequence level (a minimal illustrative sketch of this two-level selection follows the metadata fields below). Experiments on two public datasets validate the effectiveness of the proposed method. Our code is released at https://github.com/renjie-liang/HUAL.
dc.source: Elements
dc.type: Conference Paper
dc.date.updated: 2023-11-11T05:02:18Z
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi: 10.1109/CVPR52729.2023.02204
dc.description.volume: 2023-June
dc.description.page: 23013-23022
dc.published.state: Published
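
The abstract describes a two-level selection strategy: per-frame uncertainty decides which single frame a human expert annotates with a binary label, and a video's frame uncertainties, taken as a whole, estimate that video's difficulty for sequence-level selection. The following is a minimal Python sketch of that selection logic under stated assumptions, not the authors' released implementation (see the linked repository for that): the binary-entropy uncertainty measure, the mean aggregation over frames, and all function names and example probabilities are hypothetical.

    import numpy as np

    def frame_uncertainty(frame_probs: np.ndarray) -> np.ndarray:
        """Per-frame uncertainty. Assumption: the retrieval model emits, for
        each frame, a probability that the frame lies inside the queried
        moment; binary entropy peaks where the model is most ambivalent
        (p close to 0.5)."""
        p = np.clip(frame_probs, 1e-6, 1 - 1e-6)  # avoid log(0)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    def select_frame(frame_probs: np.ndarray) -> int:
        """Frame level: index of the most ambiguous frame, to be shown to
        the expert for a single binary (inside/outside moment) label."""
        return int(np.argmax(frame_uncertainty(frame_probs)))

    def rank_videos(all_probs: list[np.ndarray]) -> list[int]:
        """Sequence level: treat each video's frame uncertainties as a
        whole (here via their mean) and rank videos hardest-first."""
        difficulty = [float(frame_uncertainty(p).mean()) for p in all_probs]
        return sorted(range(len(all_probs)), key=lambda i: -difficulty[i])

    # Toy example: per-frame inside-moment probabilities for three videos.
    videos = [np.array([0.10, 0.48, 0.90]),   # one very ambiguous frame
              np.array([0.05, 0.10, 0.95]),   # confident almost everywhere
              np.array([0.40, 0.50, 0.60])]   # ambiguous throughout
    order = rank_videos(videos)               # [2, 0, 1]: hardest video first
    frame = select_frame(videos[order[0]])    # frame 1 (p = 0.50) is queried
    print(order, frame)

In an actual human-in-the-loop cycle one would annotate the selected frame, update the model with the new binary label, and repeat the selection; how frame uncertainties are aggregated into a video-level difficulty score (mean, sum, max) is a design choice this sketch fixes arbitrarily to the mean.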
Appears in Collections: Staff Publications, Elements

Files in This Item:
CVPR23-Ji.pdf | Description: Accepted version | Size: 3.64 MB | Format: Adobe PDF | Access: OPEN
