Please use this identifier to cite or link to this item:
https://doi.org/10.1109/CVPR52729.2023.02204
DC Field | Value
---|---
dc.title | Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning
dc.contributor.author | Ji, W
dc.contributor.author | Liang, R
dc.contributor.author | Zheng, Z
dc.contributor.author | Zhang, W
dc.contributor.author | Zhang, S
dc.contributor.author | Li, J
dc.contributor.author | Li, M
dc.contributor.author | Chua, TS
dc.date.accessioned | 2023-11-15T06:05:51Z
dc.date.available | 2023-11-15T06:05:51Z
dc.date.issued | 2023-01-01
dc.identifier.citation | Ji, W, Liang, R, Zheng, Z, Zhang, W, Zhang, S, Li, J, Li, M, Chua, TS (2023-01-01). Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning. 2023-June: 23013-23022. ScholarBank@NUS Repository. https://doi.org/10.1109/CVPR52729.2023.02204
dc.identifier.isbn | 9798350301298
dc.identifier.issn | 1063-6919
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/245959
dc.description.abstract | Recent research on video moment retrieval has mostly focused on improving accuracy, efficiency, and robustness, all of which rely heavily on abundant high-quality annotations. Although precise frame-level annotations are time-consuming and expensive, little attention has been paid to the labeling process itself. In this work, we explore a new interactive scheme that simulates human-in-the-loop annotation for the video moment retrieval task. The key challenge is to select 'ambiguous' frames and videos for binary annotation to facilitate network training. Specifically, we propose a new hierarchical uncertainty-based model that explicitly estimates the uncertainty of each frame, within the entire video sequence, with respect to the query description, and selects the frame with the highest uncertainty. Only the selected frame is annotated by human experts, which greatly reduces the workload. We show that the small number of labels obtained from the expert is sufficient to learn a competitive video moment retrieval model in such a harsh setting. Moreover, we treat the uncertainty scores of the frames in a video as a whole and estimate the difficulty of each video, which further relieves the burden of video selection. Overall, our active learning strategy for video moment retrieval operates not only at the frame level but also at the sequence level. Experiments on two public datasets validate the effectiveness of the proposed method. Our code is released at https://github.com/renjie-liang/HUAL.
dc.source | Elements
dc.type | Conference Paper
dc.date.updated | 2023-11-11T05:02:18Z
dc.contributor.department | DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi | 10.1109/CVPR52729.2023.02204
dc.description.volume | 2023-June
dc.description.page | 23013-23022
dc.published.state | Published
Appears in Collections: Staff Publications Elements
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
CVPR23-Ji.pdf | Accepted version | 3.64 MB | Adobe PDF | OPEN | None
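For context, the following is a minimal sketch of the frame- and video-level selection idea described in the abstract above. The entropy-based uncertainty measure, the mean aggregation over frames, and all function names are illustrative assumptions, not the authors' exact formulation; see the released HUAL repository for the actual implementation.

```python
# Illustrative sketch of hierarchical uncertainty-based selection
# (assumed formulation; not the paper's exact method).
import numpy as np

def frame_uncertainty(frame_probs: np.ndarray) -> np.ndarray:
    """Binary entropy of per-frame relevance probabilities (shape [T])."""
    p = np.clip(frame_probs, 1e-7, 1 - 1e-7)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_frame(frame_probs: np.ndarray) -> int:
    """Pick the most ambiguous frame to send to the human annotator."""
    return int(np.argmax(frame_uncertainty(frame_probs)))

def video_difficulty(frame_probs: np.ndarray) -> float:
    """Sequence-level score: treat the frame uncertainties as a whole."""
    return float(frame_uncertainty(frame_probs).mean())

def select_video(pool_probs: list) -> int:
    """Pick the hardest video in a pool before querying its hardest frame."""
    return int(np.argmax([video_difficulty(p) for p in pool_probs]))

# Usage: given model confidences for two candidate videos, choose the
# video and frame to query for a single binary (relevant / not) label.
pool = [np.array([0.9, 0.55, 0.1]), np.array([0.98, 0.95, 0.02])]
v = select_video(pool)     # video 0 is more ambiguous overall
f = select_frame(pool[v])  # frame 1 (p = 0.55) is closest to 0.5
print(v, f)                # -> 0 1
```

Binary entropy peaks at p = 0.5, so the frames the model is least sure about are queried first, matching the abstract's "ambiguous frame" criterion, while the aggregated video score plays the role of the sequence-level difficulty estimate.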