Please use this identifier to cite or link to this item:
https://doi.org/10.1145/3209978.3210003
DC Field | Value
---|---
dc.title | Attentive Moment Retrieval in Videos
dc.contributor.author | Meng Liu
dc.contributor.author | Xiang Wang
dc.contributor.author | Liqiang Nie
dc.contributor.author | Xiangnan He
dc.contributor.author | Baoquan Chen
dc.contributor.author | Tat-Seng Chua
dc.date.accessioned | 2020-04-28T02:30:53Z
dc.date.available | 2020-04-28T02:30:53Z
dc.date.issued | 2018-07-12
dc.identifier.citation | Meng Liu, Xiang Wang, Liqiang Nie, Xiangnan He, Baoquan Chen, Tat-Seng Chua (2018-07-12). Attentive Moment Retrieval in Videos. ACM SIGIR Conference on Information Retrieval 2018 : 15-24. ScholarBank@NUS Repository. https://doi.org/10.1145/3209978.3210003
dc.identifier.isbn | 9781450356572
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/167297
dc.description.abstract | In the past few years, language-based video retrieval has attracted a lot of attention. However, its natural extension, localizing a specific moment within a video given a description query, remains seldom explored. Although the two tasks look similar, the latter is more challenging for two main reasons: 1) The former task only needs to judge whether the query occurs in a video and return the entire video, whereas the latter must judge which moment within a video matches the query and accurately return the start and end points of that moment. Because different moments in a video have varying durations and diverse spatial-temporal characteristics, uncovering the underlying moments is highly challenging. 2) As for the key component of relevance estimation, the former usually embeds a video and the query into a common space to compute the relevance score. The latter task, however, concerns moment localization, where not only the features of a specific moment matter, but the context information of the moment also contributes substantially. For example, the query may contain temporal constraint words, such as "first", which require temporal context to be properly comprehended. To address these issues, we develop an Attentive Cross-Modal Retrieval Network. In particular, we design a memory attention mechanism to emphasize the visual features mentioned in the query and simultaneously incorporate their context, yielding an augmented moment representation. Meanwhile, a cross-modal fusion sub-network learns both the intra-modality and inter-modality dynamics, which enhances the learning of the moment-query representation. We evaluate our method on two datasets: DiDeMo and TACoS. Extensive experiments show the effectiveness of our model compared to state-of-the-art methods. © 2018 ACM.
dc.publisher | Association for Computing Machinery, Inc
dc.subject | Cross-modal retrieval
dc.subject | Moment localization
dc.subject | Temporal memory attention
dc.subject | Tensor fusion
dc.type | Conference Paper
dc.contributor.department | DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi | 10.1145/3209978.3210003
dc.description.sourcetitle | ACM SIGIR Conference on Information Retrieval 2018
dc.description.page | 15-24
dc.published.state | Published
dc.grant.id | R-252-300-002-490
dc.grant.fundingagency | Infocomm Media Development Authority
dc.grant.fundingagency | National Research Foundation
Appears in Collections: | Elements Staff Publications
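
The abstract above describes two components: a temporal memory attention that weights a candidate moment and its surrounding clips by their relevance to the query, and a tensor-fusion layer that models intra- and inter-modality dynamics between the query and the attended visual representation. The sketch below is a minimal PyTorch illustration of these two ideas only; the layer sizes, the exact wiring, and the two output heads are assumptions made for illustration, not the authors' ACRN implementation.

```python
# Minimal sketch of temporal memory attention followed by tensor fusion.
# Dimensions and heads are illustrative assumptions, not the paper's ACRN.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveFusionSketch(nn.Module):
    def __init__(self, vis_dim=500, query_dim=300, out_dim=128):
        super().__init__()
        # Project the query into the visual space to score each clip.
        self.att_proj = nn.Linear(query_dim, vis_dim, bias=False)
        # Fusion over the outer product of the bias-augmented query and
        # visual vectors (tensor fusion), capturing intra- and
        # inter-modality interactions in a single representation.
        self.fusion = nn.Linear((query_dim + 1) * (vis_dim + 1), out_dim)
        self.score = nn.Linear(out_dim, 1)   # relevance of the moment to the query
        self.loc = nn.Linear(out_dim, 2)     # offsets for the start / end points

    def forward(self, context_feats, query_feat):
        # context_feats: (batch, n_clips, vis_dim)  candidate moment + surrounding clips
        # query_feat:    (batch, query_dim)         sentence embedding of the query
        q = self.att_proj(query_feat)                          # (batch, vis_dim)
        att = torch.einsum('bnd,bd->bn', context_feats, q)     # clip-query scores
        att = F.softmax(att, dim=1)
        vis = torch.einsum('bn,bnd->bd', att, context_feats)   # attended visual repr.

        ones = query_feat.new_ones(query_feat.size(0), 1)
        q_aug = torch.cat([query_feat, ones], dim=1)            # (batch, query_dim + 1)
        v_aug = torch.cat([vis, ones], dim=1)                   # (batch, vis_dim + 1)
        fused = torch.einsum('bi,bj->bij', q_aug, v_aug).flatten(1)
        h = torch.relu(self.fusion(fused))
        return self.score(h).squeeze(-1), self.loc(h)
```

In use, each candidate moment of a video (together with its neighbouring clips) would be scored against the query, and the highest-scoring moment returned along with its predicted start and end offsets.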
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
Attentive Moment Retrieval in Videos.pdf | | 1.65 MB | Adobe PDF | OPEN | Published