Title: Cross-modal Moment Localization in Videos
Authors: Meng Liu, Xiang Wang, Liqiang Nie, Qi Tian, Baoquan Chen, Tat-Seng Chua
Keywords: Cross-modal Video Retrieval, Language-Temporal Attention, Moment Localization
Issue Date: 26-Oct-2018
Publisher: Association for Computing Machinery, Inc
Citation: Meng Liu, Xiang Wang, Liqiang Nie, Qi Tian, Baoquan Chen, Tat-Seng Chua (2018-10-26). Cross-modal Moment Localization in Videos. ACM Multimedia Conference 2018 : 843-851. ScholarBank@NUS Repository.
Abstract: In this paper, we address the temporal moment localization issue, namely, localizing a video moment described by a natural language query in an untrimmed video. This is a general yet challenging vision-language task since it requires not only the localization of moments, but also the multimodal comprehension of textual-temporal information (e.g., “first” and “leaving”) that helps to distinguish the desired moment from the others, especially those with similar visual content. While existing studies treat a given language query as a single unit, we propose to decompose it into two components: the relevant cue related to the desired moment localization and the irrelevant one meaningless to the localization. This allows us to flexibly adapt to arbitrary queries in an end-to-end framework. In our proposed model, a language-temporal attention network is utilized to learn the word attention based on the temporal context information in the video. Therefore, our model can automatically select “what words to listen to” for localizing the desired moment. We evaluate the proposed model on two public benchmark datasets: DiDeMo and Charades-STA. The experimental results verify its superiority over several state-of-the-art methods. © 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Source Title: ACM Multimedia Conference 2018
ISBN: 9781450356657
DOI: 10.1145/3240508.3240549
Appears in Collections: Staff Publications

Files in This Item:
File: Cross-modal Moment Localization in Videos.pdf
Size: 6.63 MB
Format: Adobe PDF







Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.