Please use this identifier to cite or link to this item: https://doi.org/10.1145/3539618.3591712
Title: Learnable Pillar-based Re-ranking for Image-Text Retrieval
Authors: Qu, L
Liu, M
Wang, W 
Zheng, Z 
Nie, L 
Chua, TS 
Issue Date: 19-Jul-2023
Publisher: ACM
Citation: Qu, L, Liu, M, Wang, W, Zheng, Z, Nie, L, Chua, TS (2023-07-19). Learnable Pillar-based Re-ranking for Image-Text Retrieval. SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval : 1252-1261. ScholarBank@NUS Repository. https://doi.org/10.1145/3539618.3591712
Abstract: Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities. Prior work usually focuses on the pairwise relations (i.e., whether a data sample matches another) but ignores the higher-order neighbor relations (i.e., a matching structure among multiple data samples). Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks. However, it is ineffective to directly extend existing re-ranking algorithms to image-text retrieval. In this paper, we analyze the reason from four perspectives, i.e., generalization, flexibility, sparsity, and asymmetry, and propose a novel learnable pillar-based re-ranking paradigm. Concretely, we first select top-ranked intra- and inter-modal neighbors as pillars, and then reconstruct data samples with the neighbor relations between them and the pillars. In this way, each sample can be mapped into a multimodal pillar space only using similarities, ensuring generalization. After that, we design a neighbor-aware graph reasoning module to flexibly exploit the relations and excavate the sparse positive items within a neighborhood. We also present a structure alignment constraint to promote cross-modal collaboration and align the asymmetric modalities. On top of various base backbones, we carry out extensive experiments on two benchmark datasets, i.e., Flickr30K and MS-COCO, demonstrating the effectiveness, superiority, generalization, and transferability of our proposed re-ranking paradigm.
Source Title: SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
URI: https://scholarbank.nus.edu.sg/handle/10635/245935
ISBN: 9781450394086
DOI: 10.1145/3539618.3591712
Appears in Collections: Elements
Staff Publications
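
The pillar-space mapping described in the abstract (select top-ranked intra- and inter-modal neighbors as pillars, then describe every sample only by its similarities to those pillars) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `top_k`, `cosine`, and `pillar_rerank`, the toy similarity matrices, and the value `k=2` are all assumptions made for illustration. The abstract's learnable neighbor-aware graph reasoning module and structure alignment constraint are not reproduced here; a plain cosine score in pillar space stands in as a non-learned placeholder.

```python
import math


def top_k(scores, k):
    """Return the indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pillar_rerank(q, sim_i2t, sim_i2i, sim_t2t, k=2):
    """Re-rank candidate texts for query image q (hypothetical helper).

    sim_i2t[i][t]: image-to-text similarity from some base backbone.
    sim_i2i, sim_t2t: intra-modal image-image and text-text similarities.
    """
    # Pillars: the query's top-k texts (inter-modal neighbors) and
    # top-k images (intra-modal neighbors; the query itself may be included).
    text_pillars = top_k(sim_i2t[q], k)
    image_pillars = top_k(sim_i2i[q], k)

    # The query, expressed only through its similarities to the pillars.
    q_vec = ([sim_i2t[q][t] for t in text_pillars]
             + [sim_i2i[q][i] for i in image_pillars])

    # Every candidate text, expressed in the same pillar space.
    t_vecs = [
        [sim_t2t[c][t] for t in text_pillars]
        + [sim_i2t[i][c] for i in image_pillars]
        for c in range(len(sim_t2t))
    ]

    # Placeholder scoring (assumption): cosine similarity in pillar space,
    # standing in for the learned graph reasoning module.
    scores = [cosine(q_vec, v) for v in t_vecs]
    order = sorted(range(len(scores)), key=lambda c: -scores[c])
    return order, t_vecs


# Toy example: 2 images, 3 texts; texts 0 and 1 match image 0.
sim_i2t = [[0.9, 0.8, 0.1], [0.2, 0.1, 0.7]]
sim_i2i = [[1.0, 0.3], [0.3, 1.0]]
sim_t2t = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
order, feats = pillar_rerank(0, sim_i2t, sim_i2i, sim_t2t, k=2)
print(order)
```

Because each sample is represented purely by similarity values, a sketch like this never touches the backbone's raw features, which is the property the abstract associates with generalization across base models.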

Files in This Item:
File: SIGIR23-Qu.pdf
Description: Accepted version
Size: 1.76 MB
Format: Adobe PDF
Access Settings: OPEN
Version: Post-print

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.