Please use this identifier to cite or link to this item:
https://doi.org/10.1145/3539618.3591712
Title: Learnable Pillar-based Re-ranking for Image-Text Retrieval
Authors: Qu, L; Liu, M; Wang, W; Zheng, Z; Nie, L; Chua, TS
Issue Date: 19-Jul-2023
Publisher: ACM
Citation: Qu, L, Liu, M, Wang, W, Zheng, Z, Nie, L, Chua, TS (2023-07-19). Learnable Pillar-based Re-ranking for Image-Text Retrieval. SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval: 1252-1261. ScholarBank@NUS Repository. https://doi.org/10.1145/3539618.3591712
Abstract: Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities. Prior work usually focuses on the pairwise relations (i.e., whether a data sample matches another) but ignores the higher-order neighbor relations (i.e., a matching structure among multiple data samples). Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks. However, it is ineffective to directly extend existing re-ranking algorithms to image-text retrieval. In this paper, we analyze the reason from four perspectives, i.e., generalization, flexibility, sparsity, and asymmetry, and propose a novel learnable pillar-based re-ranking paradigm. Concretely, we first select top-ranked intra- and inter-modal neighbors as pillars, and then reconstruct data samples with the neighbor relations between them and the pillars. In this way, each sample can be mapped into a multimodal pillar space only using similarities, ensuring generalization. After that, we design a neighbor-aware graph reasoning module to flexibly exploit the relations and excavate the sparse positive items within a neighborhood. We also present a structure alignment constraint to promote cross-modal collaboration and align the asymmetric modalities. On top of various base backbones, we carry out extensive experiments on two benchmark datasets, i.e., Flickr30K and MS-COCO, demonstrating the effectiveness, superiority, generalization, and transferability of our proposed re-ranking paradigm.
Source Title: SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
URI: https://scholarbank.nus.edu.sg/handle/10635/245935
ISBN: 9781450394086
DOI: 10.1145/3539618.3591712
Appears in Collections: Elements Staff Publications
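
The pillar-space mapping described in the abstract (pick the top-ranked intra- and inter-modal neighbors of a query as pillars, then describe every sample only by its similarities to those pillars) might be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function names `top_k_indices` and `pillar_space`, the text-query setting, and all similarity values are assumptions made up for the example, and the learnable graph reasoning module and structure alignment constraint are not covered.

```python
def top_k_indices(scores, k):
    """Return the indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def pillar_space(q_txt, q_img, img_txt, img_img, k):
    """Map a text query and all candidate images into a shared pillar space.

    q_txt[t]      similarity of the query to gallery text t   (intra-modal)
    q_img[i]      similarity of the query to gallery image i  (inter-modal)
    img_txt[j][t] similarity of image j to gallery text t
    img_img[j][i] similarity of image j to gallery image i
    All inputs are precomputed similarities (hypothetical values here).
    """
    # Pillars: the query's top-k neighbors in each modality.
    text_pillars = top_k_indices(q_txt, k)
    image_pillars = top_k_indices(q_img, k)

    # Each sample becomes the vector of its similarities to the pillars.
    query_vec = [q_txt[t] for t in text_pillars] + [q_img[i] for i in image_pillars]
    image_vecs = [
        [img_txt[j][t] for t in text_pillars] + [img_img[j][i] for i in image_pillars]
        for j in range(len(img_img))
    ]
    return text_pillars, image_pillars, query_vec, image_vecs


# Toy similarities for 3 gallery texts and 3 gallery images (made-up numbers).
q_txt = [0.9, 0.2, 0.7]
q_img = [0.1, 0.8, 0.6]
img_txt = [[0.3, 0.9, 0.1], [0.8, 0.2, 0.6], [0.7, 0.1, 0.5]]
img_img = [[1.0, 0.2, 0.3], [0.2, 1.0, 0.7], [0.3, 0.7, 1.0]]

text_pillars, image_pillars, query_vec, image_vecs = pillar_space(
    q_txt, q_img, img_txt, img_img, k=2
)
print(text_pillars, image_pillars)
print(query_vec)
print(image_vecs)
```

Because the mapping consumes only similarity scores, the same sketch applies regardless of which base backbone produced them, which is the generalization property the abstract points to.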
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
SIGIR23-Qu.pdf | Accepted version | 1.76 MB | Adobe PDF | OPEN | Post-print