Please use this identifier to cite or link to this item: https://doi.org/10.1109/CVPRW53098.2021.00455
DC Field: Value
dc.title: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval
dc.contributor.author: Bai, Shuai
dc.contributor.author: Zheng, Zhedong
dc.contributor.author: Wang, Xiaohan
dc.contributor.author: Lin, Junyang
dc.contributor.author: Zhang, Zhu
dc.contributor.author: Zhou, Chang
dc.contributor.author: Yang, Hongxia
dc.contributor.author: Yang, Yi
dc.date.accessioned: 2023-11-14T05:51:04Z
dc.date.available: 2023-11-14T05:51:04Z
dc.date.issued: 2021
dc.identifier.citation: Bai, Shuai, Zheng, Zhedong, Wang, Xiaohan, Lin, Junyang, Zhang, Zhu, Zhou, Chang, Yang, Hongxia, Yang, Yi (2021). Connecting Language and Vision for Natural Language-Based Vehicle Retrieval. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): 4029-4038. ScholarBank@NUS Repository. https://doi.org/10.1109/CVPRW53098.2021.00455
dc.identifier.isbn: 9781665448994
dc.identifier.issn: 2160-7508
dc.identifier.issn: 2160-7516
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/245933
dc.description.abstract: Vehicle search is a basic task for efficient traffic management in the AI City. Most existing practices focus on image-based vehicle matching, including vehicle re-identification and vehicle tracking. In this paper, we apply a new modality, i.e., the language description, to search for the vehicle of interest and explore the potential of this task in real-world scenarios. Natural language-based vehicle search poses a new challenge of fine-grained understanding of both the vision and language modalities. To connect language and vision, we propose to jointly train state-of-the-art vision models with a transformer-based language model in an end-to-end manner. Besides the network structure design and the training strategy, several optimization objectives are also revisited in this work. Qualitative and quantitative experiments verify the effectiveness of the proposed method. Our method achieved 1st place in the 5th AI City Challenge, yielding a competitive 18.69% MRR accuracy on the private test set. We hope this work can pave the way for future study on using language descriptions effectively and efficiently in real-world vehicle retrieval systems. The code will be available at https://github.com/ShuaiBai623/AIC2021-T5-CLV.
dc.publisher: IEEE COMPUTER SOC
dc.source: Elements
dc.subject: Science & Technology
dc.subject: Technology
dc.subject: Computer Science, Artificial Intelligence
dc.subject: Computer Science
dc.type: Conference Paper
dc.date.updated: 2023-11-11T04:39:51Z
dc.contributor.department: CIVIL AND ENVIRONMENTAL ENGINEERING
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi: 10.1109/CVPRW53098.2021.00455
dc.description.sourcetitle: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.description.page: 4029-4038
dc.published.state: Published
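
Note: the abstract describes jointly training a vision backbone and a transformer-based language model end-to-end for text-based vehicle retrieval. Below is a minimal, assumed sketch of such a dual-encoder setup trained with a symmetric contrastive (InfoNCE-style) objective. It is illustrative only and is not the authors' released implementation (see the GitHub repository linked in the abstract); all module names and hyperparameters here are placeholders.

# Minimal dual-encoder sketch (assumed, not the paper's released code):
# a vision backbone and a small Transformer text encoder are projected into a
# shared embedding space and trained with a symmetric contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class VehicleRetrievalModel(nn.Module):
    def __init__(self, vocab_size=30522, embed_dim=256, text_dim=512):
        super().__init__()
        # Image branch: ResNet-50 backbone projected to the shared space.
        backbone = resnet50(weights=None)  # torchvision >= 0.13 API; older versions use pretrained=False
        backbone.fc = nn.Identity()
        self.image_encoder = backbone
        self.image_proj = nn.Linear(2048, embed_dim)
        # Text branch: token embeddings plus a small Transformer encoder.
        self.token_embed = nn.Embedding(vocab_size, text_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=text_dim, nhead=8, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        # Learnable temperature for the contrastive loss (ln(1/0.07) ~= 2.659).
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, images, token_ids):
        img_feat = F.normalize(self.image_proj(self.image_encoder(images)), dim=-1)
        txt_hidden = self.text_encoder(self.token_embed(token_ids))
        txt_feat = F.normalize(self.text_proj(txt_hidden.mean(dim=1)), dim=-1)
        return img_feat, txt_feat

def contrastive_loss(img_feat, txt_feat, logit_scale):
    # Matched image/description pairs sit on the diagonal of the similarity matrix.
    logits = logit_scale.exp() * txt_feat @ img_feat.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    model = VehicleRetrievalModel()
    images = torch.randn(4, 3, 224, 224)          # dummy vehicle crops
    token_ids = torch.randint(0, 30522, (4, 32))  # dummy tokenized descriptions
    img_feat, txt_feat = model(images, token_ids)
    loss = contrastive_loss(img_feat, txt_feat, model.logit_scale)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")

In a setup of this kind, retrieval at test time ranks candidate vehicle tracks by the cosine similarity between the query-text embedding and each track's image embedding.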
Appears in Collections: Staff Publications, Elements

Files in This Item:
File: CVPRW2021_NLP_AICity.pdf
Description: Accepted version
Size: 5.71 MB
Format: Adobe PDF
Access Settings: OPEN
Version: Post-print


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.