Please use this identifier to cite or link to this item: https://doi.org/10.1109/CVPRW53098.2021.00455
DC Field: Value
dc.title: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval
dc.contributor.author: Bai, Shuai
dc.contributor.author: Zheng, Zhedong
dc.contributor.author: Wang, Xiaohan
dc.contributor.author: Lin, Junyang
dc.contributor.author: Zhang, Zhu
dc.contributor.author: Zhou, Chang
dc.contributor.author: Yang, Hongxia
dc.contributor.author: Yang, Yi
dc.date.accessioned: 2023-11-14T05:51:04Z
dc.date.available: 2023-11-14T05:51:04Z
dc.date.issued: 2021
dc.identifier.citation: Bai, Shuai, Zheng, Zhedong, Wang, Xiaohan, Lin, Junyang, Zhang, Zhu, Zhou, Chang, Yang, Hongxia, Yang, Yi (2021). Connecting Language and Vision for Natural Language-Based Vehicle Retrieval. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): 4029-4038. ScholarBank@NUS Repository. https://doi.org/10.1109/CVPRW53098.2021.00455
dc.identifier.isbn: 9781665448994
dc.identifier.issn: 2160-7508
dc.identifier.issn: 2160-7516
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/245933
dc.description.abstract: Vehicle search is a basic task for efficient traffic management in the AI City. Most existing practices focus on image-based vehicle matching, including vehicle re-identification and vehicle tracking. In this paper, we apply a new modality, i.e., the language description, to search for the vehicle of interest and explore the potential of this task in real-world scenarios. Natural language-based vehicle search poses a new challenge of fine-grained understanding of both the vision and language modalities. To connect language and vision, we propose to jointly train state-of-the-art vision models with a transformer-based language model in an end-to-end manner. Besides the network structure design and the training strategy, several optimization objectives are also revisited in this work. Qualitative and quantitative experiments verify the effectiveness of the proposed method. Our method achieved 1st place in the 5th AI City Challenge, yielding a competitive 18.69% MRR accuracy on the private test set. We hope this work can pave the way for future study on using language descriptions effectively and efficiently in real-world vehicle retrieval systems. The code will be available at https://github.com/ShuaiBai623/AIC2021-T5-CLV.
dc.publisher: IEEE COMPUTER SOC
dc.source: Elements
dc.subject: Science & Technology
dc.subject: Technology
dc.subject: Computer Science, Artificial Intelligence
dc.subject: Computer Science
dc.type: Conference Paper
dc.date.updated: 2023-11-11T04:39:51Z
dc.contributor.department: CIVIL AND ENVIRONMENTAL ENGINEERING
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi: 10.1109/CVPRW53098.2021.00455
dc.description.sourcetitle: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.description.page: 4029-4038
dc.published.state: Published
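
Note: the abstract describes jointly training a vision backbone and a transformer-based language model end-to-end for text-based vehicle retrieval. Below is a minimal, assumed sketch of such a dual-encoder setup trained with a symmetric contrastive (InfoNCE-style) objective. It is illustrative only and is not the authors' released implementation (see the GitHub repository linked in the abstract); all module names and hyperparameters here are placeholders.

# Minimal dual-encoder sketch (assumed, not the paper's released code):
# a vision backbone and a small Transformer text encoder are projected into a
# shared embedding space and trained with a symmetric contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class VehicleRetrievalModel(nn.Module):
    def __init__(self, vocab_size=30522, embed_dim=256, text_dim=512):
        super().__init__()
        # Image branch: ResNet-50 backbone projected to the shared space.
        backbone = resnet50(weights=None)  # torchvision >= 0.13 API; older versions use pretrained=False
        backbone.fc = nn.Identity()
        self.image_encoder = backbone
        self.image_proj = nn.Linear(2048, embed_dim)
        # Text branch: token embeddings plus a small Transformer encoder.
        self.token_embed = nn.Embedding(vocab_size, text_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=text_dim, nhead=8, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        # Learnable temperature for the contrastive loss (ln(1/0.07) ~= 2.659).
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, images, token_ids):
        img_feat = F.normalize(self.image_proj(self.image_encoder(images)), dim=-1)
        txt_hidden = self.text_encoder(self.token_embed(token_ids))
        txt_feat = F.normalize(self.text_proj(txt_hidden.mean(dim=1)), dim=-1)
        return img_feat, txt_feat

def contrastive_loss(img_feat, txt_feat, logit_scale):
    # Matched image/description pairs sit on the diagonal of the similarity matrix.
    logits = logit_scale.exp() * txt_feat @ img_feat.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    model = VehicleRetrievalModel()
    images = torch.randn(4, 3, 224, 224)          # dummy vehicle crops
    token_ids = torch.randint(0, 30522, (4, 32))  # dummy tokenized descriptions
    img_feat, txt_feat = model(images, token_ids)
    loss = contrastive_loss(img_feat, txt_feat, model.logit_scale)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")

In a setup of this kind, retrieval at test time ranks candidate vehicle tracks by the cosine similarity between the query-text embedding and each track's image embedding.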
Appears in Collections: Staff Publications, Elements

Files in This Item:
File: CVPRW2021_NLP_AICity.pdf
Description: Accepted version
Size: 5.71 MB
Format: Adobe PDF
Access Settings: OPEN
Version: Post-print


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.