Please use this identifier to cite or link to this item: https://doi.org/10.1109/TMM.2022.3143712
Title: SPG-VTON: Semantic Prediction Guidance for Multi-Pose Virtual Try-on
Authors: Hu, Bingwen
Liu, Ping
Zheng, Zhedong 
Ren, Mingwu
Keywords: Science & Technology
Technology
Computer Science, Information Systems
Computer Science, Software Engineering
Telecommunications
Computer Science
Semantics
Clothing
Faces
Fitting
Training
Shape
Image synthesis
End-to-end
multi-pose
semantic prediction
virtual try-on
Issue Date: 2022
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation: Hu, Bingwen, Liu, Ping, Zheng, Zhedong, Ren, Mingwu (2022). SPG-VTON: Semantic Prediction Guidance for Multi-Pose Virtual Try-on. IEEE TRANSACTIONS ON MULTIMEDIA 24: 1233-1246. ScholarBank@NUS Repository. https://doi.org/10.1109/TMM.2022.3143712
Abstract: Image-based virtual try-on is challenging: it must fit target in-shop clothes onto a reference person under diverse human poses. Previous works focus on preserving clothing details (e.g., texture, logos, patterns) when transferring the desired clothes onto a target person under a fixed pose, but their performance drops significantly when they are extended to multi-pose virtual try-on. In this paper, we propose an end-to-end Semantic Prediction Guidance multi-pose Virtual Try-On Network (SPG-VTON), which can fit the desired clothing onto a reference person under an arbitrary pose. Specifically, SPG-VTON is composed of three sub-modules. First, a Semantic Prediction Module (SPM) generates the desired semantic map, which provides richer guidance for locating the desired clothing region and producing a coarse try-on image. Second, a Clothes Warping Module (CWM) warps the in-shop clothes to the desired shape according to the predicted semantic map and the desired pose; we introduce a conductible cycle consistency loss to alleviate misalignment during clothes warping. Third, a Try-on Synthesis Module (TSM) combines the coarse result and the warped clothes to generate the final virtual try-on image, preserving the details of the desired clothes under the desired pose. In addition, we introduce a face identity loss that refines the facial appearance while maintaining the identity of the final try-on result. We evaluate the proposed method on the largest multi-pose dataset (MPV) and on the DeepFashion dataset. Qualitative and quantitative experiments show that SPG-VTON outperforms state-of-the-art methods and is robust to data noise, including background and accessory changes (e.g., hats and handbags), demonstrating good scalability to real-world scenarios.
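Note: The abstract describes a three-stage pipeline (SPM, CWM, TSM) plus a face identity loss. The PyTorch sketch below illustrates how those stages could compose; all module internals, argument names, and the face_encoder are placeholder assumptions made for this record, not the authors' released implementation.

```python
import torch.nn as nn


class SPGVTON(nn.Module):
    """Illustrative composition of the three sub-modules named in the
    abstract; the sub-module architectures themselves are placeholders."""

    def __init__(self, spm: nn.Module, cwm: nn.Module, tsm: nn.Module):
        super().__init__()
        self.spm = spm  # Semantic Prediction Module
        self.cwm = cwm  # Clothes Warping Module
        self.tsm = tsm  # Try-on Synthesis Module

    def forward(self, person, clothes, pose):
        # 1) Predict the semantic map under the target pose and produce
        #    a coarse try-on image from it.
        semantic_map, coarse = self.spm(person, clothes, pose)
        # 2) Warp the in-shop clothes to the predicted clothing region,
        #    guided by the semantic map and the target pose.
        warped_clothes = self.cwm(clothes, semantic_map, pose)
        # 3) Fuse the coarse result with the warped clothes into the
        #    final try-on image.
        return self.tsm(coarse, warped_clothes, semantic_map)


def face_identity_loss(face_encoder, result, reference, face_box):
    """L1 distance between face embeddings of the synthesized and the
    reference image; `face_encoder` stands in for any pretrained face
    recognition network (an assumption; the abstract does not name one)."""
    x0, y0, x1, y1 = face_box  # face region, e.g. taken from the semantic map

    def crop(img):
        return img[..., y0:y1, x0:x1]

    return (face_encoder(crop(result)) - face_encoder(crop(reference))).abs().mean()
```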
Source Title: IEEE TRANSACTIONS ON MULTIMEDIA
URI: https://scholarbank.nus.edu.sg/handle/10635/245847
ISSN: 1520-9210 (print)
1941-0077 (electronic)
DOI: 10.1109/TMM.2022.3143712
Appears in Collections: Staff Publications
Elements

Files in This Item:
File: Hu_CYB20.pdf
Description: Accepted version
Size: 2.67 MB
Format: Adobe PDF
Access Settings: OPEN
Version: Post-print

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.