license: cc-by-nc-sa-4.0
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters), and 3) Simplified Inference (< 8G VRAM for 1024x768 resolution).
Updates
2024/7/24: Our Paper on ArXiv is available now!
2024/7/22: Our App Code is released; deploy and enjoy CatVTON on your own machine!
2024/7/21: Our Inference Code and Weights are released.
2024/7/11: Our Online Demo is released.
Installation
An Installation Guide is provided to help build the conda environment for CatVTON. When deploying the app, you will need Detectron2 & DensePose, but these are not required for inference on datasets. Install the packages according to your needs.
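As a minimal setup sketch (the environment name, Python version, and requirements file below are assumptions; follow the Installation Guide for the exact steps):

# Minimal setup sketch; names and versions are assumptions, see the Installation Guide for exact steps.
conda create -n catvton python=3.9 -y
conda activate catvton
pip install -r requirements.txt
# Only needed for the Gradio app (automatic mask generation), not for dataset inference:
pip install git+https://github.com/facebookresearch/detectron2.git
pip install git+https://github.com/facebookresearch/detectron2.git@main#subdirectory=projects/DensePose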
Deployment (Gradio App)
To deploy the Gradio App for CatVTON on your own machine, just run the following command; checkpoints will be automatically downloaded from HuggingFace.
CUDA_VISIBLE_DEVICES=0 python app.py \
--output_dir="resource/demo/output" \
--mixed_precision="bf16" \
--allow_tf32
When using bf16 precision, generating results at a resolution of 1024x768 only requires about 8G of VRAM.
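If your GPU does not support bf16, fp16 may be a workable fallback, assuming app.py accepts the same --mixed_precision choices as inference.py below:

# Assumed fallback for GPUs without bf16 support; --mixed_precision="fp16" is an assumption based on the inference script's options.
CUDA_VISIBLE_DEVICES=0 python app.py \
--output_dir="resource/demo/output" \
--mixed_precision="fp16" \
--allow_tf32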
Inference
Data Preparation
Before inference, you need to download the VITON-HD or DressCode dataset. Once the datasets are downloaded, the folder structure should look like this:
├── VITON-HD
│   ├── test_pairs_unpaired.txt
│   ├── test
│   │   ├── image
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── agnostic-mask
│   │   │   ├── [000006_00_mask.png | 000008_00_mask.png | ...]
...
For the DressCode dataset, we provide our preprocessed agnostic masks; download them and place them in the agnostic_masks folder under each category.
├── DressCode
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   ├── [dresses | lower_body | upper_body]
│   │   ├── test_pairs_paired.txt
│   │   ├── test_pairs_unpaired.txt
│   │   ├── images
│   │   │   ├── [013563_0.jpg | 013563_1.jpg | 013564_0.jpg | 013564_1.jpg | ...]
│   │   ├── agnostic_masks
│   │   │   ├── [013563_0.png | 013564_0.png | ...]
...
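Before running inference, you can optionally sanity-check the layout with a small shell loop. The sketch below is for VITON-HD and assumes each line of test_pairs_unpaired.txt lists a person image and a garment image separated by whitespace; adjust the paths for your setup.

# Hypothetical layout check for VITON-HD; the pairs-file format "<person>.jpg <cloth>.jpg" per line is an assumption.
DATA_ROOT=/path/to/VITON-HD
while read -r person cloth; do
  [ -f "$DATA_ROOT/test/image/$person" ] || echo "missing person image: $person"
  [ -f "$DATA_ROOT/test/cloth/$cloth" ] || echo "missing cloth image: $cloth"
done < "$DATA_ROOT/test_pairs_unpaired.txt"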
Inference on VITON-HD/DressCode
To run inference on the DressCode or VITON-HD dataset, run the following command; checkpoints will be automatically downloaded from HuggingFace.
CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset [dresscode | vitonhd] \
--data_root_path <path> \
--output_dir <path> \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision [no | fp16 | bf16] \
--allow_tf32 \
--repaint \
--eval_pair
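For example, a concrete unpaired run on VITON-HD at bf16 precision might look like the command below. The paths are placeholders, and --eval_pair is omitted here on the assumption that it selects the paired setting.

# Example unpaired run on VITON-HD; data and output paths are placeholders.
CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset vitonhd \
--data_root_path ./data/VITON-HD \
--output_dir ./output/vitonhd-unpaired \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision bf16 \
--allow_tf32 \
--repaint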
Acknowledgement
Our code is modified from Diffusers.
We adopt Stable Diffusion v1.5 inpainting as the base model.
We use SCHP and DensePose to automatically generate masks in our Gradio App.
Thanks to all the contributors!
Citation
@misc{chong2024catvtonconcatenationneedvirtual,
title={CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models},
author={Zheng Chong and Xiao Dong and Haoxiang Li and Shiyue Zhang and Wenqing Zhang and Xujie Zhang and Hanqing Zhao and Xiaodan Liang},
year={2024},
eprint={2407.15886},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.15886},
}