license: cc-by-nc-sa-4.0

🐈 CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

CatVTON is a simple and efficient virtual try-on diffusion model with 1) a Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters), and 3) Simplified Inference (<8 GB VRAM for 1024×768 resolution).

Updates

Installation

An Installation Guide is provided to help you build the conda environment for CatVTON. Detectron2 & DensePose are needed only when deploying the app; they are not required for inference on the datasets. Install the packages according to your needs.
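
As a rough outline only (exact file and package names below are assumptions; see the Installation Guide for the authoritative steps), the setup might look like:

# Create and activate the conda environment (environment file name assumed)
conda env create -f environment.yaml -n catvton
conda activate catvton

# Only needed for the Gradio app: Detectron2 and DensePose
pip install "git+https://github.com/facebookresearch/detectron2.git"
pip install "git+https://github.com/facebookresearch/detectron2.git#subdirectory=projects/DensePose"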

Deployment (Gradio App)

To deploy the Gradio App for CatVTON on your own machine, just run the following command; the checkpoints will be downloaded automatically from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python app.py \
--output_dir="resource/demo/output" \
--mixed_precision="bf16" \
--allow_tf32 

When using bf16 precision, generating results at 1024×768 resolution requires only about 8 GB of VRAM.

Inference

Data Preparation

Before inference, you need to download the VITON-HD or DressCode dataset. Once downloaded, the folder structure should look like this (a quick layout check follows the tree):

├── VITON-HD
│   ├── test_pairs_unpaired.txt
│   ├── test
│   │   ├── image
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── agnostic-mask
│   │   │   ├── [000006_00_mask.png | 000008_00_mask.png | ...]
...
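
A quick sanity check of the layout before running inference (a sketch, using the paths from the tree above):

# List a few entries from each expected folder under VITON-HD
ls VITON-HD/test_pairs_unpaired.txt
ls VITON-HD/test/image | head -n 3
ls VITON-HD/test/cloth | head -n 3
ls VITON-HD/test/agnostic-mask | head -n 3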

For the DressCode dataset, we provide preprocessed agnostic masks; download them and place them in the agnostic_masks folder under each category, as in the tree below (a placement sketch follows the tree).

├── DressCode
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   ├── [dresses | lower_body | upper_body]
│   │   ├── test_pairs_paired.txt
│   │   ├── test_pairs_unpaired.txt
│   │   ├── images
│   │   │   ├── [013563_0.jpg | 013563_1.jpg | 013564_0.jpg | 013564_1.jpg | ...]
│   │   ├── agnostic_masks
│   │   │   ├── [013563_0.png | 013564_0.png | ...]
...
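
Assuming the downloaded masks come as one folder of PNGs per category (the source path below is a placeholder), placing them could look like:

# Copy the preprocessed masks into each category's agnostic_masks folder
for category in dresses lower_body upper_body; do
    mkdir -p DressCode/${category}/agnostic_masks
    cp path/to/downloaded_masks/${category}/*.png DressCode/${category}/agnostic_masks/
done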

Inference on VITON-HD/DressCode

To run inference on the DressCode or VITON-HD dataset, run the following command; checkpoints will be downloaded automatically from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset [dresscode | vitonhd] \
--data_root_path <path> \
--output_dir <path> \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision [no | fp16 | bf16] \
--allow_tf32 \
--repaint \
--eval_pair  
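
For example, an unpaired run on VITON-HD with bf16 might look like the following (paths are placeholders, and --eval_pair is omitted on the assumption that it selects the paired setting):

CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset vitonhd \
--data_root_path ./VITON-HD \
--output_dir ./output/vitonhd-unpaired \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision bf16 \
--allow_tf32 \
--repaint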

Acknowledgement

Our code is modified from Diffusers.
We adopt Stable Diffusion v1.5 inpainting as the base model.
We use SCHP and DensePose to automatically generate masks in our Gradio App.
Thanks to all the contributors!

Citation

@misc{chong2024catvtonconcatenationneedvirtual,
      title={CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models}, 
      author={Zheng Chong and Xiao Dong and Haoxiang Li and Shiyue Zhang and Wenqing Zhang and Xujie Zhang and Hanqing Zhao and Xiaodan Liang},
      year={2024},
      eprint={2407.15886},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.15886}, 
}