Spaces:
Running
on
Zero
Running
on
Zero
File size: 6,630 Bytes
2654152 147f082 2654152 147f082 1beac4e 00bb490 4fd56c0 1beac4e c980edd f610e83 4ae4b3e 3d6c87f 364cbc1 6820409 61bdde1 364cbc1 a18f098 91af88a c980edd 91af88a 4ae4b3e c9f7eb5 df52708 dc2c372 4ae4b3e f610e83 1beac4e 364cbc1 4fd56c0 5458c75 4fd56c0 1beac4e 364cbc1 1beac4e 364cbc1 24e20da 6820409 24e20da 3d6c87f 364cbc1 1beac4e 91af88a 96426c9 1beac4e 96426c9 1beac4e 364cbc1 abf3d6e 4ae4b3e 1beac4e 776470c 1beac4e 4fb0ca5 abf3d6e 4fb0ca5 279641e 61bdde1 96426c9 279641e 375bf59 abf3d6e 1beac4e f610e83 abf3d6e ab07c1d 4ae4b3e 364cbc1 1beac4e 4fd56c0 1beac4e b633718 1beac4e 375bf59 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 |
---
title: catvton-flux
emoji: π₯οΈ
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
---
# catvton-flux
An state-of-the-art virtual try-on solution that combines the power of [CATVTON](https://arxiv.org/abs/2407.15886) (CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models) with Flux fill inpainting model for realistic and accurate clothing transfer.
Also inspired by [In-Context LoRA](https://arxiv.org/abs/2410.23775) for prompt engineering.
Running it now on website: [CATVTON-FLUX-TRY-ON](https://huggingface.co/spaces/xiaozaa/catvton-flux-try-on)
## Update
---
**Latest Achievement**
(2024/12/6):
- Released a new weights for tryoff. The model named [cat-tryoff-flux](https://huggingface.co/xiaozaa/cat-tryoff-flux) can extract and reconstruct the front view of clothing items from images of people wearing them. [Showcase examples](#try-off-examples) is here.
- Try-off Hugging Face: π€ [CAT-TRYOFF-FLUX](https://huggingface.co/spaces/xiaozaa/cat-try-off-flux)
(2024/12/1):
- Community comfyui support [here](https://github.com/lujiazho/ComfyUI-CatvtonFluxWrapper). Thanks to [lujiazho](https://github.com/lujiazho)
(2024/11/26):
- Updated the weights. (Still training on the VITON-HD dataset only.)
- Reduce the fine-tuning weights size (46GB -> 23GB)
- Weights has better performance on garment small details/text.
- Added the huggingface ZeroGPU support. You can run **CATVTON-FLUX-TRY-ON** now on huggingface space [here](https://huggingface.co/spaces/xiaozaa/catvton-flux-try-on)
(2024/11/25):
- Released lora weights. Lora weights achieved FID: `6.0675811767578125` on VITON-HD dataset. Test configuration: scale 30, step 30.
- Revise gradio demo. Added huggingface spaces support.
- Clean up the requirements.txt.
(2024/11/24):
- Released FID score and gradio demo
- CatVton-Flux-Alpha achieved **SOTA** performance with FID: `5.593255043029785` on VITON-HD dataset. Test configuration: scale 30, step 30. My VITON-HD test inferencing results available [here](https://drive.google.com/file/d/1T2W5R1xH_uszGVD8p6UUAtWyx43rxGmI/view?usp=sharing)
---
## Showcase
### Try-on examples
| Original | Garment | Result |
|----------|---------|---------|
| ![Original](example/person/1.jpg) | ![Garment](example/garment/00035_00.jpg) | ![Result](example/result/1.png) |
| ![Original](example/person/1.jpg) | ![Garment](example/garment/04564_00.jpg) | ![Result](example/result/2.png) |
| ![Original](example/person/00008_00.jpg) | ![Garment](example/garment/00034_00.jpg) | ![Result](example/result/3.png) |
### Try-off examples
| Original clothed model | Restored garment result |
|------------------------|------------------------|
| ![Original](example/person/00055_00.jpg) | ![Restored garment result](example/tryoff_result/restored_garment2.png) |
| ![Original](example/person/00064_00.jpg) | ![Restored garment result](example/tryoff_result/restored_garment4.png) |
| ![Original](example/person/00069_00.jpg) | ![Restored garment result](example/tryoff_result/restored_garment6.png) |
## Model Weights
### Tryon
Fine-tuning weights in Hugging Face: π€ [catvton-flux-alpha](https://huggingface.co/xiaozaa/catvton-flux-alpha)
LORA weights in Hugging Face: π€ [catvton-flux-lora-alpha](https://huggingface.co/xiaozaa/catvton-flux-lora-alpha)
### Tryoff
Fine-tuning weights in Hugging Face: π€ [cat-tryoff-flux](https://huggingface.co/xiaozaa/cat-tryoff-flux)
### Dataset
The model weights are trained on the [VITON-HD](https://github.com/shadow2496/VITON-HD) dataset.
## Prerequisites
Make sure you are running the code with VRAM >= 40GB. (I run all my experiments on a 80GB GPU, lower VRAM will cause OOM error. Will support lower VRAM in the future.)
```bash
bash
conda create -n flux python=3.10
conda activate flux
pip install -r requirements.txt
huggingface-cli login
```
## Usage
### Tryoff
Run the following command to restore the front side of the garment from the clothed model image:
```bash
python tryoff_inference.py \
--image ./example/person/00069_00.jpg \
--mask ./example/person/00069_00_mask.png \
--seed 41 \
--output_tryon test_original.png \
--output_garment restored_garment6.png \
--steps 30
```
### Tryon
Run the following command to try on an image:
LORA version:
```bash
python tryon_inference_lora.py \
--image ./example/person/00008_00.jpg \
--mask ./example/person/00008_00_mask.png \
--garment ./example/garment/00034_00.jpg \
--seed 4096 \
--output_tryon test_lora.png \
--steps 30
```
Fine-tuning version:
```bash
python tryon_inference.py \
--image ./example/person/00008_00.jpg \
--mask ./example/person/00008_00_mask.png \
--garment ./example/garment/00034_00.jpg \
--seed 42 \
--output_tryon test.png \
--steps 30
```
Run the following command to start a gradio demo with LoRA weights:
```bash
python app.py
```
Run the following command to start a gradio demo without LoRA weights:
```bash
python app_no_lora.py
```
Gradio demo:
Try-on Hugging Face: π€ [CATVTON-FLUX-TRY-ON](https://huggingface.co/spaces/xiaozaa/catvton-flux-try-on)
Try-off Hugging Face: π€ [CAT-TRYOFF-FLUX](https://huggingface.co/spaces/xiaozaa/cat-try-off-flux)
<!-- Option 2: Using a thumbnail linked to the video -->
[![Demo](example/github.jpg)](https://upcdn.io/FW25b7k/raw/uploads/github.mp4)
## TODO:
- [x] Release the FID score
- [x] Add gradio demo
- [x] Release updated weights with better performance
- [x] Train a smaller model
- [x] Support comfyui
- [x] Release tryoff weights
## Citation
```bibtex
@misc{chong2024catvtonconcatenationneedvirtual,
title={CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models},
author={Zheng Chong and Xiao Dong and Haoxiang Li and Shiyue Zhang and Wenqing Zhang and Xujie Zhang and Hanqing Zhao and Xiaodan Liang},
year={2024},
eprint={2407.15886},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.15886},
}
@article{lhhuang2024iclora,
title={In-Context LoRA for Diffusion Transformers},
author={Huang, Lianghua and Wang, Wei and Wu, Zhi-Fan and Shi, Yupeng and Dou, Huanzhang and Liang, Chen and Feng, Yutong and Liu, Yu and Zhou, Jingren},
journal={arXiv preprint arxiv:2410.23775},
year={2024}
}
```
Thanks to [Jim](https://github.com/nom) for insisting on spatial concatenation.
Thanks to [dingkang](https://github.com/dingkwang) [MoonBlvd](https://github.com/MoonBlvd) [Stevada](https://github.com/Stevada) for the helpful discussions.
## License
- The code is licensed under the MIT License.
- The model weights have the same license as Flux.1 Fill and VITON-HD.
|