
ViViD

ViViD: Video Virtual Try-on using Diffusion Models

Links: arXiv (https://arxiv.org/abs/2405.11794) | Project Page | Hugging Face Spaces

Dataset

Dataset released: ViViD

Installation

git clone https://github.com/alibaba-yuanjing-aigclab/ViViD
cd ViViD

Environment

conda create -n vivid python=3.10
conda activate vivid
pip install -r requirements.txt  

Weights

You can place the weights anywhere you like, for example ./ckpts. If you put them somewhere else, update the paths in ./configs/prompts/*.yaml accordingly.
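
For illustration, the relevant entries in a prompt config might look like the following. The key names here are hypothetical; edit whichever keys actually hold these paths in the shipped YAML files.

# Hypothetical excerpt of a ./configs/prompts/*.yaml -- the key names are
# illustrative, only the paths matter:
pretrained_base_model_path: ./ckpts/sd-image-variations-diffusers
pretrained_vae_path: ./ckpts/sd-vae-ft-mse
motion_module_path: ./ckpts/mm_sd_v15_v2.ckpt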

Stable Diffusion Image Variations

cd ckpts

git lfs install
git clone https://huggingface.co/lambdalabs/sd-image-variations-diffusers

SD-VAE-ft-mse

git lfs install
git clone https://huggingface.co/stabilityai/sd-vae-ft-mse

Motion Module

Download mm_sd_v15_v2.ckpt, the AnimateDiff motion module.
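
If you prefer to script this step, here is a minimal sketch using huggingface_hub. It assumes the checkpoint is the one published under guoyww/animatediff; adjust the repo if the project links elsewhere.

# Sketch: fetch the motion-module checkpoint into ./ckpts.
# Assumption: the file is the mm_sd_v15_v2.ckpt hosted at guoyww/animatediff.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="guoyww/animatediff",
    filename="mm_sd_v15_v2.ckpt",
    local_dir="./ckpts",
)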

ViViD

git lfs install
git clone https://huggingface.co/alibaba-yuanjing-aigclab/ViViD
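
If you follow the defaults above, ./ckpts should end up looking roughly like this (the motion-module file name assumes the AnimateDiff checkpoint):

./ckpts/
|-- sd-image-variations-diffusers/
|-- sd-vae-ft-mse/
|-- mm_sd_v15_v2.ckpt
|-- ViViD/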

Inference

We provide two demos in ./configs/prompts/; run the following commands to have a try 😼.

python vivid.py --config ./configs/prompts/upper1.yaml

python vivid.py --config ./configs/prompts/lower1.yaml

Data

As illustrated in ./data, the inputs should be provided in the following layout.

./data/
|-- agnostic
|   |-- video1.mp4
|   |-- video2.mp4
|   ...
|-- agnostic_mask
|   |-- video1.mp4
|   |-- video2.mp4
|   ...
|-- cloth
|   |-- cloth1.jpg
|   |-- cloth2.jpg
|   ...
|-- cloth_mask
|   |-- cloth1.jpg
|   |-- cloth2.jpg
|   ...
|-- densepose
|   |-- video1.mp4
|   |-- video2.mp4
|   ...
|-- videos
|   |-- video1.mp4
|   |-- video2.mp4
|   ...

Agnostic and agnostic_mask videos

This part is a bit involved; you can obtain these videos in any of the following three ways:

  1. Follow OOTDiffusion to extract them frame by frame (recommended).
  2. Use SAM + Gaussian blur (see ./tools/sam_agnostic.py for an example, and the sketch below).
  3. Use a mask-editor tool.

Note that the shape and size of the agnostic area may affect the try-on results.
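
As a rough illustration of way 2: segment the garment on each frame, blur the mask so it generously covers the clothing, and gray out that region to produce the agnostic frame. A minimal per-frame sketch follows; the click coordinates and file names are placeholders, and ./tools/sam_agnostic.py remains the authoritative version.

# Sketch of "SAM + Gaussian blur" for a single frame; illustrative only.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

frame = cv2.cvtColor(cv2.imread("frame_0001.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# One positive click on the garment (placeholder coordinates).
masks, _, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=False,
)
mask = masks[0].astype(np.uint8) * 255

# Blur, then re-binarize, so the agnostic area generously covers the garment.
mask = cv2.GaussianBlur(mask, (51, 51), 0)
agnostic_mask = np.where(mask > 0, 255, 0).astype(np.uint8)

# Gray out the masked region to obtain the agnostic frame.
agnostic = frame.copy()
agnostic[agnostic_mask > 0] = 128

cv2.imwrite("agnostic_mask_0001.png", agnostic_mask)
cv2.imwrite("agnostic_0001.png", cv2.cvtColor(agnostic, cv2.COLOR_RGB2BGR))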

Densepose video

See vid2densepose (thanks to the authors).

Cloth mask

Any segmentation tool, such as SAM, works for obtaining the cloth mask.
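
For instance, reusing the SAM recipe from the agnostic sketch above (the center-click prompt and file names are placeholders; any click that lands on the garment works):

# Sketch: binary cloth mask with SAM; illustrative only.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

cloth = cv2.cvtColor(cv2.imread("data/cloth/cloth1.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(cloth)

masks, _, _ = predictor.predict(
    point_coords=np.array([[cloth.shape[1] // 2, cloth.shape[0] // 2]]),
    point_labels=np.array([1]),
    multimask_output=False,
)
cv2.imwrite("data/cloth_mask/cloth1.jpg", masks[0].astype(np.uint8) * 255)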

BibTeX

@misc{fang2024vivid,
  title={ViViD: Video Virtual Try-on using Diffusion Models},
  author={Zixun Fang and Wei Zhai and Aimin Su and Hongliang Song and Kai Zhu and Mao Wang and Yu Chen and Zhiheng Liu and Yang Cao and Zheng-Jun Zha},
  year={2024},
  eprint={2405.11794},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Contact Us

Zixun Fang: [email protected]
Yu Chen: [email protected]
