---
license: other
license_name: sv3d-nc-community
license_link: LICENSE
datasets:
- allenai/objaverse
pipeline_tag: image-to-video
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/sv3d/blob/main/LICENSE.md) and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options: 
      - Yes
      - No
---
# [SV3D-diffusers](https://github.com/chenguolin/sv3d-diffusers)

![](assets/sv3doutputs.gif)

This repo (https://github.com/chenguolin/sv3d-diffusers) provides scripts for:

1. Spatio-temporal UNet (`SV3DUNetSpatioTemporalConditionModel`) and pipeline (`StableVideo3DDiffusionPipeline`) modified from [SVD](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py) for [SV3D](https://sv3d.github.io) in the [diffusers](https://github.com/huggingface/diffusers) convention.

2. Converting [Stability-AI](https://github.com/Stability-AI/generative-models)'s official [SV3D-p UNet checkpoint](https://huggingface.co/stabilityai/sv3d) to the [diffusers](https://github.com/huggingface/diffusers) convention.

3. Running inference on the `SV3D-p` model with the [diffusers](https://github.com/huggingface/diffusers) library to synthesize a 21-frame orbital video around a 3D object from a single-view image (preprocessed first by removing the background and centering the object).

Converted SV3D-p checkpoints have been uploaded to HuggingFace🤗 [chenguolin/sv3d-diffusers](https://huggingface.co/chenguolin/sv3d-diffusers).
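
If you prefer calling the pipeline directly rather than through `infer.py`, a minimal sketch might look like the following. Note this is illustrative only: the import path and the exact call signature are assumptions based on the standard diffusers/SVD convention; see `infer.py` in the repo for the actual entry point and the camera-elevation conditioning.

```python
# Illustrative sketch only: the module path and call signature are assumed to
# follow the diffusers/SVD convention; check the repo's infer.py for the real API.
import torch
from PIL import Image
from pipeline import StableVideo3DDiffusionPipeline  # import path is an assumption

# Load the converted SV3D-p weights from the HuggingFace Hub in half precision
pipe = StableVideo3DDiffusionPipeline.from_pretrained(
    "chenguolin/sv3d-diffusers", torch_dtype=torch.float16
).to("cuda")

# The input image should already be background-removed and centered
image = Image.open("assets/images/sculpture.png").convert("RGB")

# Synthesize a 21-frame orbital video around the object (SVD-style output).
# SV3D-p additionally conditions on camera elevation (infer.py's --elevation);
# that argument is omitted here because its pipeline-level name is not assumed.
frames = pipe(image).frames[0]
frames[0].save("out/sculpture.gif", save_all=True, append_images=frames[1:], loop=0)
```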


## 🚀 Usage
```bash
git clone https://github.com/chenguolin/sv3d-diffusers.git
# Please install PyTorch first according to your CUDA version
pip3 install -r requirements.txt
# If you can't access HuggingFace🤗, try:
# export HF_ENDPOINT=https://hf-mirror.com
python3 infer.py --output_dir out/ --image_path assets/images/sculpture.png --elevation 10 --half_precision --seed -1
```
The synthesized video will be saved to `out/` as a `.gif` file.


## 📸 Results
> Image preprocessing and random seeds differ between implementations, so the results below are for reference only.

| Implementation | sculpture |  bag   | kunkun |
| :------------- | :------:  | :----: | :----: |
| **SV3D-diffusers (Ours)** | ![](assets/sculpture.gif) | ![](assets/bag.gif) | ![](assets/kunkun.gif) |
| **Official SV3D**  | ![](assets/sculpture_official.gif) | ![](assets/bag_official.gif) | ![](assets/kunkun_official.gif) |


## 📚 Citation
If you find this repo helpful, please consider giving it a star 🌟 and citing the original SV3D paper.
```
@inproceedings{voleti2024sv3d,
   author={Voleti, Vikram and Yao, Chun-Han and Boss, Mark and Letts, Adam and Pankratz, David and Tochilkin, Dmitrii and Laforte, Christian and Rombach, Robin and Jampani, Varun},
   title={{SV3D}: Novel Multi-view Synthesis and {3D} Generation from a Single Image using Latent Video Diffusion},
   booktitle={European Conference on Computer Vision (ECCV)},
   year={2024},
}
```