# :rocket: Gradio Demo
This Gradio demo is the simplest starting point for playing with our project.
You can either visit it at our Hugging Face space [here](https://huggingface.co/spaces/stabilityai/stable-virtual-camera) or run it locally yourself by
```bash
python demo_gr.py
```
We provide two ways to use our demo:
1. `Basic` mode, where the user uploads a single image and picks a target camera trajectory from our preset options. This is the most straightforward way to use our model and is suitable for most users.
2. `Advanced` mode, where the user uploads one or multiple images and sets a target camera trajectory by interacting with a 3D viewport (powered by [viser](https://viser.studio/latest)). This is suitable for power users and academic researchers.
### `Basic`
This is the default mode when entering our demo (given its simplicity).
Upload a single image and pick a target camera trajectory from the preset options.
Here is a video walkthrough:
https://github.com/user-attachments/assets/4d965fa6-d8eb-452c-b773-6e09c88ca705
You can choose from 13 preset trajectories that are common for NVS (`move-forward/backward` are omitted from the visualization below):
https://github.com/user-attachments/assets/b2cf8700-3d85-44b9-8d52-248e82f1fb55
More formally:
- `orbit/spiral/lemniscate` are good for showing the "3D-ness" of the scene.
- `zoom-in/out` keep the camera position fixed while increasing/decreasing the focal length.
- `dolly zoom-in/out` move the camera backward/forward while increasing/decreasing the focal length (see the sketch after this list).
- `move-forward/backward/up/down/left/right` translate the camera in the corresponding direction.
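To make the dolly zoom concrete, here is a minimal numpy sketch, not the demo's actual implementation (the function and parameter names are hypothetical), of how position and focal length co-vary so the subject keeps the same apparent size:

```python
import numpy as np

def dolly_zoom_in(n_frames=80, start_dist=2.0, end_dist=3.0, start_focal=500.0):
    """Dolly zoom-in: pull the camera backward while increasing the focal
    length so that focal / distance stays constant, keeping the subject's
    apparent size fixed while the background perspective warps."""
    dists = np.linspace(start_dist, end_dist, n_frames)  # camera pulls back
    focals = start_focal * dists / start_dist            # f / d held constant
    # camera centers along the z axis, looking at a subject at the origin
    positions = np.stack([np.zeros(n_frames), np.zeros(n_frames), dists], axis=1)
    return positions, focals

positions, focals = dolly_zoom_in()
```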
Notes:
- For an 80-frame video at `768x576` resolution, it takes around 20 seconds for the first-pass generation and around 2 minutes for the second-pass generation, tested on a single H100 GPU.
- Expect around 2-3x longer on the HF space.
### `Advanced`
This is the power mode, giving you very fine-grained control over camera trajectories.
Upload one or multiple images and set a target camera trajectory by interacting with a 3D viewport.
Here is a video walkthrough:
https://github.com/user-attachments/assets/dcec1be0-bd10-441e-879c-d1c2b63091ba
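Conceptually, a viewport trajectory of this kind boils down to a set of keyframe cameras densified into a smooth path. Here is a minimal sketch of that idea, assuming piecewise-linear positions and slerped rotations (the function and its names are hypothetical, not the demo's actual API):

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_cameras(key_positions, key_rotations, n_frames=134):
    """Densify keyframe cameras into a smooth trajectory.

    key_positions: (K, 3) array of camera centers set in the viewport.
    key_rotations: scipy Rotation holding the K camera orientations.
    """
    key_times = np.linspace(0.0, 1.0, len(key_positions))
    times = np.linspace(0.0, 1.0, n_frames)
    # piecewise-linear interpolation of positions, axis by axis
    positions = np.stack(
        [np.interp(times, key_times, key_positions[:, i]) for i in range(3)], axis=1
    )
    # spherical linear interpolation of orientations
    rotations = Slerp(key_times, key_rotations)(times)
    return positions, rotations

keys = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [1.0, 1.0, 2.0]])
rots = Rotation.from_euler("y", [0, 30, 60], degrees=True)
positions, rotations = interpolate_cameras(keys, rots)
```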
Notes:
- For a 134-frame video at `576x576` resolution, it takes around 16 seconds for the first-pass generation and around 4 minutes for the second-pass generation, tested on a single H100 GPU.
- Expect around 2-3x longer on the HF space.
### Pro tips
- If the first-pass sampling result is bad, click the "Abort rendering" button in the GUI to avoid getting stuck in the second-pass sampling, so that you can try something else.
### Performance benchmark
We have tested our Gradio demo in both a local environment and the HF space environment, across different modes and compilation settings. Here are our results:
| Total time (s)           | `Basic` first pass | `Basic` second pass | `Advanced` first pass | `Advanced` second pass |
|:------------------------:|:------------------:|:-------------------:|:---------------------:|:----------------------:|
| HF (L40S, w/o comp.)     | 68                 | 484                 | 48                    | 780                    |
| HF (L40S, w/ comp.)      | 51                 | 362                 | 36                    | 587                    |
| Local (H100, w/o comp.)  | 35                 | 204                 | 20                    | 313                    |
| Local (H100, w/ comp.)   | 21                 | 144                 | 16                    | 234                    |
Notes:
- The HF space uses an L40S GPU; our local environment uses an H100 GPU.
- Compilation is opt-in and enabled via `torch.compile` (see the sketch after these notes).
- `Basic` mode is tested by generating 80 frames at `768x576` resolution.
- `Advanced` mode is tested by generating 134 frames at `576x576` resolution.
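For reference, the compilation toggle is the standard PyTorch one. A minimal sketch, assuming the demo wraps its denoising network with `torch.compile` (`TinyDenoiser` below is a hypothetical stand-in, not the actual model):

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    # hypothetical stand-in for the demo's denoising network
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 64))

    def forward(self, x):
        return self.net(x)

# the first call triggers compilation; subsequent calls reuse the compiled graph
model = torch.compile(TinyDenoiser())
out = model(torch.randn(1, 64))
```

The one-time compilation cost is amortized across the many denoising steps of the two sampling passes, which is why the compiled rows in the table above are consistently faster.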