# :rocket: Gradio Demo
This Gradio demo is the simplest starting point for playing with our project.
You can either visit it at our Hugging Face space [here](https://huggingface.co/spaces/stabilityai/stable-virtual-camera) or run it locally yourself with
```bash
python demo_gr.py
```
We provide two ways to use our demo:
1. `Basic` mode, where you upload a single image and pick a target camera trajectory from our preset options. This is the most straightforward way to use our model, and is suitable for most users.
2. `Advanced` mode, where you upload one or multiple images and set a target camera trajectory by interacting with a 3D viewport (powered by [viser](https://viser.studio/latest)). This is suitable for power users and academic researchers. A minimal sketch of this two-mode layout follows this list.
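To make the two modes concrete, here is a minimal sketch of how such a two-tab Gradio layout might look. The component choices and the shortened preset list below are illustrative assumptions, not the actual `demo_gr.py` code:

```python
# Hedged sketch of a two-mode Gradio UI; the real demo_gr.py wiring
# (event callbacks, the embedded viser viewport, model inference) is
# considerably more involved.
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tab("Basic"):
        image = gr.Image(label="Input image")
        preset = gr.Dropdown(
            ["orbit", "spiral", "lemniscate", "zoom-in", "zoom-out"],
            label="Preset trajectory",  # subset of the 13 presets, for brevity
        )
        gr.Button("Render")
    with gr.Tab("Advanced"):
        gr.File(file_count="multiple", label="Input images")
        # The actual demo embeds a viser-powered 3D viewport here.

demo.launch()
```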
### `Basic`
This is the default mode when entering our demo (given its simplicity).
Upload a single image and pick a target camera trajectory from the preset options.
Here is a video walkthrough:
https://github.com/user-attachments/assets/4d965fa6-d8eb-452c-b773-6e09c88ca705
You can choose from 13 preset trajectories that are common for novel view synthesis (NVS); `move-forward/backward` are omitted for visualization purposes:
https://github.com/user-attachments/assets/b2cf8700-3d85-44b9-8d52-248e82f1fb55
More formally:
- `orbit/spiral/lemniscate` are good for showing the "3D-ness" of the scene.
- `zoom-in/out` keep the camera position fixed while increasing/decreasing the focal length.
- `dolly zoom-in/out` move the camera backward/forward while increasing/decreasing the focal length (see the sketch after this list).
- `move-forward/backward/up/down/left/right` translate the camera in the corresponding direction.
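To illustrate the geometry behind two of these presets, here is a hedged NumPy sketch of how `orbit` poses and the `dolly zoom-in` focal/distance schedule could be parameterized. The function names and conventions (camera-to-world matrices, y-up) are assumptions for illustration, not the demo's internal API:

```python
import numpy as np

def look_at(eye, target, up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 camera-to-world matrix looking from `eye` toward `target`."""
    fwd = target - eye
    fwd = fwd / np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, fwd)
    c2w = np.eye(4)
    c2w[:3, 0], c2w[:3, 1], c2w[:3, 2], c2w[:3, 3] = right, true_up, fwd, eye
    return c2w

def orbit_poses(radius=2.0, n_frames=80):
    """`orbit`: circle around the origin at a fixed radius, always looking at it."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False)
    eyes = np.stack(
        [radius * np.sin(angles), np.zeros_like(angles), radius * np.cos(angles)],
        axis=-1,
    )
    return [look_at(eye, np.zeros(3)) for eye in eyes]

def dolly_zoom_in(f0=500.0, d0=2.0, n_frames=80, zoom=2.0):
    """`dolly zoom-in`: increase the focal length while moving backward so the
    subject's projected size (proportional to f/d) stays constant."""
    focals = np.linspace(f0, f0 * zoom, n_frames)
    dists = d0 * focals / f0  # hold d/f fixed -> constant subject size
    return focals, dists
```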
Notes:
- For an 80-frame video at `768x576` resolution, the first pass takes around 20 seconds and the second pass around 2 minutes, tested on a single H100 GPU.
- Expect around 2-3x longer runtimes on the HF space.
### `Advanced`
This is the power mode where you have very fine-grained control over camera trajectories.
Upload one or multiple images and set a target camera trajectory by interacting with the 3D viewport. This mode is suited to power users and academic researchers.
Here is a video walkthrough:
https://github.com/user-attachments/assets/dcec1be0-bd10-441e-879c-d1c2b63091ba
Notes:
- For a 134-frame video at `576x576` resolution, the first pass takes around 16 seconds and the second pass around 4 minutes, tested on a single H100 GPU.
- Expect around 2-3x longer runtimes on the HF space.
### Pro tips
- If the first-pass sampling result is bad, click the "Abort rendering" button in the GUI to avoid getting stuck in second-pass sampling, so that you can try something else.
### Performance benchmark
We have benchmarked our Gradio demo in both a local environment and the HF space environment, across the two modes and compilation settings. Here are our results:
| Total time (s)           | `Basic` first pass | `Basic` second pass | `Advanced` first pass | `Advanced` second pass |
|:------------------------:|:------------------:|:-------------------:|:---------------------:|:----------------------:|
| HF (L40S, w/o comp.)     | 68                 | 484                 | 48                    | 780                    |
| HF (L40S, w/ comp.)      | 51                 | 362                 | 36                    | 587                    |
| Local (H100, w/o comp.)  | 35                 | 204                 | 20                    | 313                    |
| Local (H100, w/ comp.)   | 21                 | 144                 | 16                    | 234                    |
Notes:
- HF space uses L40S GPU, and our local environment uses H100 GPU.
- Compilation ("w/ comp.") is enabled via `torch.compile` (a minimal sketch follows these notes).
- `Basic` mode is tested by generating 80 frames at `768x576` resolution.
- `Advanced` mode is tested by generating 134 frames at `576x576` resolution.
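For reference, here is a minimal, hedged sketch of what opting into compilation looks like with `torch.compile`. The stand-in module below is a placeholder; the real demo compiles its diffusion model, and the exact entry point in `demo_gr.py` may differ:

```python
import torch
import torch.nn as nn

# Stand-in module purely for demonstration; the demo compiles its actual
# denoising network instead.
model = nn.Sequential(nn.Linear(16, 16), nn.GELU(), nn.Linear(16, 16))

compiled = torch.compile(model)  # opt-in compilation (PyTorch >= 2.0)
x = torch.randn(4, 16)
out = compiled(x)                # first call triggers compilation (slow);
                                 # later calls reuse the compiled graph
```

The one-time compilation cost is amortized across sampling passes, which is why the "w/ comp." rows in the table above are consistently faster.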