File size: 3,703 Bytes
1bb1365
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
# :rocket: Gradio Demo

This gradio demo is the simplest starting point for you play with our project.

You can either visit it at our huggingface space [here](https://huggingface.co/spaces/stabilityai/stable-virtual-camera) or run it locally yourself by

```bash
python demo_gr.py
```

We provide two ways to use our demo:

1. `Basic` mode, where user can upload a single image, and set a target camera trajectory from our preset options. This is the most straightforward way to use our model, and is suitable for most users.
2. `Advanced` mode, where user can upload one or multiple images, and set a target camera trajectory by interacting with a 3D viewport (powered by [viser](https://viser.studio/latest)). This is suitable for power users and academic researchers.

### `Basic`

This is the default mode when entering our demo (given its simplicity).

User can upload a single image, and set a target camera trajectory from our preset options. This is the most straightforward way to use our model, and is suitable for most users.

Here is a video walkthrough:

https://github.com/user-attachments/assets/4d965fa6-d8eb-452c-b773-6e09c88ca705

You can choose from 13 preset trajectories that are common for NVS (`move-forward/backward` are omitted for visualization purpose):

https://github.com/user-attachments/assets/b2cf8700-3d85-44b9-8d52-248e82f1fb55

More formally:

- `orbit/spiral/lemniscate` are good for showing the "3D-ness" of the scene.
- `zoom-in/out` keep the camera position the same while increasing/decreasing the focal length.
- `dolly zoom-in/out` move camera position backward/forward while increasing/decreasing the focal length.
- `move-forward/backward/up/down/left/right` move camera position in different directions.

Notes:

- For a 80 frame video at `786x576` resolution, it takes around 20 seconds for the first pass generation, and around 2 minutes for the second pass generation, tested with a single H100 GPU.
- Please expect around ~2-3x more times on HF space.

### `Advanced`

This is the power mode where you can have very fine-grained control over camera trajectories.

User can upload one or multiple images, and set a target camera trajectory by interacting with a 3D viewport. This is suitable for power users and academic researchers.

Here is a video walkthrough

https://github.com/user-attachments/assets/dcec1be0-bd10-441e-879c-d1c2b63091ba

Notes:

- For a 134 frame video at `576x576` resolution, it takes around 16 seconds for the first pass generation, and around 4 minutes for the second pass generation, tested with a single H100 GPU.
- Please expect around ~2-3x more times on HF space.

### Pro tips

- If the first pass sampling result is bad, click "Abort rendering" button in GUI to avoid stucking at second pass sampling such that you can try something else.

### Performance benchmark

We have tested our gradio demo in both a local environment and the HF space environment, across different modes and compilation settings. Here are our results:
| Total time (s) | `Basic` first pass | `Basic` second pass | `Advanced` first pass | `Advanced` second pass |
|:------------------------:|:-----------------:|:------------------:|:--------------------:|:---------------------:|
| HF (L40S, w/o comp.) | 68 | 484 | 48 | 780 |
| HF (L40S, w/ comp.) | 51 | 362 | 36 | 587 |
| Local (H100, w/o comp.) | 35 | 204 | 20 | 313 |
| Local (H100, w/ comp.) | 21 | 144 | 16 | 234 |

Notes:

- HF space uses L40S GPU, and our local environment uses H100 GPU.
- We opt-in compilation by `torch.compile`.
- `Basic` mode is tested by generating 80 frames at `768x576` resolution.
- `Advanced` mode is tested by generating 134 frames at `576x576` resolution.