---
license: mit
pipeline_tag: text-to-video
library_name: diffusers
---
# VidToMe: Video Token Merging for Zero-Shot Video Editing

Edit videos instantly with just a prompt! 🎥

This is the Diffusers implementation of VidToMe, a diffusion-based pipeline for zero-shot video editing that improves temporal consistency and reduces memory usage by merging self-attention tokens across video frames.
The approach enables coherent video generation and editing without fine-tuning the model.
By aligning and merging redundant tokens across frames, VidToMe produces smooth transitions and consistent output, improving over prior zero-shot video editing methods.
It is based on [this paper](https://arxiv.org/abs/2312.10656).
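
At a high level, tokens from each frame's self-attention layers are matched to their most similar counterparts in an anchor frame, and the redundant pairs are averaged so repeated content is processed only once. The snippet below is a simplified, illustrative sketch of that idea in plain PyTorch; it is not the pipeline's actual merging code.

```python
import torch
import torch.nn.functional as F

def merge_tokens(anchor: torch.Tensor, frame: torch.Tensor, merge_ratio: float = 0.5) -> torch.Tensor:
    """Merge the most redundant tokens of `frame` into their nearest `anchor` tokens."""
    # anchor, frame: (num_tokens, dim) self-attention tokens from two frames
    sim = F.normalize(frame, dim=-1) @ F.normalize(anchor, dim=-1).T
    best_sim, best_idx = sim.max(dim=-1)            # best anchor match for each frame token
    num_merge = int(frame.shape[0] * merge_ratio)   # how many frame tokens to merge away
    merge_ids = best_sim.topk(num_merge).indices    # frame tokens most similar to the anchor
    keep_mask = torch.ones(frame.shape[0], dtype=torch.bool)
    keep_mask[merge_ids] = False
    merged = anchor.clone()
    # average each merged frame token into its matched anchor token
    # (duplicate matches simply overwrite here; the real method handles this more carefully)
    merged[best_idx[merge_ids]] = (merged[best_idx[merge_ids]] + frame[merge_ids]) / 2
    # unmerged frame tokens are kept as-is alongside the shared, merged tokens
    return torch.cat([merged, frame[keep_mask]], dim=0)

# example: 256 tokens of dimension 320 per frame -> 384 tokens instead of 512
tokens = merge_tokens(torch.randn(256, 320), torch.randn(256, 320), merge_ratio=0.5)
print(tokens.shape)  # torch.Size([384, 320])
```

Because merged tokens are shared across frames, the self-attention output stays consistent from frame to frame, which is what gives the edited video its temporal stability while also reducing the number of tokens the model must process.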

## Usage

```python
from diffusers import DiffusionPipeline

# load the pretrained model
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe", 
    trust_remote_code=True, 
    custom_pipeline="jadechoghari/VidToMe", 
    sd_version="depth", 
    device="cuda", 
    float_precision="fp16"
)

# set prompts for inversion and generation
inversion_prompt = "flamingos standing in the water near a tree."
generation_prompt = {"origami": "rainbow-colored origami flamingos standing in the water near a tree."}

# additional control and parameters
control_type = "none"  # No extra control, use "depth" if needed
negative_prompt = ""

# run the video editing pipeline
generated_images = pipeline(
    video_path="path/to/video.mp4",   # path to the input video
    video_prompt=inversion_prompt,    # inversion prompt
    edit_prompt=generation_prompt,    # edit prompt for generation
    control_type=control_type         # control type (e.g., "none", "depth")
)
```
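
The call above returns the edited frames. Assuming the pipeline returns a list of PIL images (an assumption; check the repository for the exact return type), they can be written out frame by frame:

```python
# assuming `generated_images` is a list of PIL.Image frames
# (see the GitHub repository for the exact return type)
for i, frame in enumerate(generated_images):
    frame.save(f"edited_frame_{i:04d}.png")
```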

#### Note: For more control, consider creating a configuration file and following the instructions in the GitHub repository.

## Applications
- Zero-shot video editing for content creators
- Video transformation using natural language prompts
- Memory-optimized video generation for longer or complex sequences

**Model Authors:**  
- [Xirui Li](https://github.com/lixirui142)  
- Chao Ma  
- Xiaokang Yang  
- Ming-Hsuan Yang

For more details, check the [GitHub repo](https://github.com/lixirui142/VidToMe).