---
license: mit
---
# VidToMe: Video Token Merging for Zero-Shot Video Editing

Edit videos instantly with just a prompt! 🎥

This is a Diffusers implementation of VidToMe, a diffusion-based pipeline for zero-shot video editing that improves temporal consistency and reduces memory usage by merging self-attention tokens across video frames.
Because the approach is zero-shot, it generates and edits videos without fine-tuning the underlying model.
By aligning and compressing redundant tokens across frames, VidToMe produces smoother transitions and more coherent output than traditional video editing methods.
It is based on the paper [VidToMe: Video Token Merging for Zero-Shot Video Editing](https://arxiv.org/abs/2312.10656).
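
To make the idea of merging redundant tokens across frames more concrete, here is a minimal, dependency-free sketch. It is an illustration only, not the code used by this pipeline: the function names `cosine` and `merge_tokens`, the list-of-lists token format, and the parameter `r` (number of tokens to merge) are all assumptions made for this example. Each token in one frame is matched to its most similar token in another frame, and the `r` closest matches are averaged together.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_tokens(src, dst, r):
    """Merge the r src tokens that are most similar to some dst token.

    src, dst: lists of token vectors from two frames (hypothetical inputs).
    Returns (merged, assignment), where assignment maps each merged src
    index to the dst index it was averaged into.
    """
    # For every src token, find its best-matching dst token.
    matches = []
    for i, s in enumerate(src):
        sims = [cosine(s, d) for d in dst]
        j = max(range(len(dst)), key=sims.__getitem__)
        matches.append((sims[j], i, j))

    # Only the r most similar (most redundant) src tokens are merged.
    matches.sort(reverse=True)
    assignment = {i: j for _, i, j in matches[:r]}

    # Average each dst token with the src tokens assigned to it.
    sums = [list(d) for d in dst]
    counts = [1] * len(dst)
    for i, j in assignment.items():
        sums[j] = [a + b for a, b in zip(sums[j], src[i])]
        counts[j] += 1
    merged_dst = [[v / c for v in vec] for vec, c in zip(sums, counts)]

    # Unmerged src tokens are kept as they are.
    kept_src = [s for i, s in enumerate(src) if i not in assignment]
    return kept_src + merged_dst, assignment


# Toy example: 3 tokens in one frame, 2 in another, merge 1 of them.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dst = [[2.0, 0.0], [-1.0, 0.5]]
merged, assignment = merge_tokens(src, dst, r=1)
print(len(src) + len(dst), "tokens before,", len(merged), "after")
```

Self-attention then runs over the shorter merged sequence, which is where the memory saving comes from. The sketch leaves out the step of restoring the original token count after attention, which a real pipeline needs before passing features on to the next layer.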

## Usage

```python
from diffusers import DiffusionPipeline

# Load the pretrained model
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/VidToMe",
    sd_version="depth",
    device="cuda",
    float_precision="fp16",
)

# Edit a video with prompts
pipeline(
    video_path="path/to/video.mp4", 
    video_prompt="A serene beach scene", 
    edit_prompt="Make the sunset more vibrant", 
    control_type="depth", 
    n_timesteps=50
)
```

## Applications
- Zero-shot video editing for content creators
- Video transformation using natural language prompts
- Memory-optimized video generation for longer or complex sequences

**Model Authors:**  
- Xirui Li  
- Chao Ma  
- Xiaokang Yang  
- Ming-Hsuan Yang

For more details, see the [GitHub repository](https://github.com/lixirui142/VidToMe).