---
license: mit
pipeline_tag: text-to-video
library_name: diffusers
---
# VidToMe: Video Token Merging for Zero-Shot Video Editing
Edit videos instantly with just a prompt! 🎥
This is a Diffusers implementation of VidToMe, a diffusion-based pipeline for zero-shot video editing. It improves temporal consistency and reduces memory usage by merging self-attention tokens across video frames.
Because the approach is zero-shot, it can generate and edit videos coherently without fine-tuning the model.
By aligning and compressing redundant tokens across frames, VidToMe produces smooth transitions and coherent video output, improving on traditional video editing methods.
It is based on [this paper](https://arxiv.org/abs/2312.10656).
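
To make the token-merging idea above more concrete, here is a minimal NumPy sketch. It is not the pipeline's actual code. It assumes a simplified scheme in which each token of one frame is matched to its most similar token in another frame by cosine similarity, the `r` best-matched tokens are averaged into their matches, and the result is later "unmerged" by copying the shared token back. The function names `merge_frame_tokens` and `unmerge_frame_tokens`, the parameter `r`, and the matching rule are illustrative assumptions; refer to the paper for the actual method.

```python
import numpy as np


def merge_frame_tokens(src, dst, r):
    """Merge the r src tokens that best match a dst token into that dst token.

    src, dst: float arrays of shape (num_tokens, dim), one per frame.
    Returns (merged, kept_idx, merged_idx, match); `merged` stacks the
    unmerged src tokens on top of the updated dst tokens.
    """
    # Cosine similarity between every src token and every dst token.
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    dst_n = dst / np.linalg.norm(dst, axis=1, keepdims=True)
    sim = src_n @ dst_n.T  # shape (n_src, n_dst)

    match = sim.argmax(axis=1)        # best dst token for each src token
    order = np.argsort(-sim.max(axis=1))  # src tokens, most similar first
    merged_idx = order[:r]            # these src tokens get merged away
    kept_idx = np.sort(order[r:])     # these src tokens stay as they are

    # Average each dst token with the src tokens merged into it.
    sums = dst.astype(float)          # astype returns a copy
    counts = np.ones(len(dst))
    np.add.at(sums, match[merged_idx], src[merged_idx])
    np.add.at(counts, match[merged_idx], 1)
    dst_merged = sums / counts[:, None]

    merged = np.concatenate([src[kept_idx], dst_merged], axis=0)
    return merged, kept_idx, merged_idx, match


def unmerge_frame_tokens(merged, kept_idx, merged_idx, match, n_src):
    """Restore per-frame token sets; merged src tokens copy their dst match."""
    n_kept = len(kept_idx)
    dst_out = merged[n_kept:]
    src_out = np.empty((n_src, merged.shape[1]), dtype=merged.dtype)
    src_out[kept_idx] = merged[:n_kept]
    src_out[merged_idx] = dst_out[match[merged_idx]]
    return src_out, dst_out


# Toy example: two "frames" with 6 tokens of dimension 4 each.
rng = np.random.default_rng(0)
frame_a = rng.normal(size=(6, 4))
frame_b = rng.normal(size=(6, 4))
merged, kept_idx, merged_idx, match = merge_frame_tokens(frame_a, frame_b, r=3)
print(merged.shape)  # (9, 4): 3 unmerged frame_a tokens + 6 frame_b tokens
```

In this toy setting, attention would run over 9 tokens instead of 12. Because self-attention cost grows roughly quadratically with the number of tokens, even modest merging reduces memory, and sharing merged tokens across frames is what encourages temporally consistent results.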
## Usage
```python
from diffusers import DiffusionPipeline

# Load the pretrained model
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/VidToMe",
    sd_version="depth",
    device="cuda",
    float_precision="fp16",
)

# Set prompts for inversion and generation
inversion_prompt = "flamingos standing in the water near a tree."
generation_prompt = {"origami": "rainbow-colored origami flamingos standing in the water near a tree."}

# Additional control and parameters
control_type = "none"  # no extra control; use "depth" if needed
negative_prompt = ""  # defined here but not passed in the call below

# Run the video editing pipeline
generated_images = pipeline(
    video_path="path/to/video.mp4",  # path to the input video
    video_prompt=inversion_prompt,  # inversion prompt
    edit_prompt=generation_prompt,  # edit prompt for generation
    control_type=control_type,  # control type (e.g., "none", "depth")
)
```
#### Note: For more control, consider creating a configuration file and following the instructions in the GitHub repository.
## Applications:
- Zero-shot video editing for content creators
- Video transformation using natural language prompts
- Memory-optimized video generation for longer or complex sequences
**Model Authors:**
- [Xirui Li](https://github.com/lixirui142)
- Chao Ma
- Xiaokang Yang
- Ming-Hsuan Yang
For more details, see the [GitHub repository](https://github.com/lixirui142/VidToMe).