jadechoghari/VidToMe

jadechoghari committed 25 days ago

Commit a9b005e (1 parent: b8ac6f6)

Update README.md

Files changed (1)

README.md +42 -3

@@ -1,3 +1,42 @@
----
-license: mit
----

+---
+license: mit
+---
+# VidToMe: Video Token Merging for Zero-Shot Video Editing
+Edit videos instantly with just a prompt! 🎥
+This is a Diffusers implementation of VidToMe, a diffusion-based pipeline for zero-shot video editing that improves temporal consistency and reduces memory usage by merging self-attention tokens across video frames.
+This approach allows coherent video generation and editing without fine-tuning the model.
+By aligning and compressing redundant tokens across frames, VidToMe aims to produce smooth transitions and temporally coherent video output.
+It is based on [this paper](https://arxiv.org/abs/2312.10656).
+## Usage
+```python
+from diffusers import DiffusionPipeline
+# Load the pretrained model; trust_remote_code=True allows the custom pipeline code from the Hub repo to run
+pipeline = DiffusionPipeline.from_pretrained(
+    "jadechoghari/VidToMe",
+    trust_remote_code=True,
+    custom_pipeline="jadechoghari/VidToMe",
+    sd_version="depth",
+    device="cuda",
+    float_precision="fp16",
+)
+# Edit a video with prompts
+pipeline(
+    video_path="path/to/video.mp4",
+    video_prompt="A serene beach scene",
+    edit_prompt="Make the sunset more vibrant",
+    control_type="depth",
+    n_timesteps=50
+)
+```
+## Applications
+- Zero-shot video editing for content creators
+- Video transformation using natural language prompts
+- Memory-optimized video generation for longer or complex sequences
+**Model Authors:**
+- Xirui Li
+- Chao Ma
+- Xiaokang Yang
+- Ming-Hsuan Yang
+For more details, see the [GitHub repo](https://github.com/lixirui142/VidToMe).