---
license: mit
---

# VidToMe: Video Token Merging for Zero-Shot Video Editing

Edit videos instantly with just a prompt! 🎥

This is the Diffusers implementation of VidToMe, a diffusion-based pipeline for zero-shot video editing that improves temporal consistency and reduces memory usage by merging self-attention tokens across video frames. By aligning and compressing redundant tokens across frames, VidToMe produces smooth transitions and coherent video output without any model fine-tuning, improving over prior zero-shot video editing methods. It follows the VidToMe paper.

## Usage

```python
from diffusers import DiffusionPipeline

# load the pretrained model
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/VidToMe",
    sd_version="depth",
    device="cuda",
    float_precision="fp16",
)

# edit a video with prompts
pipeline(
    video_path="path/to/video.mp4",
    video_prompt="A serene beach scene",
    edit_prompt="Make the sunset more vibrant",
    control_type="depth",
    n_timesteps=50,
)
```

## Applications

- Zero-shot video editing for content creators
- Video transformation using natural language prompts
- Memory-optimized video generation for longer or more complex sequences

## Model Authors

- Xirui Li
- Chao Ma
- Xiaokang Yang
- Ming-Hsuan Yang

For more details, check the GitHub repo.