---
license: mit
pipeline_tag: text-to-video
library_name: diffusers
---

# VidToMe: Video Token Merging for Zero-Shot Video Editing

Edit videos instantly with just a prompt! 🎥

This Diffusers implementation of VidToMe is a diffusion-based pipeline for zero-shot video editing. It improves temporal consistency and reduces memory usage by merging self-attention tokens across video frames, which allows harmonious video generation and editing without fine-tuning the model. By aligning and compressing redundant tokens across frames, VidToMe produces smooth transitions and coherent video output, improving over traditional video editing methods. It follows [this paper](https://arxiv.org/abs/2312.10656).

## Usage

```python
from diffusers import DiffusionPipeline

# load the pretrained model
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/VidToMe",
    sd_version="depth",
    device="cuda",
    float_precision="fp16"
)

# set prompts for inversion and generation
inversion_prompt = "flamingos standing in the water near a tree."
generation_prompt = {"origami": "rainbow-colored origami flamingos standing in the water near a tree."}

# additional control and parameters
control_type = "none"  # no extra control; use "depth" if needed
negative_prompt = ""   # defined for reference; not passed in the call below

# run the video editing pipeline
generated_images = pipeline(
    video_path="path/to/video.mp4",  # path to the input video
    video_prompt=inversion_prompt,   # inversion prompt
    edit_prompt=generation_prompt,   # edit prompt for generation
    control_type=control_type        # control type (e.g., "none", "depth")
)
```

#### Note:
For more control, consider creating a configuration and following the instructions in the GitHub repository.

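
To make the idea of merging redundant tokens across frames more concrete, here is a minimal toy sketch in plain Python. It is not the code used by this pipeline: the function names (`cosine`, `merge_tokens`, `unmerge_tokens`), the averaging rule, and the two-frame setup are all assumptions made for illustration. The real method operates on self-attention tokens inside the diffusion model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_tokens(src, dst, r):
    """Merge the r src tokens that best match a dst token (hypothetical helper).

    src, dst: lists of token vectors taken from two different frames.
    Returns (kept_src, merged_dst, mapping), where mapping[i] is the dst index
    that src token i was merged into, or None if the token was kept.
    """
    # For every src token, find its most similar dst token.
    matches = []
    for i, s in enumerate(src):
        sims = [cosine(s, d) for d in dst]
        j = max(range(len(dst)), key=sims.__getitem__)
        matches.append((sims[j], i, j))

    # Keep only the r most similar (most redundant) pairs.
    matches.sort(reverse=True)
    mapping = [None] * len(src)
    for _, i, j in matches[:r]:
        mapping[i] = j

    # Average each dst token with the src tokens merged into it.
    merged_dst = []
    for j, d in enumerate(dst):
        group = [d] + [src[i] for i, m in enumerate(mapping) if m == j]
        merged_dst.append([sum(col) / len(group) for col in zip(*group)])

    kept_src = [s for s, m in zip(src, mapping) if m is None]
    return kept_src, merged_dst, mapping


def unmerge_tokens(kept_src, merged_dst, mapping):
    """Restore the original src length by copying merged values back."""
    kept = iter(kept_src)
    return [next(kept) if m is None else merged_dst[m] for m in mapping]


# Two tiny "frames" with 2-dimensional tokens.
frame_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_b = [[0.9, 0.1], [-1.0, 0.5]]

kept, merged, mapping = merge_tokens(frame_b, frame_a, r=1)
restored_b = unmerge_tokens(kept, merged, mapping)
print(len(frame_a) + len(frame_b), "tokens ->", len(kept) + len(merged), "tokens")
```

In this sketch, the first token of `frame_b` is nearly identical to the first token of `frame_a`, so the two are averaged into one shared token. After unmerging, both frames read the same value at that position, which is the intuition behind the improved temporal consistency, while the shorter token list in between is what saves memory.
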
## Applications:

- Zero-shot video editing for content creators
- Video transformation using natural language prompts
- Memory-optimized video generation for longer or complex sequences

**Model Authors:**

- [Xirui Li](https://github.com/lixirui142)
- Chao Ma
- Xiaokang Yang
- Ming-Hsuan Yang

For more, check the [GitHub repo](https://github.com/lixirui142/VidToMe).