---
license: mit
pipeline_tag: text-to-video
library_name: diffusers
---

# VidToMe: Video Token Merging for Zero-Shot Video Editing

Edit videos instantly with just a prompt! 🎥

This Diffusers implementation of VidToMe is a diffusion-based pipeline for zero-shot video editing that improves temporal consistency and reduces memory usage by merging self-attention tokens across video frames.

It enables coherent video generation and editing without fine-tuning the model: by aligning and compressing redundant tokens across frames, VidToMe produces smooth transitions and consistent output, improving over traditional video editing methods.

It is based on [this paper](https://arxiv.org/abs/2312.10656).
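
For intuition, here is a minimal, illustrative sketch of the merging idea: each frame's self-attention tokens are matched to their most similar tokens in a key frame by cosine similarity, and the most redundant ones are averaged in. The function below and its merging strategy are simplified assumptions for illustration, not the pipeline's actual implementation.

```python
import torch
import torch.nn.functional as F

def merge_tokens_across_frames(tokens: torch.Tensor, ratio: float = 0.5) -> torch.Tensor:
    """Toy cross-frame token merging (illustrative only, not the VidToMe code).

    tokens: (num_frames, num_tokens, dim) self-attention tokens for a chunk of frames.
    ratio:  fraction of each non-key frame's tokens to fold into the key frame.
    """
    key, rest = tokens[0], tokens[1:]  # treat frame 0 as the key frame
    key_norm = F.normalize(key, dim=-1)
    merged = key.clone()
    counts = torch.ones(key.shape[0], device=tokens.device)

    for frame in rest:
        # cosine similarity between this frame's tokens and the key frame's tokens
        sim = F.normalize(frame, dim=-1) @ key_norm.T
        best_sim, best_idx = sim.max(dim=-1)    # closest key-frame token per token
        num_merge = int(ratio * frame.shape[0])
        src = best_sim.topk(num_merge).indices  # the most redundant tokens
        # average each selected token into its matched key-frame token
        merged.index_add_(0, best_idx[src], frame[src])
        counts.index_add_(0, best_idx[src], torch.ones(num_merge, device=tokens.device))

    return merged / counts.unsqueeze(-1)        # (num_tokens, dim) compressed tokens

# example: tokens from 4 frames (256 tokens each, dim 64) compress to one set
compressed = merge_tokens_across_frames(torch.randn(4, 256, 64))
print(compressed.shape)  # torch.Size([256, 64])
```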
## Usage

```python
from diffusers import DiffusionPipeline

# load the pretrained custom pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/VidToMe",
    sd_version="depth",
    device="cuda",
    float_precision="fp16",
)

# prompt describing the input video, used for inversion
inversion_prompt = "flamingos standing in the water near a tree."

# edit prompt(s) for generation, keyed by a name for each edited output
generation_prompt = {"origami": "rainbow-colored origami flamingos standing in the water near a tree."}

# additional control and parameters
control_type = "none"  # no extra control; use "depth" for depth guidance
negative_prompt = ""   # optional; see the GitHub repository for full configuration

# run the zero-shot video editing pipeline
generated_images = pipeline(
    video_path="path/to/video.mp4",  # path to the input video
    video_prompt=inversion_prompt,   # inversion prompt
    edit_prompt=generation_prompt,   # edit prompt(s) for generation
    control_type=control_type,       # control type (e.g., "none", "depth")
)
```
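
Assuming the pipeline returns a list of PIL image frames (the exact return structure may differ; check the repository), the result can be written out with diffusers' `export_to_video` utility:

```python
from diffusers.utils import export_to_video

# `generated_images` is assumed to be a list of PIL.Image frames;
# adjust if the pipeline returns a different structure
export_to_video(generated_images, "edited_video.mp4", fps=8)
```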
#### Note: For more control, consider creating a configuration file and following the instructions in the GitHub repository.
## Applications

- Zero-shot video editing for content creators
- Video transformation using natural language prompts
- Memory-optimized video generation for longer or complex sequences

**Model Authors:**

- [Xirui Li](https://github.com/lixirui142)
- Chao Ma
- Xiaokang Yang
- Ming-Hsuan Yang

For more details, check the [GitHub repo](https://github.com/lixirui142/VidToMe).