	---
	license: apache-2.0
	base_model:
	- genmo/mochi-1-preview
	pipeline_tag: text-to-video
	tags:
	- infinite zoom
	- art style
	- mochi
	- diffusion
	widget:
	- text: Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, zoom focuses on a can, all surface around it is made of liquid and objects swimming in it.
	  output:
	    url: samples/4_1800.mp4
	- text: Human fingers pinching to zoom on an infinite zoom canvas, spaceship going through space.
	  output:
	    url: samples/5_2000.mp4
	- text: Human fingers pinching to zoom on an infinite zoom canvas, orange cat in the middle of a canvas, looking upward.
	  output:
	    url: samples/6_2000.mp4
	---

	# Fine-Tuning Mochi Text-to-Video: InfiniteZoom-Mochi

	This project fine-tunes the Mochi text-to-video model with LoRA (Low-Rank Adaptation) so that it generates videos in the infinite zoom art style.
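
	For readers new to LoRA, here is a minimal sketch of the idea in plain Python. It is not the trainer's code: the helper names (`matvec`, `lora_forward`) and the toy matrix sizes are assumptions made for illustration. The frozen weight `W` is left untouched, and a low-rank product `B A` scaled by `alpha / rank` is added on top. With the rank and alpha of 16 used in the config further down, that scale is 1.0.

	```python
	def matvec(matrix, vector):
	    """Multiply a row-major matrix (list of rows) by a vector."""
	    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


	def lora_forward(x, W, A, B, alpha):
	    """Compute W x + (alpha / rank) * B (A x) for one input vector.

	    W is the frozen weight (out x in), A is (rank x in), B is (out x rank).
	    Only A and B would receive gradient updates during fine-tuning.
	    """
	    rank = len(A)
	    scale = alpha / rank
	    base = matvec(W, x)
	    update = matvec(B, matvec(A, x))
	    return [b + scale * u for b, u in zip(base, update)]


	# Toy sizes for illustration only (hypothetical, not Mochi's real dimensions).
	W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen 2x3 weight
	A = [[1.0, 1.0, 0.0]]                     # rank-1 down-projection
	x = [1.0, 2.0, 3.0]

	# A common LoRA convention is to start B at zero, so the adapter is a no-op.
	y_start = lora_forward(x, W, A, [[0.0], [0.0]], alpha=1)

	# After training, B is non-zero and shifts the output.
	y_tuned = lora_forward(x, W, A, [[1.0], [0.0]], alpha=1)

	# Scale implied by rank 16 and alpha 16.
	config_scale = 16 / 16
	```

	Because only `A` and `B` are trained, the adapter file stays small compared with the base checkpoint.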

	## Training Details

	- Base Model: [genmo/mochi-1-preview](https://huggingface.co/genmo/mochi-1-preview)
	- Fine-Tuning Dataset: 23 short video clips in the infinite zoom art style, with accompanying .txt descriptions
	- Training Hardware: H100 GPU
	- Training Duration: 2 hours
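
	A dataset of clips plus text descriptions is easy to get out of sync, so a quick check before training can help. The sketch below assumes each clip is an `.mp4` with a same-named `.txt` file next to it. That layout and the function name `find_unpaired_clips` are assumptions for illustration, not something taken from the training scripts.

	```python
	from pathlib import Path


	def find_unpaired_clips(data_dir, video_ext=".mp4", caption_ext=".txt"):
	    """Return sorted names of videos that have no same-stem caption file."""
	    root = Path(data_dir)
	    missing = []
	    for video in sorted(root.glob(f"*{video_ext}")):
	        # clip_01.mp4 is expected to sit next to clip_01.txt (assumed layout).
	        if not video.with_suffix(caption_ext).is_file():
	            missing.append(video.name)
	    return missing
	```

	An empty list means every clip found in the directory has a description.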

	<Gallery />

	## lora.yaml:
	```
	init_checkpoint_path: /weights/dit.safetensors
	checkpoint_dir: /finetunes/my_mochi_lora
	train_data_dir: /videos_prepared
	attention_mode: sdpa
	single_video_mode: false # Useful for debugging whether your model can learn a single video

	# You only need this if you're using wandb
	wandb:
	  # project: mochi_1_lora
	  # name: ${checkpoint_dir}
	  # group: null

	optimizer:
	  lr: 2e-4
	  weight_decay: 0.01

	model:
	  type: lora
	  kwargs:
	    # Apply LoRA to the QKV projection and the output projection of the attention block.
	    qkv_proj_lora_rank: 16
	    qkv_proj_lora_alpha: 16
	    qkv_proj_lora_dropout: 0.
	    out_proj_lora_rank: 16
	    out_proj_lora_alpha: 16
	    out_proj_lora_dropout: 0.

	training:
	  model_dtype: bf16
	  warmup_steps: 200
	  num_qkv_checkpoint: 48
	  num_ff_checkpoint: 48
	  num_post_attn_checkpoint: 48
	  num_steps: 2000
	  save_interval: 200
	  caption_dropout: 0.1
	  grad_clip: 0.0
	  save_safetensors: true

	# Used for generating samples during training to monitor progress ...
	sample:
	  interval: 200
	  output_dir: ${checkpoint_dir}/samples
	  decoder_path: /weights/decoder.safetensors
	  prompts:
	    - Human fingers pinching to zoom on an infinite zoom canvas, a vast desert landscape stretches into the horizon. At the center, a giant hourglass sits, its glass exterior glinting in the sunlight. The zoom begins within the hourglass, revealing cascading grains of sand, each grain transitioning into a crystalline snowflake, leading to a frozen tundra as the scene deepens further.
	    - Human fingers pinching to zoom on an infinite zoom canvas, a colossal tree rises from a lush forest, its bark covered with intricate carvings of stories. The zoom focuses on one carving, which transforms into a vibrant painting of a village. Zooming further, the village reveals bustling streets, where a single doorway becomes the entry to a glowing cosmos.
	    - Human fingers pinching to zoom on an infinite zoom canvas, a tranquil ocean surface reflects the twilight sky. The zoom begins within a whirlpool, diving into vibrant coral reefs teeming with marine life. A single pearl on the ocean floor becomes the focus, transitioning into a marble palace with intricate golden inlays as the zoom continues seamlessly.
	    - Human fingers pinching to zoom on an infinite zoom canvas, a glowing campfire crackles in a dense, dark forest. The zoom begins in the heart of the fire, revealing swirling embers that transition into galaxies of stars. The zoom then centers on a lone star, which transforms into a lantern hanging in a cozy mountain cabin, seamlessly revealing new layers.
	    - Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, illuminated by neon lights and bustling with activity. The zoom focuses on a lit billboard advertising a soda can, transitioning into the sparkling surface of the liquid. As the zoom deepens, microscopic bubbles transform into entire ecosystems of floating islands within the soda.
	  seed: 12345
	  kwargs:
	    height: 480
	    width: 848
	    num_frames: 37
	    num_inference_steps: 64
	    sigma_schedule_python_code: "linear_quadratic_schedule(64, 0.025)"
	    cfg_schedule_python_code: "[6.0] * 64"
	```
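
	The two `*_python_code` entries at the end of the config are strings holding Python expressions. The sketch below shows what they could evaluate to for the 64 sampling steps configured here. The body of `linear_quadratic_schedule` is an assumption reconstructed from its name and arguments (a linear ramp up to the threshold for the first half of the steps, then a quadratic ramp); the trainer's own helper may differ in detail.

	```python
	def linear_quadratic_schedule(num_steps, threshold_noise, linear_steps=None):
	    """Sigma schedule from 1.0 down to 0.0: linear first, then quadratic.

	    Assumed implementation for illustration, not copied from the trainer.
	    """
	    if linear_steps is None:
	        linear_steps = num_steps // 2
	    # Linear part: noise removed grows from 0 toward threshold_noise.
	    linear_part = [i * threshold_noise / linear_steps for i in range(linear_steps)]
	    # Quadratic part: continues smoothly from threshold_noise up to 1.0.
	    quadratic_steps = num_steps - linear_steps
	    step_diff = linear_steps - threshold_noise * num_steps
	    quadratic_coef = step_diff / (linear_steps * quadratic_steps**2)
	    linear_coef = threshold_noise / linear_steps - 2 * step_diff / quadratic_steps**2
	    const = quadratic_coef * linear_steps**2
	    quadratic_part = [
	        quadratic_coef * i**2 + linear_coef * i + const
	        for i in range(linear_steps, num_steps)
	    ]
	    removed = linear_part + quadratic_part + [1.0]
	    # Convert "noise removed" into the remaining noise level sigma.
	    return [1.0 - r for r in removed]


	# Values matching the config: 64 steps, threshold 0.025, constant guidance 6.0.
	sigmas = linear_quadratic_schedule(64, 0.025)
	cfg_scales = [6.0] * 64
	```

	Under this sketch, `sigmas` has 65 entries (one boundary more than the number of steps), and `cfg_scales` applies the same guidance scale of 6.0 at every step.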