---
license: apache-2.0
base_model:
- genmo/mochi-1-preview
pipeline_tag: text-to-video
tags:
- infinite zoom
- art style
- mochi
- diffusion
widget:
- text: Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, zoom focuses on a can, all surface around it is made of liquid and objects swimming in it.
  output:
    url: samples/4_1800.mp4
- text: Human fingers pinching to zoom on an infinite zoom canvas, spaceship going through space.
  output:
    url: samples/5_2000.mp4
- text: Human fingers pinching to zoom on an infinite zoom canvas, orange cat in the middle of a canvas, looking upward.
  output:
    url: samples/6_2000.mp4
---
# Fine-Tuning Mochi Text-to-Video: InfiniteZoom-Mochi
This project fine-tunes the **Mochi Text-to-Video** model with LoRA (Low-Rank Adaptation) to generate videos in the **infinite zoom art style**.
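For readers new to LoRA, the following minimal sketch shows the general idea: a frozen weight matrix `W` is left untouched and a trainable low-rank product `B @ A`, scaled by `alpha / rank`, is added to its output. The helper names and the layer width of 1024 are hypothetical and chosen only for illustration; the scaling convention is the one commonly used for LoRA and is assumed, not confirmed, to match the Mochi fine-tuner.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, rank):
    """Frozen projection W @ x plus the scaled low-rank update B @ (A @ x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank  # common LoRA convention (assumed here)
    return [b + scale * u for b, u in zip(base, update)]


def lora_extra_params(d_in, d_out, rank):
    """Trainable parameters added by one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical layer width, for illustration only.
d = 1024
added = lora_extra_params(d, d, rank=16)
print(f"adapter params: {added}, dense params: {d * d}")
```

With rank 16 and alpha 16, as in the configuration further down, the scale factor `alpha / rank` is 1.0 under this convention.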
## Training Details
- **Model Base**: [genmo/mochi-1-preview](https://huggingface.co/genmo/mochi-1-preview)
- **Fine-Tuning Dataset**: 23 short video clips in the infinite zoom art style, with accompanying `.txt` descriptions
- **Training Hardware**: H100 GPU
- **Training Duration**: 2 hours
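Before training on a dataset like the one described above, it can help to confirm that every clip has a description file. The sketch below assumes each video sits next to a `.txt` file with the same base name and that clips use the `.mp4` extension; both are assumptions about the layout, and `find_missing_captions` is a hypothetical helper, not part of the Mochi tooling.

```python
from pathlib import Path


def find_missing_captions(data_dir, video_ext=".mp4"):
    """Return (number of videos, names of videos lacking a same-named .txt file)."""
    data_dir = Path(data_dir)
    videos = sorted(data_dir.glob(f"*{video_ext}"))
    missing = [v.name for v in videos if not v.with_suffix(".txt").exists()]
    return len(videos), missing
```

For example, `find_missing_captions("/videos")` would return the clip count and a list of any clips that still need a description, where `/videos` stands in for wherever the raw clips are stored.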
<Gallery />
## lora.yaml:
```yaml
init_checkpoint_path: /weights/dit.safetensors
checkpoint_dir: /finetunes/my_mochi_lora
train_data_dir: /videos_prepared
attention_mode: sdpa
single_video_mode: false # Useful for debugging whether your model can learn a single video

# You only need this if you're using wandb
wandb:
  # project: mochi_1_lora
  # name: ${checkpoint_dir}
  # group: null

optimizer:
  lr: 2e-4
  weight_decay: 0.01

model:
  type: lora
  kwargs:
    # Apply LoRA to the QKV projection and the output projection of the attention block.
    qkv_proj_lora_rank: 16
    qkv_proj_lora_alpha: 16
    qkv_proj_lora_dropout: 0.
    out_proj_lora_rank: 16
    out_proj_lora_alpha: 16
    out_proj_lora_dropout: 0.

training:
  model_dtype: bf16
  warmup_steps: 200
  num_qkv_checkpoint: 48
  num_ff_checkpoint: 48
  num_post_attn_checkpoint: 48
  num_steps: 2000
  save_interval: 200
  caption_dropout: 0.1
  grad_clip: 0.0
  save_safetensors: true

# Used for generating samples during training to monitor progress ...
sample:
  interval: 200
  output_dir: ${checkpoint_dir}/samples
  decoder_path: /weights/decoder.safetensors
  prompts:
    - Human fingers pinching to zoom on an infinite zoom canvas, a vast desert landscape stretches into the horizon. At the center, a giant hourglass sits, its glass exterior glinting in the sunlight. The zoom begins within the hourglass, revealing cascading grains of sand, each grain transitioning into a crystalline snowflake, leading to a frozen tundra as the scene deepens further.
    - Human fingers pinching to zoom on an infinite zoom canvas, a colossal tree rises from a lush forest, its bark covered with intricate carvings of stories. The zoom focuses on one carving, which transforms into a vibrant painting of a village. Zooming further, the village reveals bustling streets, where a single doorway becomes the entry to a glowing cosmos.
    - Human fingers pinching to zoom on an infinite zoom canvas, a tranquil ocean surface reflects the twilight sky. The zoom begins within a whirlpool, diving into vibrant coral reefs teeming with marine life. A single pearl on the ocean floor becomes the focus, transitioning into a marble palace with intricate golden inlays as the zoom continues seamlessly.
    - Human fingers pinching to zoom on an infinite zoom canvas, a glowing campfire crackles in a dense, dark forest. The zoom begins in the heart of the fire, revealing swirling embers that transition into galaxies of stars. The zoom then centers on a lone star, which transforms into a lantern hanging in a cozy mountain cabin, seamlessly revealing new layers.
    - Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, illuminated by neon lights and bustling with activity. The zoom focuses on a lit billboard advertising a soda can, transitioning into the sparkling surface of the liquid. As the zoom deepens, microscopic bubbles transform into entire ecosystems of floating islands within the soda.
  seed: 12345
  kwargs:
    height: 480
    width: 848
    num_frames: 37
    num_inference_steps: 64
    sigma_schedule_python_code: "linear_quadratic_schedule(64, 0.025)"
    cfg_schedule_python_code: "[6.0] * 64"
```
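The two `*_python_code` entries in the `sample` section appear to be Python expressions that produce per-step schedules. The sketch below is an illustrative re-implementation of a linear-quadratic sigma schedule, written for this README rather than copied from the Mochi codebase, so the actual function may differ in detail. It shows one plausible reading of `linear_quadratic_schedule(64, 0.025)`: noise is removed linearly up to a threshold of 0.025 over the first half of the steps, then quadratically until no noise remains.

```python
def linear_quadratic_schedule(num_steps, threshold_noise, linear_steps=None):
    """Illustrative sketch (assumption): sigma values running from 1.0 down to 0.0."""
    if linear_steps is None:
        linear_steps = num_steps // 2
    quadratic_steps = num_steps - linear_steps

    # Linear ramp of removed noise: 0 up to threshold_noise over the first linear_steps steps.
    linear_part = [i * threshold_noise / linear_steps for i in range(linear_steps)]

    # Quadratic continuation whose value and slope match the linear ramp at the
    # junction and which reaches 1.0 at step num_steps.
    diff = linear_steps - threshold_noise * num_steps
    quad_coef = diff / (linear_steps * quadratic_steps ** 2)
    lin_coef = threshold_noise / linear_steps - 2 * diff / quadratic_steps ** 2
    const = quad_coef * linear_steps ** 2
    quadratic_part = [
        quad_coef * i ** 2 + lin_coef * i + const
        for i in range(linear_steps, num_steps)
    ]

    removed = linear_part + quadratic_part + [1.0]
    return [1.0 - r for r in removed]


# Values taken from the config above.
sigmas = linear_quadratic_schedule(64, 0.025)
cfg_schedule = [6.0] * 64  # constant guidance scale of 6.0 at every step

print(len(sigmas), len(cfg_schedule))
```

Under this reading the sigma schedule has one more entry than `num_inference_steps` (the boundaries of the 64 steps), while the guidance schedule has exactly one entry per step.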