---
license: apache-2.0
base_model:
- genmo/mochi-1-preview
pipeline_tag: text-to-video
tags:
- infinite zoom
- art style
- mochi
- diffusion
widget:
- text: Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, zoom focuses on a can, all surface around it is made of liquid and objects swimming in it.
  output:
    url: samples/4_1800.mp4
- text: Human fingers pinching to zoom on an infinite zoom canvas, spaceship going through space.
  output:
    url: samples/5_2000.mp4
- text: Human fingers pinching to zoom on an infinite zoom canvas, orange cat in the middle of a canvas, looking upward.
  output:
    url: samples/6_2000.mp4
---

# Fine-Tuning Mochi Text-to-Video: InfiniteZoom-Mochi

This project fine-tunes the **Mochi Text-to-Video** model with LoRA (Low-Rank Adaptation) so that it generates videos in the **infinite zoom art style**.
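
As a rough illustration of the LoRA idea mentioned above, the sketch below adds a trainable low-rank update to a frozen weight matrix. It is a minimal pure-Python toy, not the Mochi fine-tuner's implementation: the class `LoRALinear`, the helper `matvec`, and the initialization scheme are assumptions made for this example. The `rank` and `alpha` arguments play the role of the `*_lora_rank` and `*_lora_alpha` settings in `lora.yaml`, where both are 16, giving a scale of `alpha / rank = 1.0`.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical toy class for illustration only.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen base weight, shape (d_out, d_in)
        self.scale = alpha / rank     # 16 / 16 = 1.0 with the values in lora.yaml
        # A starts small and random, B starts at zero, so the adapter
        # leaves the base model's output unchanged before training.
        self.lora_a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Count only the adapter parameters (A and B); W stays frozen."""
        return sum(len(r) for r in self.lora_a) + sum(len(r) for r in self.lora_b)
```

Only `lora_a` and `lora_b` would be trained, which is why a LoRA checkpoint is much smaller than the base model.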

## Training Details

- **Base Model**: [genmo/mochi-1-preview](https://huggingface.co/genmo/mochi-1-preview)
- **Fine-Tuning Dataset**: 23 short video clips in the infinite zoom art style, with `.txt` descriptions
- **Training Hardware**: H100 GPU
- **Training Duration**: 2 hours
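
Before training on a clip-plus-description dataset like the one listed above, it can help to confirm that every clip has a description. The sketch below assumes that each caption shares its video's base name (for example `clip_01.mp4` and `clip_01.txt`) and that clips are `.mp4` files; neither detail is stated in this card, and the function name `find_unpaired_clips` is made up for the example.

```python
from pathlib import Path


def find_unpaired_clips(data_dir, video_ext=".mp4"):
    """Return names of videos in data_dir without a non-empty same-name .txt caption.

    Assumes the pairing convention clip.mp4 <-> clip.txt (an assumption,
    not something this model card specifies).
    """
    missing = []
    for video in sorted(Path(data_dir).glob(f"*{video_ext}")):
        caption = video.with_suffix(".txt")
        # Treat a missing or whitespace-only caption file as unpaired.
        if not caption.is_file() or not caption.read_text(encoding="utf-8").strip():
            missing.append(video.name)
    return missing
```

Calling `find_unpaired_clips("path/to/clips")` returns an empty list when every clip has a caption.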

<Gallery />

## lora.yaml:

```
init_checkpoint_path: /weights/dit.safetensors
checkpoint_dir: /finetunes/my_mochi_lora
train_data_dir: /videos_prepared
attention_mode: sdpa
single_video_mode: false # Useful for debugging whether your model can learn a single video

# You only need this if you're using wandb
wandb:
  # project: mochi_1_lora
  # name: ${checkpoint_dir}
  # group: null

optimizer:
  lr: 2e-4
  weight_decay: 0.01

model:
  type: lora
  kwargs:
    # Apply LoRA to the QKV projection and the output projection of the attention block.
    qkv_proj_lora_rank: 16
    qkv_proj_lora_alpha: 16
    qkv_proj_lora_dropout: 0.
    out_proj_lora_rank: 16
    out_proj_lora_alpha: 16
    out_proj_lora_dropout: 0.

training:
  model_dtype: bf16
  warmup_steps: 200
  num_qkv_checkpoint: 48
  num_ff_checkpoint: 48
  num_post_attn_checkpoint: 48
  num_steps: 2000
  save_interval: 200
  caption_dropout: 0.1
  grad_clip: 0.0
  save_safetensors: true

# Used for generating samples during training to monitor progress ...
sample:
  interval: 200
  output_dir: ${checkpoint_dir}/samples
  decoder_path: /weights/decoder.safetensors
  prompts:
    - Human fingers pinching to zoom on an infinite zoom canvas, a vast desert landscape stretches into the horizon. At the center, a giant hourglass sits, its glass exterior glinting in the sunlight. The zoom begins within the hourglass, revealing cascading grains of sand, each grain transitioning into a crystalline snowflake, leading to a frozen tundra as the scene deepens further.
    - Human fingers pinching to zoom on an infinite zoom canvas, a colossal tree rises from a lush forest, its bark covered with intricate carvings of stories. The zoom focuses on one carving, which transforms into a vibrant painting of a village. Zooming further, the village reveals bustling streets, where a single doorway becomes the entry to a glowing cosmos.
    - Human fingers pinching to zoom on an infinite zoom canvas, a tranquil ocean surface reflects the twilight sky. The zoom begins within a whirlpool, diving into vibrant coral reefs teeming with marine life. A single pearl on the ocean floor becomes the focus, transitioning into a marble palace with intricate golden inlays as the zoom continues seamlessly.
    - Human fingers pinching to zoom on an infinite zoom canvas, a glowing campfire crackles in a dense, dark forest. The zoom begins in the heart of the fire, revealing swirling embers that transition into galaxies of stars. The zoom then centers on a lone star, which transforms into a lantern hanging in a cozy mountain cabin, seamlessly revealing new layers.
    - Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, illuminated by neon lights and bustling with activity. The zoom focuses on a lit billboard advertising a soda can, transitioning into the sparkling surface of the liquid. As the zoom deepens, microscopic bubbles transform into entire ecosystems of floating islands within the soda.
  seed: 12345
  kwargs:
    height: 480
    width: 848
    num_frames: 37
    num_inference_steps: 64
    sigma_schedule_python_code: "linear_quadratic_schedule(64, 0.025)"
    cfg_schedule_python_code: "[6.0] * 64"
```
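
The two `*_python_code` entries at the end of the config are Python expressions that expand into per-step sampling schedules. The sketch below shows one plausible reading of them: a noise ramp that rises linearly to the threshold `0.025` over the first half of the steps, then quadratically to 1, reported as `sigma = 1 - noise`. The function body is an assumption written for this example from the name and arguments alone; the version used by the Mochi code may differ in detail.

```python
def linear_quadratic_schedule(num_steps, threshold_noise, linear_steps=None):
    """Sketch of a linear-then-quadratic sigma schedule (assumed behavior).

    Returns num_steps + 1 values that fall from 1.0 to 0.0.
    """
    if linear_steps is None:
        linear_steps = num_steps // 2
    quadratic_steps = num_steps - linear_steps

    # Linear part: noise grows from 0 towards threshold_noise.
    linear_part = [i * threshold_noise / linear_steps for i in range(linear_steps)]

    # Quadratic part: continues from threshold_noise and reaches 1.0 at num_steps,
    # with its slope matched to the linear part at the joining point.
    step_diff = linear_steps - threshold_noise * num_steps
    quad_coef = step_diff / (linear_steps * quadratic_steps**2)
    lin_coef = threshold_noise / linear_steps - 2 * step_diff / quadratic_steps**2
    const = quad_coef * linear_steps**2
    quadratic_part = [
        quad_coef * i**2 + lin_coef * i + const for i in range(linear_steps, num_steps)
    ]

    noise = linear_part + quadratic_part + [1.0]
    return [1.0 - x for x in noise]


# The two expressions from the `sample.kwargs` section of the config.
sigma_schedule = linear_quadratic_schedule(64, 0.025)
cfg_schedule = [6.0] * 64

print(len(sigma_schedule), len(cfg_schedule))
```

Under this reading, the sigma schedule has one more entry than `num_inference_steps` because it lists the boundaries of the 64 steps, while the guidance schedule holds a constant scale of 6.0 for every step.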