# LTX-Video

## Training

For LoRA training, specify `--training_type lora`. For full finetuning, specify `--training_type full-finetune`.

```bash
#!/bin/bash

export WANDB_MODE="offline"
export NCCL_P2P_DISABLE=1
export TORCH_NCCL_ENABLE_MONITORING=0
export FINETRAINERS_LOG_LEVEL=DEBUG

GPU_IDS="0,1"

DATA_ROOT="/path/to/dataset"
CAPTION_COLUMN="prompts.txt"
VIDEO_COLUMN="videos.txt"
OUTPUT_DIR="/path/to/models/ltx-video/"

ID_TOKEN="BW_STYLE"

# Model arguments
model_cmd="--model_name ltx_video \
  --pretrained_model_name_or_path Lightricks/LTX-Video"

# Dataset arguments
dataset_cmd="--data_root $DATA_ROOT \
  --video_column $VIDEO_COLUMN \
  --caption_column $CAPTION_COLUMN \
  --id_token $ID_TOKEN \
  --video_resolution_buckets 49x512x768 \
  --caption_dropout_p 0.05"

# Dataloader arguments
dataloader_cmd="--dataloader_num_workers 0"

# Diffusion arguments
diffusion_cmd="--flow_weighting_scheme logit_normal"

# Training arguments
training_cmd="--training_type lora \
  --seed 42 \
  --batch_size 1 \
  --train_steps 3000 \
  --rank 128 \
  --lora_alpha 128 \
  --target_modules to_q to_k to_v to_out.0 \
  --gradient_accumulation_steps 4 \
  --gradient_checkpointing \
  --checkpointing_steps 500 \
  --checkpointing_limit 2 \
  --enable_slicing \
  --enable_tiling"

# Optimizer arguments
optimizer_cmd="--optimizer adamw \
  --lr 3e-5 \
  --lr_scheduler constant_with_warmup \
  --lr_warmup_steps 100 \
  --lr_num_cycles 1 \
  --beta1 0.9 \
  --beta2 0.95 \
  --weight_decay 1e-4 \
  --epsilon 1e-8 \
  --max_grad_norm 1.0"

# Miscellaneous arguments
miscellaneous_cmd="--tracker_name finetrainers-ltxv \
  --output_dir $OUTPUT_DIR \
  --nccl_timeout 1800 \
  --report_to wandb"

cmd="accelerate launch --config_file accelerate_configs/uncompiled_2.yaml --gpu_ids $GPU_IDS train.py \
  $model_cmd \
  $dataset_cmd \
  $dataloader_cmd \
  $diffusion_cmd \
  $training_cmd \
  $optimizer_cmd \
  $miscellaneous_cmd"

echo "Running command: $cmd"
eval $cmd
echo -ne "-------------------- Finished executing script --------------------\n\n"
```
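
To switch the same script to full finetuning, only the training arguments need to change. The snippet below is a sketch: it assumes the LoRA-specific flags (`--rank`, `--lora_alpha`, `--target_modules`) are simply dropped, which you should confirm against `python train.py --help`.

```bash
# Sketch of a full-finetune variant of the "Training arguments" block above.
# Assumption: the LoRA-specific flags (--rank, --lora_alpha, --target_modules)
# are omitted; everything else stays the same as in the LoRA example.
training_cmd="--training_type full-finetune \
  --seed 42 \
  --batch_size 1 \
  --train_steps 3000 \
  --gradient_accumulation_steps 4 \
  --gradient_checkpointing \
  --checkpointing_steps 500 \
  --checkpointing_limit 2 \
  --enable_slicing \
  --enable_tiling"
```
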
## Memory Usage

### LoRA

> [!NOTE]
>
> The measurements below were taken in `torch.bfloat16` precision. Memory usage can be reduced further by passing `--layerwise_upcasting_modules transformer` to the training script. This casts the model weights to `torch.float8_e4m3fn` or `torch.float8_e5m2`, which halves the memory requirement for the model weights. Computation is still performed in the dtype set by `--transformer_dtype` (which defaults to `bf16`).
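
For example, the upcasting flags from the note can be appended to the training arguments defined in the script above. This is a sketch; `--transformer_dtype bf16` only restates the documented default.

```bash
# Sketch: enable layerwise upcasting by appending the flags from the note
# above to the training arguments already defined in the script.
training_cmd="$training_cmd \
  --layerwise_upcasting_modules transformer \
  --transformer_dtype bf16"
```
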
LoRA with rank 128, batch size 1, gradient checkpointing, optimizer adamw, `49x512x768` resolution, **without precomputation**:

```
Training configuration: {
  "trainable parameters": 117440512,
  "total samples": 69,
  "train epochs": 1,
  "train steps": 10,
  "batches per device": 1,
  "total batches observed per epoch": 69,
  "train batch size": 1,
  "gradient accumulation steps": 1
}
```

| stage                   | memory_allocated (GB) | max_memory_reserved (GB) |
|:-----------------------:|:---------------------:|:------------------------:|
| before training start   | 13.486                | 13.879                   |
| before validation start | 14.146                | 17.623                   |
| after validation end    | 14.146                | 17.623                   |
| after epoch 1           | 14.146                | 17.623                   |
| after training end      | 4.461                 | 17.623                   |

Note: this configuration requires about `18` GB of VRAM without precomputation.

LoRA with rank 128, batch size 1, gradient checkpointing, optimizer adamw, `49x512x768` resolution, **with precomputation**:

```
Training configuration: {
  "trainable parameters": 117440512,
  "total samples": 1,
  "train epochs": 10,
  "train steps": 10,
  "batches per device": 1,
  "total batches observed per epoch": 1,
  "train batch size": 1,
  "gradient accumulation steps": 1
}
```

| stage                         | memory_allocated (GB) | max_memory_reserved (GB) |
|:-----------------------------:|:---------------------:|:------------------------:|
| after precomputing conditions | 8.88                  | 8.920                    |
| after precomputing latents    | 9.684                 | 11.613                   |
| before training start         | 3.809                 | 10.010                   |
| after epoch 1                 | 4.26                  | 10.916                   |
| before validation start       | 4.26                  | 10.916                   |
| after validation end          | 13.924                | 17.262                   |
| after training end            | 4.26                  | 14.314                   |

Note: this configuration requires about `17.5` GB of VRAM with precomputation. If validation is not performed, the memory usage is reduced to about `11` GB.
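
The lower steady-state numbers above come from precomputing the text conditions and video latents before training starts. A sketch of enabling this is below; the `--precompute_conditions` flag name is an assumption, so verify the exact flag with `python train.py --help`.

```bash
# Assumption: precomputation is toggled by a --precompute_conditions flag;
# verify the exact flag name with `python train.py --help`.
dataset_cmd="$dataset_cmd --precompute_conditions"
```
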
### Full Finetuning

Full finetuning, batch size 1, **with precomputation**:

```
Training configuration: {
  "trainable parameters": 1923385472,
  "total samples": 1,
  "train epochs": 10,
  "train steps": 10,
  "batches per device": 1,
  "total batches observed per epoch": 1,
  "train batch size": 1,
  "gradient accumulation steps": 1
}
```

| stage                         | memory_allocated (GB) | max_memory_reserved (GB) |
|:-----------------------------:|:---------------------:|:------------------------:|
| after precomputing conditions | 8.89                  | 8.937                    |
| after precomputing latents    | 9.701                 | 11.615                   |
| before training start         | 3.583                 | 4.025                    |
| after epoch 1                 | 10.769                | 20.357                   |
| before validation start       | 10.769                | 20.357                   |
| after validation end          | 10.769                | 28.332                   |
| after training end            | 10.769                | 12.904                   |

## Inference

Assuming your LoRA is saved and pushed to the HF Hub under `my-awesome-name/my-awesome-lora`, you can use the finetuned model for inference as follows:

```diff
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")
+ pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="ltxv-lora")
+ pipe.set_adapters(["ltxv-lora"], [0.75])

video = pipe("<my-awesome-prompt>").frames[0]
export_to_video(video, "output.mp4", fps=8)
```
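
If you will run many prompts at a fixed LoRA strength, you can optionally merge the adapter into the base weights with `diffusers`' `fuse_lora` before generating. This is a sketch; the `0.75` scale mirrors the `set_adapters` call above, and the output filename is arbitrary.

```python
# Optional: bake the LoRA into the transformer weights at a fixed scale, then
# drop the adapter bookkeeping. See the "Merge LoRAs" guide linked below.
pipe.fuse_lora(lora_scale=0.75)
pipe.unload_lora_weights()

video = pipe("<my-awesome-prompt>").frames[0]
export_to_video(video, "output_fused.mp4", fps=8)
```
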
You can refer to the following guides to learn more about the model pipeline and performing LoRA inference in `diffusers`:

* [LTX-Video in Diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video)
* [Load LoRAs for inference](https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference)
* [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras)